{"id":417,"date":"2013-06-05T01:09:22","date_gmt":"2013-06-05T07:09:22","guid":{"rendered":"http:\/\/www.onecle.com\/blog\/?p=417"},"modified":"2013-06-05T01:09:22","modified_gmt":"2013-06-05T07:09:22","slug":"sec-edgar-robots-txt","status":"publish","type":"post","link":"https:\/\/www.onecle.com\/blog\/2013\/06\/05\/sec-edgar-robots-txt\/","title":{"rendered":"SEC EDGAR Robots.txt"},"content":{"rendered":"<p>For a long time, a lot of data in securities filings was hidden by obscurity. Sure, the SEC offered a <a href=\"http:\/\/searchwww.sec.gov\/EDGARFSClient\/jsp\/EDGAR_MainAccess.jsp\">full-text search<\/a> of EDGAR filings, but it only spanned the last four years. And, if you actually tried to use the full-text search, it wasn&#8217;t too fruitful an experience.<\/p>\n<p>Here&#8217;s a search for <a href=\"http:\/\/searchwww.sec.gov\/EDGARFSClient\/jsp\/EDGAR_MainAccess.jsp?search_text=offer%20letter&#038;isAdv=false\">offer letter<\/a>:<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.onecle.com\/blog\/wp-content\/uploads\/2013\/06\/searchwww-sec-gov.jpg?resize=450%2C563\" alt=\"SEC EDGAR Full Text Search\" width=\"450\" height=\"563\" class=\"alignnone size-full wp-image-421\" \/><\/p>\n<p>Not particularly illuminating.<\/p>\n<p>However, something changed about a year ago.  
On May 1, 2012, the <a href=\"http:\/\/web.archive.org\/web\/20120501050827\/http:\/\/sec.gov\/robots.txt\">robots.txt<\/a> file from the sec.gov website looked like this:<\/p>\n<p>User-agent: *<br \/>\nDisallow: \/Archives<br \/>\nDisallow: \/Archives\/bin<br \/>\nDisallow: \/Archives\/dev<br \/>\nDisallow: \/Archives\/etc<br \/>\nDisallow: \/Archives\/ftp<br \/>\nDisallow: \/Archives\/gopher<br \/>\nDisallow: \/Archives\/tmp<br \/>\nDisallow: \/Archives\/usr<br \/>\nDisallow: \/cgi-bin<br \/>\nDisallow: \/bin<br \/>\nDisallow: \/oursite\/previews<br \/>\nDisallow: \/edgar\/vprr<\/p>\n<p><em>Source: Internet Archive<\/em><\/p>\n<p>On the following day, the <a href=\"http:\/\/web.archive.org\/web\/20120502093134\/http:\/\/www.sec.gov\/robots.txt\">robots.txt<\/a> file looked like this:<\/p>\n<p>User-agent: *<br \/>\nAllow: \/Archives\/edgar\/data<br \/>\nAllow: \/Archives\/edgar\/vprr<br \/>\nDisallow: \/Archives\/bin<br \/>\nDisallow: \/Archives\/dev<br \/>\nDisallow: \/Archives\/etc<br \/>\nDisallow: \/Archives\/ftp<br \/>\nDisallow: \/Archives\/gopher<br \/>\nDisallow: \/Archives\/tmp<br \/>\nDisallow: \/Archives\/usr<br \/>\nDisallow: \/cgi-bin<br \/>\nDisallow: \/bin<br \/>\nDisallow: \/oursite\/previews<br \/>\nDisallow: \/Archives\/edgar\/vprr\/XXXX<br \/>\nDisallow: \/Archives\/edgar\/vprr\/vprr_removal<br \/>\nDisallow: \/Archives\/edgar\/vprr\/bin<\/p>\n<p><em>Source: Internet Archive<\/em><\/p>\n<p>The blanket Disallow: \/Archives rule was gone, and the big change was this new line, which allowed search engines, such as Google, to index EDGAR data:<br \/>\nAllow: \/Archives\/edgar\/data<\/p>\n<p>Now you can use Google to search for an <a href=\"https:\/\/www.google.com\/search?q=site:www.sec.gov\/Archives\/edgar\/data+offer+letter\">offer letter<\/a> in the EDGAR archives. The results are much better than the SEC&#8217;s full-text search.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For a long time, a lot of data in securities filings was hidden by obscurity. 
Sure, the SEC offered a full-text search of EDGAR filings, but it only spanned the last four years. And, if you actually tried to use the full-text search, it wasn&#8217;t too fruitful an experience. Here&#8217;s a search for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-417","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/posts\/417","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/comments?post=417"}],"version-history":[{"count":0,"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/posts\/417\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/media?parent=417"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/categories?post=417"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.onecle.com\/blog\/wp-json\/wp\/v2\/tags?post=417"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}