Wayback Machine
== Technical information == The Wayback Machine's software has been developed to "[[Web crawler|crawl]]" the Web and download all publicly accessible information and data files on webpages, the [[Gopher (protocol)|Gopher]] hierarchy, the [[Usenet|Netnews]] (Usenet) bulletin board system, and software.<ref>{{cite web |last=Kahle |first=Brewster |title=Archiving the Internet |url=http://www.uibk.ac.at/voeb/texte/kahle.html |publisher=Scientific American – March 1997 Issue |access-date=August 19, 2011 |url-status=live |archive-url=https://web.archive.org/web/20120403042627/http://www.uibk.ac.at/voeb/texte/kahle.html |archive-date=April 3, 2012}}</ref> The information collected by these "crawlers" does not include all the information available on the Internet, since much of the data is restricted by the publisher or stored in databases that are not accessible. To overcome inconsistencies in partially cached websites, Archive-It.org was developed in 2005 by the Internet Archive as a means of allowing institutions and content creators to voluntarily harvest and preserve collections of digital content, and create digital archives.<ref>{{cite web |url=https://blog.archive.org/2014/10/27/archive-it-crawling-the-web-together/ |title=Archive-It: Crawling the Web Together |work=Internet Archive Blogs |first=Jeff |last=Kaplan |date=October 27, 2014 |access-date=October 16, 2017 |url-status=live |archive-url=https://web.archive.org/web/20171012212827/http://blog.archive.org/2014/10/27/archive-it-crawling-the-web-together/ |archive-date=October 12, 2017 }}</ref> Crawls are contributed from various sources, some imported from third parties and others generated internally by the Archive.<ref name="Leetaru"/> For example, crawls are contributed by the [[Alfred P. 
Sloan Foundation|Sloan Foundation]] and [[Alexa Internet|Alexa]], crawls run by Internet Archive on behalf of [[National Archives and Records Administration|NARA]] and the [[Internet Memory Foundation]], mirrors of [[Common Crawl]].<ref name="Leetaru"/> The "Worldwide Web Crawls" have been running since 2010 and capture the global Web.<ref name="Leetaru"/><ref name="Crawls">{{cite web |url=https://archive.org/details/widecrawl&tab=about |title=Worldwide Web Crawls |publisher=Internet Archive |access-date=October 16, 2017 |url-status=live |archive-url=https://web.archive.org/web/20171019222740/https://archive.org/details/widecrawl%26tab%3Dabout |archive-date=October 19, 2017 }}</ref> In September 2020, the Internet Archive announced a partnership with [[Cloudflare]] – an American [[content delivery network]] service provider – to automatically index websites served via its "Always Online" services.<ref name="archive-partners"/> Documents and resources are stored with time stamp URLs such as <code>{{#time:YmdHis}}</code>. 
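The time stamp format in this section (<code>YmdHis</code>, a 14-digit year-month-day-hour-minute-second string) can be illustrated with a short sketch. It assumes the <code>https://web.archive.org/web/</code> prefix seen in the archive URLs cited here and assumes that time stamps are in UTC. The function names are hypothetical, and <code>nearest_capture</code> only imitates the "closest in time" selection described in this section; it is not the Archive's own implementation.

```python
from datetime import datetime, timezone

# Prefix taken from the archive URLs cited in this section.
WAYBACK_PREFIX = "https://web.archive.org/web/"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # strftime equivalent of YmdHis


def parse_timestamp(stamp: str) -> datetime:
    """Turn a 14-digit stamp into an aware datetime (UTC is an assumption)."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def build_capture_url(original_url: str, moment: datetime) -> str:
    """Compose a capture URL from an aware datetime and the original URL."""
    stamp = moment.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"


def parse_capture_url(capture_url: str) -> tuple[datetime, str]:
    """Split a capture URL into its time stamp and the original URL."""
    if not capture_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback Machine capture URL")
    stamp, _, original = capture_url[len(WAYBACK_PREFIX):].partition("/")
    return parse_timestamp(stamp), original


def nearest_capture(available: list[str], wanted: str) -> str:
    """Pick the available stamp closest in time to the wanted stamp."""
    target = parse_timestamp(wanted)
    return min(available, key=lambda s: abs(parse_timestamp(s) - target))


if __name__ == "__main__":
    url = ("https://web.archive.org/web/20120403042627/"
           "http://www.uibk.ac.at/voeb/texte/kahle.html")
    moment, original = parse_capture_url(url)
    print(moment.isoformat(), original)
```

Because the stamp is fixed-width and ordered from most to least significant field, plain string sorting of stamps also sorts captures chronologically.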
Pages' individual resources such as images and style sheets and scripts, as well as outgoing [[hyperlinks]], are linked to with the time stamp of the currently viewed page, so they are redirected automatically to their individual captures that are the closest in time.<ref name="Using">{{cite web |url=https://help.archive.org/help/using-the-wayback-machine/ |title=Using The Wayback Machine |website=Internet Archive |access-date=September 25, 2024}}</ref> The frequency of snapshot captures varies per website.<ref name="Leetaru"/> Websites in the "Worldwide Web Crawls" are included in a "crawl list", with the site archived once per crawl.<ref name="Leetaru"/> A crawl can take months or even years to complete, depending on size.<ref name="Leetaru"/> For example, "Wide Crawl Number 13" started on January 9, 2015, and completed on July 11, 2016.<ref>{{cite web |url=https://archive.org/details/wide00013?&sort=-publicdate&page=3 |title=Wide Crawl Number 13 |publisher=Internet Archive |access-date=October 16, 2017 |url-status=live |archive-url=https://web.archive.org/web/20171019223332/https://archive.org/details/wide00013?&sort=-publicdate&page=3 |archive-date=October 19, 2017 }}</ref> However, there may be multiple crawls ongoing at any one time, and a site might be included in more than one crawl list, so how often a site is crawled varies widely.<ref name="Leetaru">{{cite web |url=https://www.forbes.com/sites/kalevleetaru/2016/01/18/the-internet-archive-turns-20-a-behind-the-scenes-look-at-archiving-the-web/#222f2e5682e0 |url-access=subscription |title=The Internet Archive Turns 20: A Behind the Scenes Look at Archiving the Web |work=Forbes |first=Kalev |last=Leetaru |date=January 28, 2016 |access-date=October 16, 2017 |url-status=live |archive-url=https://web.archive.org/web/20171016230439/https://www.forbes.com/sites/kalevleetaru/2016/01/18/the-internet-archive-turns-20-a-behind-the-scenes-look-at-archiving-the-web/#222f2e5682e0 |archive-date=October 16, 2017 
}}{{cbignore}}</ref> A "Save Page Now" archiving feature was made available in October 2013,<ref name="savepage">{{cite web |author=Rossi, Alexis |date=October 25, 2013 |title=Fixing Broken Links on the Internet |url=https://blog.archive.org/2013/10/25/fixing-broken-links/ |url-status=live |archive-url=https://web.archive.org/web/20141107193437/http://blog.archive.org/2013/10/25/fixing-broken-links/ |archive-date=November 7, 2014 |access-date=December 29, 2013 |publisher=Internet Archive}}</ref> accessible on the lower right of the Wayback Machine's main page.<ref>{{cite web |title=Wayback Machine main page |url=https://archive.org/web/ |url-status=live |archive-url=https://web.archive.org/web/20140103004344/https://archive.org/web/ |archive-date=January 3, 2014 |access-date=December 30, 2013 |publisher=Internet Archive}}</ref> Once a target URL is entered and saved, the web page will become part of the Wayback Machine.<ref name="savepage" /> Through the Internet address web.archive.org,<ref>{{cite web |title=Web.archive.org directory |url=https://web.archive.org |url-status=live |archive-url=https://web.archive.org/web/20120103040016/https://web.archive.org/ |archive-date=January 3, 2012 |access-date=March 2, 2014}}</ref> users can upload to the Wayback Machine a large variety of contents, including [[PDF]] and [[data compression]] file formats. 
The Wayback Machine creates a permanent URL for the uploaded content that is accessible on the web, even if it is not listed in searches on the official <nowiki>https://archive.org</nowiki> website.{{jargon inline|date=October 2024}} Starting in October 2019, users were [[data cap|limited]] to 15 archival requests and retrievals per minute.<ref>{{cite web |url=https://archive.org/details/toomanyrequests_20191110 |title=Too Many Requests |publisher=Internet Archive |date=November 10, 2019 |access-date=November 27, 2021}}</ref>

===Storage capacity and growth===
As technology has developed over the years, the storage capacity of the Wayback Machine has grown. In 2003, after only two years of public access, the Wayback Machine was growing at a rate of 12 terabytes per month. The data is stored on [[PetaBox]] rack systems custom designed by Internet Archive staff. The first 100 TB rack became fully operational in June 2004, although it soon became clear that the Archive would need much more storage than that.<ref>{{cite web |url=https://archive.org/web/petabox.php |title= Petabox |website=Internet Archive |access-date=October 25, 2018}}</ref><ref>{{cite news |url=http://news.zdnet.com/2100-9584_22-5808754.html |title=Big storage on the cheap |last=Kanellos |first=Michael |date=July 29, 2005 |access-date=July 29, 2007 |archive-url=https://web.archive.org/web/20070403030705/http://news.zdnet.com/2100-9584_22-5808754.html <!-- Bot retrieved archive --> |archive-date=April 3, 2007 |publisher=CNET News}}</ref> The Internet Archive migrated its customized storage architecture to [[Sun Open Storage]] in 2009, and hosts a new data centre in a [[Sun Modular Datacenter]] on [[Sun Microsystems]]' California campus.<ref>{{cite web |title=Internet Archive and Sun Microsystems Create Living History of the Internet |publisher=[[Sun Microsystems]] |date=March 25, 2009 |url=http://www.sun.com/aboutsun/pr/2009-03/sunflash.20090325.1.xml |access-date=March 27, 2009 |url-status=dead 
|archive-url=https://web.archive.org/web/20090326200212/http://www.sun.com/aboutsun/pr/2009-03/sunflash.20090325.1.xml |archive-date=March 26, 2009}}</ref> {{As of|2009}}, the Wayback Machine contained approximately three [[petabyte]]s of data and was growing at a rate of 100 [[terabyte]]s each month.<ref>{{cite news |url=http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=hardware&articleId=9130081&taxonomyId=12&intsrc=kc_top |title=Internet Archive to unveil massive Wayback Machine data center |last=Mearian |first=Lucas |date=March 19, 2009 |access-date=March 22, 2009 |archive-url=https://web.archive.org/web/20090323093002/http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=hardware&articleId=9130081&taxonomyId=12&intsrc=kc_top |archive-date=March 23, 2009 |publisher=Computerworld}}</ref> A new, improved version of the Wayback Machine, with an updated interface and a fresher index of archived content, was made available for public testing in 2011, where captures appear in a calendar layout with circles whose width visualizes the number of crawls each day, but no marking of duplicates with asterisks or an advanced search page.<ref>{{cite web |title=Updated Wayback Machine in Beta Testing |url=http://iawebarchiving.wordpress.com/2011/01/24/updated-wayback-machine-in-beta-testing/ |author=gojomo |date=January 24, 2011 |access-date=August 19, 2011 |url-status=dead |archive-url=https://web.archive.org/web/20110823040310/http://iawebarchiving.wordpress.com/2011/01/24/updated-wayback-machine-in-beta-testing/ |archive-date=August 23, 2011 }}</ref><ref>{{Cite web |title=Advanced Search |url=https://web.archive.org/collections/web/advanced.html |website=Wayback Machine |access-date=April 3, 2022 |archive-url=https://web.archive.org/web/20100131104918/https://web.archive.org/collections/web/advanced.html |archive-date=January 31, 2010}}</ref> A top [[toolbar]] was added to facilitate navigating between 
captures. A bar chart visualizes the frequency of captures per month over the years.<ref>{{cite web |title=What's the difference between the classic Wayback Machine and the new Beta version? |url=http://faq.waybackmachine.org/whats-the-difference-between-the-classic-wayback-machine-and-the-new-beta-version |url-status=dead |archive-url=https://web.archive.org/web/20101225023945/http://faq.waybackmachine.org/whats-the-difference-between-the-classic-wayback-machine-and-the-new-beta-version |access-date=November 17, 2021 |archive-date=December 25, 2010 }}</ref> Features like "Changes", "Summary", and a graphical site map were added subsequently. In March that year, it was said on the Wayback Machine forum that "the Beta of the new Wayback Machine has a more complete and up-to-date index of all crawled materials into 2010, and will continue to be updated regularly. The index driving the classic Wayback Machine only has a little bit of material past 2008, and no further index updates are planned, as it will be phased out this year."<ref>{{cite web |url=https://www.archive.org/post/350738/updated-wayback-machine-in-beta-testing |title=Beta Wayback Machine, in forum |access-date=April 16, 2014 |url-status=live |archive-url=https://web.archive.org/web/20140417082107/https://archive.org/post/350738/updated-wayback-machine-in-beta-testing |archive-date=April 17, 2014}}</ref> Also in 2011, the Internet Archive installed their sixth pair of PetaBox racks which increased the Wayback Machine's storage capacity by 700 terabytes.<ref>{{cite web |url=https://archive.org/post/353721/6th-pair-of-racks-go-into-service-over-2pb-of-data-space-used |title=Internet Archive Forums: 6th pair of racks go into service: over 2PB of data space used |website=Internet Archive |access-date=October 25, 2018 |archive-url=https://web.archive.org/web/20161024144627/https://archive.org/post/353721/6th-pair-of-racks-go-into-service-over-2pb-of-data-space-used |archive-date=October 24, 2016 
|url-status=live }}</ref> In January 2013, the company announced a milestone of 240 billion URLs.<ref>{{cite web |url=http://blog.archive.org/2013/01/09/updated-wayback/ |title=Wayback Machine: Now with 240,000,000,000 URLs | Internet Archive Blogs |date=January 9, 2013 |access-date=April 16, 2014 |url-status=live |archive-url=https://web.archive.org/web/20140414221120/http://blog.archive.org/2013/01/09/updated-wayback/ |archive-date=April 14, 2014}}</ref> In October 2013, the company introduced the "Save a Page" feature, which allows any Internet user to archive the contents of a URL, and quickly generates a [[permanent link]] unlike the preceding ''liveweb'' feature.<ref>{{cite web |url=https://blog.archive.org/2013/10/25/fixing-broken-links/ |title=Fixing Broken Links on the Internet |last=Rossi |first=Alexis |date=October 25, 2013 |website=Internet Archive |publisher=Collections Team, the Internet Archive |location=San Francisco, CA, US |archive-url=https://web.archive.org/web/20141107193437/http://blog.archive.org/2013/10/25/fixing-broken-links/ |archive-date=November 7, 2014 |url-status=live |access-date=March 25, 2015 |quote=We have added the ability to archive a page instantly and get back a permanent URL for that page in the Wayback Machine. 
This service allows anyone – wikipedia editors, scholars, legal professionals, students, or home cooks like me – to create a stable URL to cite, share or bookmark any information they want to still have access to in the future.}}</ref><ref>{{cite web |first=Alexander |last=Baron |title=((The new Internet Archive Wayback Machine now online)) |url=http://www.digitaljournal.com/article/360776 |website=Digital Journal |date=October 23, 2013 |access-date=November 19, 2020 |archive-date=November 19, 2020 |archive-url=https://web.archive.org/web/20201119071411/http://www.digitaljournal.com/article/360776}}</ref> In December 2014, the Wayback Machine contained 435 [[billion]] web pages—almost nine petabytes of data, and was growing at about 20 terabytes a week.<ref name="Arora">{{cite journal |last1=Arora |first1=Sanjay K. |last2=Li |first2=Yin |last3=Youtie |first3=Jan |last4=Shapira |first4=Philip |date=May 5, 2015 |title=Using the wayback machine to mine websites in the social sciences: A methodological resource |journal=Journal of the Association for Information Science and Technology |volume=67 |issue=8 |pages=1904–1915 |doi=10.1002/asi.23503 |issn=2330-1635|doi-access=free |hdl=10.1002/asi.23503 |hdl-access=free }}</ref><ref>{{cite web |title=Internet Archive Frequently Asked Questions |url=https://archive.org/about/faqs.php |access-date=January 17, 2015 |url-status=live |archive-url=https://web.archive.org/web/20091021003552/https://archive.org/about/faqs.php |archive-date=October 21, 2009}}</ref><ref>{{cite web |url=https://archive.org/about/faqs.php |archive-url=https://web.archive.org/web/20141218203115/https://archive.org/about/faqs.php |url-status=dead |archive-date=December 18, 2014 |title=Internet Archive Frequently Asked Questions |date=December 18, 2014 |access-date=December 13, 2018}}</ref> In July 2016, the Wayback Machine reportedly contained around 15 petabytes of data.<ref>{{cite web |title=Can the manipulation of big data change the way the world 
thinks? |work=The National |url=http://www.thenational.ae/opinion/comment/can-the-manipulation-of-big-data-change-the-way-the-world-thinks |access-date=May 14, 2017 |url-status=live |archive-url=https://web.archive.org/web/20170112060354/http://www.thenational.ae/opinion/comment/can-the-manipulation-of-big-data-change-the-way-the-world-thinks |archive-date=January 12, 2017}}</ref> In October 2016, it was announced that the way web pages are counted would be changed, resulting in the decrease of the archived pages counts shown. Embedded objects such as pictures, videos, style sheets, JavaScripts are no longer counted as a "web page", whereas HTML, PDF, and plain text documents remain counted.<ref name=":0">{{cite web |author=Goel, Vinay |date=October 23, 2016 |title=Defining Web pages, Web sites and Web captures |url=https://blog.archive.org/2016/10/23/defining-web-pages-web-sites-and-web-captures/ |url-status=live |archive-url=https://web.archive.org/web/20181209195730/https://blog.archive.org/2016/10/23/defining-web-pages-web-sites-and-web-captures/ |archive-date=December 9, 2018 |access-date=December 9, 2018 |publisher=Internet Archive}}</ref> In September 2018, the Wayback Machine contained over 25 petabytes of data.<ref>{{cite news |url=https://thehustle.co/inside-wayback-machine-internet-archive |title=Inside Wayback Machine, the internet's time capsule |last=Crockett |first=Zachary |date=September 28, 2018 |work=The Hustle |access-date=October 26, 2018 |archive-url=https://web.archive.org/web/20181002145800/https://thehustle.co/inside-wayback-machine-internet-archive |archive-date=October 2, 2018 |url-status=live }}</ref><ref>{{cite magazine |url=https://www.wired.com/story/wired25-virginia-heffernan-internet-archive-wayback-machine/ |url-access=limited |title=Things Break and Decay on the Internet—That's a Good Thing |last=Heffernan |first=Virginia |date=September 18, 2018 |magazine=WIRED |access-date=October 26, 2018 
|archive-url=https://web.archive.org/web/20180925130510/https://www.wired.com/story/wired25-virginia-heffernan-internet-archive-wayback-machine/ |archive-date=September 25, 2018 |url-status=live }}</ref> As of December 2020, the Wayback Machine contained over 70 petabytes of data.<ref>{{cite web |url=https://blog.adafruit.com/2020/12/01/donate-to-the-internet-archive-digital-library-of-free-borrowable-books-movies-music-wayback-machine-internetarchive/ |title=Donate to the Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine @internetarchive |date=December 1, 2020 |publisher=adafruit |access-date=December 2, 2020 |archive-date=December 2, 2020 |archive-url=https://web.archive.org/web/20201202065323/https://blog.adafruit.com/2020/12/01/donate-to-the-internet-archive-digital-library-of-free-borrowable-books-movies-music-wayback-machine-internetarchive/ |url-status=live }}</ref> {{Bar chart | title = Wayback Machine growth<ref>{{cite web |url=https://blog.archive.org/2014/05/09/wayback-machine-hits-400000000000 |title=Wayback Machine Hits 400,000,000,000! 
|author=michelle |publisher=Internet Archive |date=May 9, 2014 |archive-url=https://web.archive.org/web/20140826191225/http://blog.archive.org/2014/05/09/wayback-machine-hits-400000000000/ |archive-date=August 26, 2014 |url-status=live |access-date=March 25, 2015}}</ref><ref>{{cite web |url=https://www.archive.org/ |archive-url=https://web.archive.org/web/20201231000610/https://archive.org/ |title=Internet Archive |archive-date=December 31, 2020 |url-status=dead |publisher=Internet Archive |access-date=March 8, 2021}}<!-- Update me at end of 2021 --></ref> | label_type = Wayback Machine by year | data_type = Pages archived | bar_width = 45 | width_units = em | data_max = 900000000000 | label1 = 2004 | data1 = 30000000000 | label2 = 2005 | data2 = 40000000000 | label3 = 2008 | data3 = 85000000000 | label4 = 2012 | data4 = 150000000000 | label5 = 2013 | data5 = 373000000000 | label6 = 2014 | data6 = 400000000000 |label7=2015|data7=452000000000|label8=2016|data8=459000000000|label9=2017|data9=279000000000|data10=310000000000|label10=2018|data11=345000000000|label11=2019|data12=405000000000|label12=2020|label13=2021|data13=514000000000|label14=2022|data14=640000000000|color1=lightblue|color2=lightblue|color3=lightblue|color4=yellow|color5=yellow|color6=yellow|color7=orange|color8=orange|color9=yellow|color10=yellow|color11=yellow|color12=yellow|color13=orange|color14=red|comment1=0–100B: Light blue|comment4=100B–450B: Yellow|comment7=450B–600B: Orange|comment14=600B–: Red|data15=866000000000|label15=2024|color15=Red}} ===Wayback Machine APIs=== The Wayback Machine service offers three public APIs, SavePageNow, Availability, and CDX.<ref>{{Cite web|url=https://archive.org/help/wayback_api.php|title=Wayback Machine APIs |website=Internet Archive}}</ref> SavePageNow can be used to archive web pages. 
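As a rough illustration of how the Availability and CDX APIs named in this section might be called, the sketch below only builds request URLs and reads a response body; it makes no network request. The endpoint paths and the <code>url</code>, <code>timestamp</code>, <code>output</code>, <code>from</code>, <code>to</code> and <code>limit</code> parameters reflect the public documentation as best understood and should be checked against it. The helper names are hypothetical, and <code>SAMPLE_RESPONSE</code> is a hand-written example of the response shape, not a recorded result.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Endpoints are assumptions to verify against the Internet Archive's API docs.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build an Availability API request URL for a page."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDDhhmmss or a prefix of it
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest capture URL from a response body, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def cdx_query(url: str, filters: dict) -> str:
    """Build a CDX API request URL that asks for JSON output."""
    params = {"url": url, "output": "json", **filters}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


# Hand-written illustration of the response shape (not real data).
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://example.com/",
            "timestamp": "20060101000000",
            "status": "200",
        }
    }
})

if __name__ == "__main__":
    print(availability_query("example.com", "20060101"))
    print(cdx_query("example.com", {"from": "2010", "to": "2011", "limit": "5"}))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

To run the queries for real, the built URLs could be passed to <code>urllib.request.urlopen</code>, keeping in mind the request limit mentioned earlier in the article.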
The Availability API checks whether an archived copy of a given web page exists.<ref>{{GitHub |akamhy/waybackpy}}</ref> The CDX API supports complex querying, filtering, and analysis of captured data.<ref>{{cite web | url=https://blog.archive.org/developers/ | title=Developers | date=August 22, 2014 |website=Internet Archive Blogs |url-status=live |archive-url=https://web.archive.org/web/20240212160820/https://blog.archive.org/developers/ |archive-date= February 12, 2024 }}</ref><ref>{{cite web | url=http://blog.archive.org/2018/12/13/documentation-for-public-apis-at-the-internet-archive/ | title=Documentation for Public APIs at the Internet Archive | date=December 13, 2018 |website=Internet Archive Blogs |first1=John |last1=Gonzalez |url-status=live |archive-url= https://web.archive.org/web/20240113211453/https://blog.archive.org/2018/12/13/documentation-for-public-apis-at-the-internet-archive/ |archive-date= January 13, 2024 }}</ref>

===Website exclusion policy===
Historically, the Wayback Machine respected the [[robots exclusion standard]] (robots.txt) in determining whether a website would be crawled and, if it had already been crawled, whether its archives would be publicly viewable. Website owners could opt out of the Wayback Machine through robots.txt. The Wayback Machine applied robots.txt rules retroactively: if a site blocked the Internet Archive, any previously archived pages from the domain were immediately rendered unavailable as well. In addition, the Internet Archive stated that "Sometimes, a website owner will contact us directly and ask us to stop crawling or archiving a site. We comply with these requests."<ref>{{cite web|url=https://web.archive.org/collections/web/faqs.html#exclusions |title=FAQs – Some sites are not available because of Robots.txt or other exclusions. What does that mean? 
|website=Internet Archive Wayback Machine |archive-url=https://web.archive.org/web/20110415130934/https://web.archive.org/collections/web/faqs.html#exclusions |archive-date=April 15, 2011}}</ref> In addition, the website says: "The Internet Archive is not interested in preserving or offering access to Web sites or other internet documents of persons who do not want their materials in the collection."<ref>{{cite web|url=https://www.archive.org/about/faqs.php#2 |title= Frequently Asked Questions |website=Internet Archive |archive-url=https://web.archive.org/web/20140417122600/https://archive.org/about/faqs.php |archive-date=April 17, 2014|url-status=dead}}</ref><ref>{{cite news |url=https://motherboard.vice.com/en_us/article/nekzzq/wayback-machine-deleting-evidence-flexispy |website=Vice |title=The Wayback Machine Is Deleting Evidence of Malware Sold to Stalkers |last=Cox |first=Joseph |date=May 22, 2018 |access-date=May 23, 2018 |archive-url=https://archive.today/20180522192132/https://motherboard.vice.com/en_us/article/nekzzq/wayback-machine-deleting-evidence-flexispy |archive-date=May 22, 2018 |url-status=live}}{{cbignore}}</ref> On April 17, 2017, reports surfaced of sites that had gone defunct and became [[parked domain]]s that were using robots.txt to exclude themselves from search engines, resulting in them being inadvertently excluded from the Wayback Machine.<ref>{{cite web |title=Robots.txt meant for search engines don't work well for web archives |url=https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |website=Internet Archive |date=April 17, 2017 |access-date=June 29, 2019}}</ref> Following this, the Internet Archive changed the policy to require an explicit exclusion request to remove sites from the Wayback Machine.<ref name="Using" /> ====The Oakland Archive Policy==== Wayback's retroactive exclusion policy is based in part upon ''Recommendations for Managing Removal Requests and Preserving Archival 
Integrity'', known as ''The Oakland Archive Policy'', published by the School of Information Management and Systems at [[University of California, Berkeley]] in 2002, which gives a website owner the right to block access to the site's archives.<ref>{{cite web |title=Recommendations for Managing Removal Requests And Preserving Archival Integrity |date=December 14, 2002 |publisher=[[University of California]] |url=http://www2.sims.berkeley.edu/research/conferences/aps/removal-policy.html |access-date=October 20, 2024 |url-status=dead |archive-url=https://web.archive.org/web/20030502165937/http://sims.berkeley.edu/research/conferences/aps/removal-policy.html |archive-date=May 2, 2003}}</ref> Wayback has complied with this policy to help avoid expensive litigation.<ref>{{cite web |title=Retroactive robots.txt removal of past crawls AKA Oakland Archive Policy |date=July 7, 2014 |publisher=Internet Archive |url=https://archive.org/post/1019415/retroactive-robotstxt-removal-of-past-crawls-aka-oakland-archive-policy |access-date=September 14, 2017 |url-status=live |archive-url=https://web.archive.org/web/20171010124036/https://archive.org/post/1019415/retroactive-robotstxt-removal-of-past-crawls-aka-oakland-archive-policy |archive-date=October 10, 2017 }}</ref> The Wayback retroactive exclusion policy began to relax in 2017, when it stopped honoring robots.txt on U.S. government and military websites for both crawling and displaying web pages. As of April 2017, Wayback is ignoring robots.txt more broadly, not just for U.S. 
government websites.<ref>{{cite web |url=http://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |title=Robots.txt meant for search engines don't work well for web archives |work=Internet Archive Blogs |first=Mark |last=Graham |date=April 17, 2017 |access-date=April 16, 2017 |url-status=live |archive-url=https://web.archive.org/web/20170417131508/http://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |archive-date=April 17, 2017}}</ref><ref>{{cite web |title=Archivierung des Internets: Internet Archive ignoriert künftig robots.txt |date=April 25, 2017 |url=https://www.heise.de/newsticker/meldung/Archivierung-des-Internets-Internet-Archive-ignoriert-kuenftig-robots-txt-3693558.html |publisher=heise online |access-date=May 14, 2017 |language=de |url-status=live |archive-url=https://web.archive.org/web/20170427035659/https://www.heise.de/newsticker/meldung/Archivierung-des-Internets-Internet-Archive-ignoriert-kuenftig-robots-txt-3693558.html |archive-date=April 27, 2017}}</ref><ref>{{cite web |title=Suchmaschinen: Internet Archive will künftig Robots.txt-Einträge ignorieren – Golem.de |url=https://www.golem.de/news/suchmaschinen-internet-archive-will-kuenftig-robots-txt-eintraege-ignorieren-1704-127446.html |access-date=May 14, 2017 |language=de |url-status=live |archive-url=https://web.archive.org/web/20170619210648/https://www.golem.de/news/suchmaschinen-internet-archive-will-kuenftig-robots-txt-eintraege-ignorieren-1704-127446.html |archive-date=June 19, 2017}}</ref><ref>{{cite news |title=Internet Archive will ignore robots.txt files to keep historical record accurate |url=https://www.digitaltrends.com/computing/internet-archive-robots-txt/ |newspaper=Digital Trends |access-date=May 14, 2017 |date=April 24, 2017 |url-status=live |archive-url=https://web.archive.org/web/20170516130029/https://www.digitaltrends.com/computing/internet-archive-robots-txt/ |archive-date=May 
16, 2017}}</ref>
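
The parked-domain problem described in this section can be illustrated with Python's standard <code>urllib.robotparser</code> module. This is a minimal sketch, not the Archive's own logic: the <code>is_blocked</code> helper is hypothetical, and the crawler name <code>ia_archiver</code> is an assumption about the user agent historically associated with Wayback Machine exclusions rather than something stated in this section.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent on url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# A parked domain that shuts out every crawler, including archives.
PARKED_DOMAIN = """\
User-agent: *
Disallow: /
"""

# A site that singles out one crawler (the name is an assumption) and
# leaves all other crawlers unrestricted.
TARGETED_OPT_OUT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.com/history.html"
    print(is_blocked(PARKED_DOMAIN, "ia_archiver", page))
    print(is_blocked(TARGETED_OPT_OUT, "ia_archiver", page))
    print(is_blocked(TARGETED_OPT_OUT, "SomeOtherBot", page))
```

Under a retroactive policy, the first case is the troublesome one: a blanket rule written with search engines in mind also hides every earlier capture of the domain, even though the new owner never asked for that.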