We go Wayback!

Mr. Peabody (https://en.wikipedia.org/wiki/Mister_Peabody) is a figure whose influence is truly incalculable. For anyone not familiar with this towering intellect, this bowtie-wearing beagle was a Nobel laureate, scientist, inventor, businessman, and Olympian. His exploits were described in “Peabody’s Improbable History”, a segment in “The Adventures of Rocky and Bullwinkle and Friends”, in which he and his boy Sherman used Peabody’s “Wayback Machine” to visit different historical events.

The Wayback Machine was created by Peabody as a birthday gift for Sherman, and most of their visits involved fixing some problem which was keeping history from going as it should. As an example, when they visited Sir Isaac Newton, he was having trouble proving his theory of gravity because the apples were full of a variety of fruit fly which made them fly instead of fall. Peabody and Sherman finally helped Newton find an apple which fell correctly, thus saving history. While later discussing how difficult it sometimes was for famous people to become famous, Mr. Peabody commented on Sir Isaac’s brother Figby, and how much difficulty he encountered before inventing his famous cookie, the Fig Newton.

#TIL that the Wayback Machine is also sometimes described as the “WABAC” machine. The precise meaning of the acronym is apparently not known, but one of the show’s directors indicated that WABAC was a reference to the UNIVAC I (https://en.wikipedia.org/wiki/UNIVAC_I).

I recently received a note commenting that TIL Technology was not currently available on the Wayback Machine. This tool (https://en.wikipedia.org/wiki/Wayback_Machine), whose name was inspired by the original, is one of the projects of the Internet Archive, whose mission (https://archive.org/about/) is to build a digital library of Internet sites and other cultural artifacts in digital form. I have previously mentioned it in passing, have used it occasionally, and have been curious about how new sites are added. It should be noted that the Wayback Machine is only one of a number of internet archives, but it’s arguably the most famous, along with being one of the oldest and largest.

One way to use the Wayback Machine is to query a site name, and view/compare snapshots over a period of time. In a prior post (https://www.til-technology.com/post/til-technology) I mentioned the infamous Time Cube site (https://en.wikipedia.org/wiki/Time_Cube). By checking the Wayback Machine, you can see the evolution of the site from 1998 through 2016, at which point it apparently went offline, to appear again in 2018 as what seems to be a betting site of some sort. Great for historical research!
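For anyone who would rather script this kind of lookup than click through the calendar view, here is a minimal sketch in Python. It assumes the Wayback Machine's public CDX search endpoint and its `url`, `output`, `from`, `to`, `fl`, and `collapse` parameters behave as I understand them. The function names are my own, and `timecube.com` in the usage note is my assumption for the Time Cube domain.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the Wayback Machine's CDX search service.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(site, start_year, end_year):
    """Build a query asking for at most one capture per year in the range."""
    params = {
        "url": site,
        "output": "json",
        "from": str(start_year),
        "to": str(end_year),
        "fl": "timestamp,original,statuscode",
        # Collapse on the first 4 digits of the timestamp (the year).
        "collapse": "timestamp:4",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Turn the JSON reply (a header row followed by data rows) into dicts."""
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(timestamp, original):
    """Build the address at which a given capture can be viewed."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


def fetch_yearly_snapshots(site, start_year, end_year):
    """Query the service over the network and return the parsed captures."""
    with urlopen(build_cdx_url(site, start_year, end_year), timeout=30) as resp:
        return parse_cdx_rows(json.load(resp))
```

Calling `fetch_yearly_snapshots("timecube.com", 1998, 2016)` needs a network connection and should return a list of captures, each of which can be passed to `snapshot_url` to get a viewable link. Only the fetch function touches the network, so the URL building and parsing can be tried offline.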

The Wayback Machine is populated in a number of ways, mainly by “crawlers” which locate and download publicly-accessible information and data on web pages and other sources. Some of these “crawls” are generated by the Internet Archive itself, while others are contributed by other organizations. In addition, Archive-It (https://archive-it.org/) has, since 2005, allowed institutions to build and preserve their collections and improve the overall consistency of data stored in the Wayback Machine.

The presence of a given site in the Wayback Machine is dependent on whether it is “caught” by one of the crawlers, or whether a given page within a site is caught. In general, a site is more likely to be included when other sites link to it. And since there are quite a few web pages out there, it’s no wonder that there are gaps in coverage, particularly for smaller or newer sites.
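Whether a given page has been caught can also be checked from a script. The sketch below assumes the Internet Archive's availability endpoint at `https://archive.org/wayback/available`, which I understand to return a JSON object with an `archived_snapshots` field holding the `closest` capture, if there is one. The function names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD-style hint."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the closest capture from a decoded reply, or None if absent."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def check_archived(page_url, timestamp=None):
    """Ask the service over the network whether page_url has been captured."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Something like `check_archived("www.til-technology.com")` should return `None` when nothing has been captured, and otherwise a small dictionary that includes the snapshot's address and timestamp.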

The estimated number of pages archived by the Wayback Machine grew from 40 billion in 2005 to 514 billion in 2020. For context, there are an estimated 1.922 billion websites online as of this writing, of which fewer than 200 million are active. Of course, each of these sites usually has many pages associated with it, such as my blog having one page per post.

It should be noted that the Wayback Machine focuses on publicly-available pages, which limits its content to the “surface web” or the open internet. I’ve commented before (https://www.internetlivestats.com/total-number-of-websites/) on how little of the internet is actually publicly available, but again, this is still quite a lot.

All of that said, I have been curious about whether TIL Technology might eventually be picked up by the Wayback Machine, and generally have a look when I have cause to visit the site. I have not taken any action to try to affect the site’s presence, so I wasn’t clear on whether something might be preventing access in some way.

Well, it appears that something changed. On 27-Dec-2021, one of my posts was apparently picked up by the Wayback Machine’s crawler, though I don’t know how. I would speculate that someone may have linked to one of my posts.

This probably shouldn’t excite me, but it does. I started this blog as an experiment, and have found it very useful in encouraging myself to dig a bit deeper into things I hear about, and continue to learn more about a variety of things. It can also be therapeutic, when I need to organize my thinking, distract myself, or vent a little.

And, hopefully, others might find some of the content interesting or thought-provoking.

I hope so.

Cheers!

“In science, 'fact' can only mean 'confirmed to such a degree that it would be perverse to withhold provisional assent.' I suppose that apples might start to rise tomorrow, but the possibility does not merit equal time in physics classrooms.”

Stephen Jay Gould
