Let’s talk about the end of the world, shall we?
Or maybe not. Fortunately, some stories of impending doom are overblown, and some threats can be mitigated or avoided. Y2K was, arguably, a bit of both.
To summarize, the “Y2K problem” referred to errors related to date storage in computer systems. There was no single issue, but rather a broad array of issues of varying complexity. The common element was that many computer systems stored dates with a two-digit year and either assumed that the omitted century digits would always be “19”, or inferred them from assumptions tied to the current system date – for example, based on the current year 2024, assuming that a birth year of “01” refers to “2001”, while “75” refers to “1975”.
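To make that second, “windowing” interpretation concrete, here is a minimal C sketch; the helper name and the 2024 cutoff are illustrative assumptions, not any particular system’s logic.

```c
#include <stdio.h>

/* Expand a two-digit year by guessing the century from the current year:
 * anything at or below the current two-digit year is assumed to be 20xx,
 * anything above it is assumed to be 19xx. */
int expand_two_digit_year(int yy, int current_year) {
    int current_yy = current_year % 100;   /* e.g. 24 in 2024 */
    return (yy <= current_yy) ? 2000 + yy : 1900 + yy;
}

int main(void) {
    printf("%d\n", expand_two_digit_year(1, 2024));   /* prints 2001 */
    printf("%d\n", expand_two_digit_year(75, 2024));  /* prints 1975 */
    return 0;
}
```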
This may seem stupid and short-sighted, but the truth is more nuanced. Two of the main factors behind Y2K were the extreme expense of computer memory in the early days, and the fact that few early programmers even considered the possibility that their programs might still be running more than a few years later.
The historical cost of memory (and storage) is astonishing. The numbers have changed so dramatically that any chart of them needs a logarithmic scale, and the magnitudes sound silly at one end of the period or the other. We end up either talking about cost per terabyte, which takes us from $3.7 quadrillion / TB in 1957 (a truly silly number) to $1,088 / TB in 2023, or we talk about $3,787 / KB in 1957 (reasonable enough) to a ridiculous $0.000001 / KB in 2023. (I remember being excited at finding RAM for under $50 / MB in the early 1990s.)
Due to the high cost of memory and storage, it was always a top priority for programmers to minimize memory and storage use in any way possible. In the 1960s, the “19” could safely be assumed in the overwhelming majority of cases, and using a two-digit year seemed quite reasonable for most situations.
Wait! No! It’s not reasonable to assume a two-digit year, is it? We’ll eventually get to 2000 and beyond, won’t we?
Yes. These programmers obviously knew that this design decision was a tradeoff and that there could eventually be an issue, but that’s where we get to the other part. I think the programmers of the 1960s can be forgiven for saving a lot of money on memory and storage, and for not predicting that their programs would still be used decades later (maybe a topic for a future post). But what about later programmers?
Some continued to use the two-digit year out of inertia (or an unfounded belief that “fixing” the problem would be easy when the time came), but many did so for compatibility with older systems, or for lack of funds to overhaul those systems – at least until the millennium loomed and people began to worry. This is when most organizations started to pour resources into addressing Y2K, ultimately leading some people to assert that the whole thing was a scam and a waste of money.
If you work very hard to avoid or mitigate some issue, how do you feel when someone comes along later and says it was all a waste because nothing “serious” happened? That’s the POINT! Security practitioners in particular will feel this – when they are successful, nothing happens, but that’s when someone will start to question funding for security, say it’s not necessary, and “prove” their point by saying that nothing happened.
(I’ll pause for a moment, while readers bang their heads against the wall in frustration about their lived experience.)
So what did Y2K failures actually look like? In many cases, systems would simply crash, creating issues which could have been catastrophic, but which were also comparatively simple to find and fix. Sometimes, however, systems would report invalid results which would be passed to other systems, leading to subtle errors which could be very difficult to identify and correct, and which might not initially be recognized as a “Y2K issue”.
It’s easy enough to spot a Y2K issue when a child shows an age of 102, or a very old person shows as 2 years old. But what if you had a database which stored dates with four digits, yet was showing strange results – and then (eventually) you found a script running two or three layers down with “19” hardcoded in it?
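As a purely hypothetical illustration of those symptoms, the same stored two-digit birth year can produce both absurd ages, depending on which century assumption is buried in the code (the values below are made up):

```c
#include <stdio.h>

int main(void) {
    int stored_birth_yy = 22;   /* could mean 1922 or 2022 */
    int current_year = 2024;

    /* A script with "19" hardcoded: a toddler born in 2022 shows as 102. */
    printf("Hardcoded \"19\": age %d\n", current_year - (1900 + stored_birth_yy));

    /* A script that assumes "20": someone born in 1922 shows as 2. */
    printf("Assumed \"20\":   age %d\n", current_year - (2000 + stored_birth_yy));
    return 0;
}
```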
Arguably not Y2K, but I once built a report to get information on people who had graduated more than fifty years before, and was confused to see that it didn’t match what was visible on the data maintenance form. The form showed something like 01/01/55, but the person was not included on the report, even though others with similar dates were. After digging a bit, I discovered that there were a handful of people who had apparently graduated in 955... Oh. Must be a Y2K issue in the form, right? No – the form could not have created a record like that. The eventual conclusion was that someone had added a set of records (either manually or via script/loader) with the year 955 rather than 1955.
Many years later, around 2010, I inherited a legacy application which was retained (mostly offline) for historical audits. The original developers and operators were all gone, and the system had been running on a ring-fenced network for years. The technology team kept trying to decommission the system, but the business team was reluctant to do so, because they “might need it” at some point. Then, some piece of hardware on the server failed and, after replacing the faulty part, the application failed on reboot. The hardware team couldn’t see any issues, and the error message told us nothing (no legacy programmers, no documentation... you get the idea). After some time trying to walk through ancient code running on a long-obsolete interpreter on a legacy operating system, we remembered a comment someone made during the handoff, about not being included in the years-ago Y2K project....
Nah – couldn’t be. Could it? After arguing a bit, and trying to come up with better options, we tried resetting the server clock to 1980, and everything came back, running perfectly. So far as we could determine, there was some sort of Y2K issue buried in the (obscure) Basic interpreter on which the application was built. Sigh.
But who cares? This is all ancient history, right?
Actually, no. While the odd Y2K issue does pop up from time to time, there is another time bomb ticking away in our computer systems.
On Unix and Linux systems, dates and times are stored as the number of seconds since 00:00:00 UTC on 1 January 1970 (the Unix epoch), traditionally in a 32-bit signed integer. That gives a range from -2147483648 to 2147483647 seconds, or from 20:45:52 UTC on 13 December 1901 to 03:14:07 UTC on 19 January 2038. This means that, unless addressed, any system still tracking time in a signed 32-bit value will “flip” back to December 1901 in about 14 years (as of the writing of this post), and any date reference after 19 January 2038 will cause problems.
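Here is a minimal C sketch of that rollover, assuming a host with a 64-bit time_t and a glibc-style gmtime that can format dates before 1970:

```c
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void) {
    /* The largest value a signed 32-bit counter can hold. */
    int32_t last_second = INT32_MAX;                      /* 2147483647 */

    /* Interpreted as seconds since the Unix epoch, this is the final
     * moment a 32-bit time value can represent. */
    char buf[64];
    time_t t = (time_t)last_second;
    strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
    printf("Last 32-bit second: %s\n", buf);              /* 2038-01-19 03:14:07 UTC */

    /* One second later, a 32-bit counter wraps around to the most
     * negative value (on the usual two's-complement hardware). */
    int32_t wrapped = (int32_t)((uint32_t)last_second + 1);  /* -2147483648 */
    t = (time_t)wrapped;
    strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
    printf("After the wrap:     %s\n", buf);              /* 1901-12-13 20:45:52 UTC */
    return 0;
}
```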
The image at the top shows a countdown timer, in hexadecimal, with one hand for each set of 8 bits in the 32-bit signed integer. The world ends when the red hand points directly down.
Well, maybe not. This time, we have the lessons of Y2K, vastly more mature system processes, and the luxury of another decade or so before we would start to see issues.
Also, fortunately, a great many Unix and Linux systems have already been updated, so the main problem will be computers which aren’t connected to the internet, and cannot be updated via a software patch. These represent a risk, but people are aware of it and working on it, and the majority of these systems are likely to be replaced long before 2038. Am I a pessimist for thinking that, even with so much foresight and time, we’ll still see some problems in 2038? Or an optimist for thinking they won’t have any significant impact on the world?
The most common fix for the issue is to use a 64-bit integer rather than a 32-bit one. This should help, but it would still mean that we’d need another update in about 292 billion years. That should be ok, right? At least for a while?
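For anyone who wants to check that figure, here is a quick back-of-the-envelope calculation:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* The largest signed 64-bit value, in seconds, divided by the average
     * length of a Gregorian year (365.2425 days). */
    double max_seconds = (double)INT64_MAX;            /* ~9.22e18 seconds */
    double secs_per_year = 365.2425 * 24 * 3600;       /* ~31,556,952 seconds */
    printf("About %.0f billion years\n", max_seconds / secs_per_year / 1e9);
    /* Prints roughly 292 billion years past the 1970 epoch. */
    return 0;
}
```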
Cheers!