My Musings on the 2038 Problem

I have long been vaguely aware of the 2038 problem. It is similar in nature to the Y2K problem – which, despite my being at the ripe old age of 8 when it “happened”, I don’t really recall ever actually happening – but the key difference is that in 2038 there is the potential for some critical systems to face issues, in particular 32-bit ones that store dates, i.e. every single 32-bit system.

So what exactly is the problem and what exactly is expected to happen?

In an n-bit system, dates are typically stored as a signed n-bit number counting seconds from the Unix Epoch, 1970-01-01 00:00:00 UTC, meaning they can go up to 2^(n-1) − 1 seconds after that, which in 32-bit systems means 2038-01-19 03:14:07 UTC. We’re already long past the relevant points for 8-bit and 16-bit systems. In fact, I’m not even sure how 8-bit systems handled it, given that 2^7 − 1 = 127 seconds is a whole 2 minutes and 7 seconds. The same goes for 16-bit systems, which would at least get you past 9 AM on 1970-01-01, but not by much – 9:06:07 AM to be precise – so at least you’d get 6 working minutes out of it.
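You can verify that 32-bit rollover date for yourself in a couple of lines of Python:

from datetime import datetime, timezone

print(datetime.fromtimestamp(2**31 - 1, tz=timezone.utc))
# 2038-01-19 03:14:07+00:00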

So given that times stored in 8- and 16-bit systems wouldn’t even get you a full day, getting 68 years out of a 32-bit system isn’t bad at all, right? Well, no, it isn’t bad – but it’s still not enough.

If I don’t do anything and still store my timestamps as signed 32-bit numbers, what will happen?

Here is where the distinction between signed and unsigned comes in. Let’s use a 4-bit example just to keep things easy.

An unsigned 4-bit number can go all the way from 0000 = 0 to 1111 = 15 (= 2^4 − 1), whereas a signed 4-bit number can go from 1000 = −8 to 0111 = 7 (= 2^3 − 1). This is because the first bit is used as the sign: 1 is negative and 0 is positive (or zero).
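A quick Python loop makes the two readings of the same 4-bit patterns easy to compare:

# The same 4-bit patterns read as unsigned vs signed (two's complement).
for bits in range(16):
    unsigned = bits
    signed = bits - 16 if bits >= 8 else bits
    print(f"{bits:04b}  unsigned = {unsigned:3d}  signed = {signed:3d}")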

There is still a need, of course, to store dates before the Unix Epoch, and this is typically handled by storing them as negative numbers, e.g. 1969-12-31 23:59:00 would be stored as −60. Bit-wise (in two’s complement), this is done by setting the first bit to 1 and using the remaining (n−1) bits to count up from “zero” as normal, up until 1 second before the epoch.
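Python will happily show you the resulting bit pattern – here’s −60 as a signed 32-bit number:

# -60 masked to 32 bits: the sign bit is set, and the remaining bits sit
# just below all-ones (all-ones being -1, one second before the epoch).
print(format(-60 & 0xFFFFFFFF, "032b"))
# 11111111111111111111111111000100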

This is why dates are stored as signed rather than unsigned numbers: dates existed before 1970. If we were to switch to unsigned integers then we would get an extra 68 years of breathing space, taking us to 2106-02-07 06:28:15 UTC. There would be two main problems with this, however:

  1. The first one is hopefully obvious: we would completely lose the ability to work with any dates and times before 1970.
  2. Even that date falls within the possible lifespan of people alive today. Someone born on the day I am writing this (21st October 2022) will be 83 when that date comes around, and I wouldn’t consider 83 an unreasonably long lifespan.

Now as to what will actually happen: time will keep moving, and the stored binary number 01111111111111111111111111111111 will turn into 10000000000000000000000000000000, which will now be interpreted as −2^31, i.e. 2^31 seconds before the Unix Epoch. This will take us all back to 1901-12-13 20:45:52 UTC. Naturally, this will cause chaos, especially if you believe time travel is possible and we’ve cracked it by then! I may go and buy a DeLorean just in case.
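You can simulate the wrap in Python too (negative timestamps can be touchy on some platforms, but this works on macOS and Linux):

from datetime import datetime, timezone

t = 2**31            # one second past the signed 32-bit maximum
wrapped = t - 2**32  # that same bit pattern read as a signed 32-bit int
print(wrapped)       # -2147483648
print(datetime.fromtimestamp(wrapped, tz=timezone.utc))
# 1901-12-13 20:45:52+00:00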

So, what is the solution?

Enter 64-bit

Most computers and processors you can buy today run on 64 bits, so it is incredibly unlikely that this whole problem will still affect end-user devices by 2038 – and who is to say that by then it won’t have become 128 bits, or even 256? The trouble comes with architectures still running on 32 bits.

How long can you get out of a 64-bit system?

When doubling the number of bits, you essentially square the length of time (in seconds) that can be handled in an unsigned integer of n bits, since (2^n)^2 = 2^2n. The maths of squaring doesn’t quite apply to signed integers – 2^63 is double (2^31)^2, not equal to it. So how many years could we get out of 64 bits? A century? A millennium? A decamillennium (that’s 10 millennia, or 10,000 years)? Nope: you’d get 584.9 billion years. There are many comparisons you could make, but the main one in my mind is that this is roughly 42 times the age of the universe. This of course becomes 292.5 billion years either side of the Unix Epoch when you consider signed rather than unsigned numbers.
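The arithmetic is quick to check in Python (using 365-day years, to match the figure above):

seconds = 2**64
years = seconds / (365 * 24 * 60 * 60)
print(f"{years / 1e9:.1f} billion years")                # 584.9
print(f"{years / 13.8e9:.0f}x the age of the universe")  # 42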

Call it morbid, but the human race will be long gone by the time the year 292,500,001,970 comes around.

That said, with the advent of 64-bit numbers, could it be time to reconsider what we use as the Epoch? Is there really a need to arbitrarily start dates from 1st January 1970 any more? I’d like to propose 2 new possibilities for the Epoch:

Point in time: 0000-00-00 00:00:00 UTC
  Pros: We can calculate exactly when it was.
  Cons: Things still happened before it, meaning numbers would still have to be signed. It is also based on a religion which not everyone subscribes to, even if most countries recognise it as year 0 and this current year as the 2022nd one after it.

Point in time: The Big Bang
  Pros: Nothing happened before it (that we know of at least – I happen to believe otherwise), so there would be no need to use a signed integer.
  Cons: We cannot calculate exactly when it was; even the official age of the universe is quoted with a margin of ±0.02 billion years – that’s 20 million years!

(Possible new Epochs, with pros and cons for each.)

Installing PHP on macOS Monterey

Here is the solution to a problem faced by every PHP developer who has upgraded their Mac to Monterey, only to re-attempt to run their PHP-based application afterwards and get back the dreaded response:

php: command not found

😨

That’s not good! Did Apple get rid of PHP in their latest operating system? Surely not!

Me – an hour after upgrading my MacBook

Oh, I see, that’s exactly what they did.

Me – 5 minutes later after Googling it

Right, so, how do we fix it?

First things first, make sure you have Homebrew installed. Chances are you already do – it fills the same role as apt does on most Linux distros – but it’s best just to check.

brew -v

If you get back something that looks like the below, then you have it installed, otherwise you’ll need to go and install it.

Homebrew 3.5.9
Homebrew/homebrew-core (git revision 3eda28188e5; last commit 2022-08-22)
Homebrew/homebrew-cask (git revision eacfe8f6c1; last commit 2022-08-22)

Right, so now, just brew install php, right? Unfortunately, this won’t work in quite the way you’re expecting, so it’s best first to add a tap that makes far more versions of PHP available than were ever available on older versions of macOS anyway:

brew tap shivammathur/php

Now you can install it; the only caveat is that you need to specify the version explicitly (unless it’s 8.0).

brew install shivammathur/php/php@8.1

Other options available are:

  • 5.6
  • 7.0
  • 7.1
  • 7.2
  • 7.3
  • 7.4
  • 8.0 – as above, you can remove the @ part for this version

Now you need to link it to the php command:

brew link --overwrite --force php@8.1

Where possible of course, I’d usually recommend using the latest available version.

All these changes mean we need to restart our terminal before PHP will work, so once you’ve done that, run php -v and you should get back something like the below:

PHP 8.1.9 (cli) (built: Aug  4 2022 15:11:08) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.9, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.9, Copyright (c), by Zend Technologies

What if I want to change my PHP version?

You can change the PHP version that is installed and linked, but you will need to restart your terminal again after doing so. Let’s say I want to switch back to 7.4 for some reason:

brew install shivammathur/php/php@7.4

(You will only need to do the install step once.)

brew unlink php && brew link --overwrite --force php@7.4

Of course, you need to unlink the command from its current destination before you can link it somewhere else!

But, wait, couldn’t I just use Docker?


In most cases, indeed you could! I’d actually recommend this method wherever possible, as you can spin up an image in moments with all the necessary dependencies, configuration, and of course the right PHP version for your project. However, my situation is rather unusual (although maybe not as unusual as I think): when accessing private GitLab repositories for my company – I even have to be connected to the company’s VPN just to reach the login page – the SSH keys on my machine can’t be used by a Docker container, even one running on that same machine.

Docker also gives you the advantage of offering more versions. A lot more versions. A huge number of versions! A cursory glance at the tags available on Docker Hub should give you an idea.

Conclusion

Is there a roadblock preventing you from using Docker, or a very good reason why you shouldn’t? Perhaps you haven’t learned Docker yet – luckily the learning curve is gentle for beginners, so I may well do an article on that as well (maybe even a video, I haven’t decided yet). If there is such a reason or roadblock, then go through the process I’ve described above. Otherwise, Dock away!

Oh, and, even if you do go down the Docker route, I’d still suggest at least completing the Homebrew step at the start of this post. A Mac without Homebrew is a bit like an internal combustion car without a gearbox: it will still work, but not nearly as well as it should!

Re-approaching the Project Euler Problems: Dealing with large files

Happy new year everyone! Welcome to 2022 and let’s hope it’s at least a bit better than the last couple of years have turned out to be.

Over the break I was tinkering with my Project Euler repo and ran into a problem that part of me always suspected I might hit at some point: one of my files (either a results CSV or an expected-answers JSON) getting too big and GitHub saying “no, you can’t host that here”. I always saw this as an “eventually” issue, though, rather than a “during Christmas 2021” issue.

Whilst starting work on problem 2, I noticed that the numbers involved would be considerably larger, especially as the problem itself expects a default input of 4 million rather than the 10,000 in problem 1. So I got to work as I had done with problem 1: manually calculating the results for inputs of up to 40, checking my Python script against those, and then trusting it to generate answers all the way up to 4 million. All good, although I must confess it took a while!

Then it was time to do some other bits and pieces before retiring for the evening – and now, to git push:

remote: Resolving deltas: 100% (24/24), completed with 7 local objects.
remote: error: Trace: aa212c3521a5fdbf4c114882235a794bf0c397722cee81565295fe45a1c5e3d3
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File problem_2/problem_2_expected_answers.json is 222.32 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
To https://github.com/gavinsykes/project-euler.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/gavinsykes/project-euler.git'

Yikes.

There is quite an alarming amount of red there, by which I mean there is any red at all. And that isn’t me just highlighting bits red for emphasis, that is git itself printing red characters to the terminal.

Luckily, having taken a look at the aforementioned Git LFS, it seems really quite simple to use: just tell it which files you expect to be larger than 100 MB and it will sort them all out for you.

brew install git-lfs
git lfs install
git lfs track "*.csv"
git lfs track "*_expected_answers.json"

This should create a .gitattributes file with the following content:

*.csv filter=lfs diff=lfs merge=lfs -text
*_expected_answers.json filter=lfs diff=lfs merge=lfs -text

But there was still a problem: I had committed the large file (which I suspect was the expected_answers.json file for problem 2) somewhere within the last 13 commits, before I had installed LFS. So even though installing LFS picked up the files I had asked it to track, letting me recommit them, the push still included an earlier commit containing the large file untracked by LFS, and GitHub still didn’t want to know.

So how do I manage this? I believe I have found the solution.

Run git status and it should tell you that Your branch is ahead of 'origin/master' by 13 commits. (Your number of commits may vary.)

Delete the suspected offending file(s) on your local machine and commit the deletion.

Reset back the relevant number of commits – this should now be 14 (in my case it was 15, because I decided to tweak some other scripts in the middle of doing this; don’t do that. Why would you make it more complicated than it needs to be, unless you’re an idiot like me?):

git reset --soft HEAD~15

If you’re in VSCode, you should see all the changes you made within the last x commits reappear in your staged changes. We can now “squash” them into a single commit, and this one commit should push to remote sin problema.

Now for the moment of truth. LFS is all set up and appears to have been working on the current (not too big, yet) JSON and CSV files, so let’s try it on the problem 2 expected-answers JSON!

Uploading LFS objects: 100% (1/1), 190 MB | 1.3 MB/s, done.
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 12 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 463 bytes | 463.00 KiB/s, done.
Total 4 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/gavinsykes/project-euler.git
0e78144..9ac5c2d master -> master

So, other than the remarkably low upload speed of 1.3MB/s (my router isn’t the greatest and I’m not exactly close to it), I think we can call that a success! 😁😁

Re-approaching the Project Euler Problems: Uh-oh

This all happened one quiet Friday evening.

I was at my laptop, happily working away on my new system for tackling the Project Euler problems, and had just got both PHP and JavaScript playing nicely with the Python script (one of many) that I had created, which records data such as the language, its version, and the machine specs (OS, memory, that sort of thing). Success! Naturally I had written scripts in each of the 3 languages to tackle – where else to start? – Problem 1.

Problem 1 asks you to find the sum of all the multiples of 3 or 5 below 1000. Of course, for this project I am taking it a bit further and finding the sum of all said multiples below x, defaulting to 1000 if no value of x is provided.
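For reference, here’s a minimal sketch of the Python version (not the exact script from my repo, but the same idea):

import sys

def sum_of_multiples(x: int = 1000) -> int:
    """Sum of all multiples of 3 or 5 below x."""
    return sum(n for n in range(x) if n % 3 == 0 or n % 5 == 0)

if __name__ == "__main__":
    x = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    print(sum_of_multiples(x))  # 233168 for the default of 1000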

When I hit enter on the JavaScript file and it ran successfully for the first time, my heart leapt with joy as it reported no errors. Then, just as quickly, that joy turned to suspicion: “that answer looks different to the one Python has been giving me – oh, please don’t be”.

Now it doesn’t take a particularly keen eye to spot that something has gone very wrong with at least 2 of these: they’re not even just a little bit out from each other – the JavaScript one is over 100,000 more than the next highest, PHP’s! The question now is, which one is right? Or even, are they all wrong?

This could well mean that all the data I have been collecting thus far is useless. I mean, what good is knowing how long it took a particular script to give you the wrong answer?

And here I was thinking I’d get away without automated testing! *Sigh* yet another bit to add! One of these days I might actually start solving the problems themselves!

The much older and wiser Gavin from 20 minutes in the future: Get away without automated testing? What on earth were you thinking, Gavin? You go ahead and do what you want but I’m going to bed 💤.

Re-approaching the Project Euler Problems – an Introduction

Avid readers of this blog (of which I’m sure there are many – I mean, there are 14 whole posts on here at the time of writing!) will remember that back in 2019 I started solving the Project Euler problems. I set myself the task of solving 1 problem a week, writing a bit about it, and posting it every Sunday at 3PM without fail.

Needless to say, I failed. The few posts that did manage to see the light of day are now available under the project euler 2019 tag in all their glory. I started doing the first few in PHP, then switched to Python, then wondered why I shouldn’t do them all in both of those languages, then thought about other languages such as JavaScript, C, Rust and C++, then tried to build all sorts of tools to help me manage all of this. Plus, how would I store the code to display it? It took me quite a while to settle on Gists, which with hindsight really should have been my first port of call.

Can I really be sure I’ll be at my desk at 3PM on a Sunday every week for 749 weeks? No, of course I can’t; I’d better automate that part. 749 weeks is just under 14 and a half years, meaning that if I had kept up with it from the latter half of 2019, it would run to about the end of 2033. My 28-year-old self would then be 42, and my newborn infant would be about to start his GCSEs. You see where this is going: it’s a commitment. One way to combat this is to construct some tools around it to help automate it – as well as adding a few supplementary features – such as:

  • First of all, I want to make sure my code works, so: automatic compilers for the languages that need compilation. These will also need to be tailored to the platform I happen to be running them on, especially given that back in 2019 I only had a clunky old laptop running Ubuntu. I still have that same laptop, but now also a much newer, smaller Windows laptop, and a MacBook from my new job. On the plus side, this does give me all 3 of Linux, macOS and Windows!
    • I wonder if there’s a way to get these running on Android/iOS as well! Maybe save that for another day.
  • A tool to turn Markdown posts into nicely-formatted blog posts suitable for WordPress.
  • A way to automate their publishing for 3PM each Sunday – this is actually the easy part, just write them ahead of time and set their publish dates accordingly. Then again, who’s to say I won’t decide to alter this?
  • A tool to upload my code files into Gists (as well as maintaining the whole thing as a Git repo anyway).
    • This one is quite troublesome because at the time of writing my way of making sure a Gist for a given problem and language doesn’t already exist is a bit hit-and-miss.
  • A WordPress block to pull down all the Gists relating to the problem being viewed.
    • This has the same issue, however, as my Gist-uploading tool above. When I was going through the posts to add the project euler 2019 tag, problem 1 believed I’d only written it in C++ and Java!
  • A way to manage and maintain all of this, including making updates where necessary.
  • Additional feature: it would be nice to include a D3-based scatter plot of how long each language takes to solve each problem given different input values. This in itself is a huge data collection project, meaning I need to store the language, values, results, time taken and machine specs in a CSV for each problem (see the sketch after the screenshot below). The latest version of the design for this can be seen on VizHub, and treat yourself to a screenshot below! This may give some insights beyond just “which languages are faster?”, such as “does a particular language work better on a certain machine?”, “how do various languages cope with higher inputs?” and so on.
Example screenshot of what each scatter plot will look like, subject to change as it is still in development. Oh, and that JavaScript outlier isn’t real data, in case you were wondering!
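While I’m at it, here’s roughly the shape of the data-collection side – a minimal Python sketch with illustrative column names and function name, not the exact script from my repo:

import csv
import platform
from pathlib import Path

def record_timing(problem: int, language: str, input_value: int,
                  result: int, seconds: float) -> None:
    """Append one benchmark row to the problem's timings CSV."""
    path = Path(f"problem_{problem}") / f"problem_{problem}_timings.csv"
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:  # write the header row on first use
            writer.writerow(["language", "input", "result", "seconds", "os", "machine"])
        writer.writerow([language, input_value, result,
                         f"{seconds:.6f}", platform.system(), platform.machine()])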

So where to even begin with all this? Well, luckily, I’m writing this partway through getting my system built up, so I already have something of an idea of what to do. One thing I have noticed copied across multiple files (most of this automation is being done in Python) is a duplication of the dictionary containing the language names and file extensions. It would be good to get a comprehensive list of the ones I’ll be working on into a separate file, wouldn’t it? Luckily I have found this Gist from Peter Pisarczyk (in turn forked from Aymen Mouelhi) which contains what appears to be just about every programming language ever. I haven’t gone through the whole list, but I reckon it’s a fairly safe bet that I haven’t even heard of more than half of them!

This then results in my working_language_extensions.json file (later renamed to languages.json) – I never knew there were this many file extensions in the known universe! There’s a sketch of the filtering script after the folder tree below.

At each step I’ll be looking to see what our basic folder structure looks like, so at the moment:

project-euler
├get_working_language_extensions.py
└working_language_extensions.json
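For a flavour of that filtering script, here’s a minimal sketch – it assumes the Gist’s JSON maps language names to objects with an "extensions" list (those key names are illustrative, not necessarily the real ones):

import json

# The languages I actually plan to work in.
WORKING_LANGUAGES = {"C", "C++", "Java", "JavaScript", "PHP", "Python", "Rust", "TypeScript"}

with open("languages.json") as f:  # the file from the Gist
    all_languages = json.load(f)

# Keep only the languages I care about, mapping name -> extensions.
working = {
    name: data["extensions"]
    for name, data in all_languages.items()
    if name in WORKING_LANGUAGES and "extensions" in data
}

with open("working_language_extensions.json", "w") as f:
    json.dump(working, f, indent=2)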

Now as an example, let’s look at what I want each problem folder to look like:

project-euler
├problem_x
│├problem_x_timings.csv // To store the info of how long each problem took to solve, also store the input value, language, and machine specs
│├problem_x.c
│├problem_x.cpp
│├problem_x.java
│├problem_x.js
│├problem_x.md // To be converted into the relevant blog post
│├problem_x.php
│├problem_x.py
│├problem_x.rs
│└problem_x.ts
├get_working_language_extensions.py
└working_language_extensions.json

And for completeness – and so that my next post in this series can still come chronologically after this one (albeit not by much) without leaving this one looking a bit short – here is the full file list (for now at least!), along with a brief explanation of each. There will of course be full posts explaining these in due course!

project-euler
├problem_x
│├problem_x_timings.csv // To store the info of how long each problem took to solve, also store the input value, language, and machine specs
│├problem_x.c
│├problem_x.cpp
│├problem_x.java
│├problem_x.js
│├problem_x.md // To be converted into the relevant blog post
│├problem_x.php
│├problem_x.py
│├problem_x.rs
│└problem_x.ts
├.gitignore // (obviously!)
├c_functions.c // Functions to be imported into C scripts, not sure yet but this may need to be .h, once I dive properly into C I'll find out!
├cpp_functions.cpp // Functions to be imported into C++ scripts
├env_info.json // gitignored as it's machine specific
├file_operations.py // All file-related operations
├file_templates.py // Templates for files in each language, hopefully to prepopulate imports, boilerplate, etc. in each folder
├get_env.py // Run this to create env_info.json above
├github_config.py // Generated tokens for working with my Gists
├github_credentials.json // gitignored as these are the API keys for github_config to use! 
├java_functions.java
├js_functions.js
├languages.json // Renamed from working_language_extensions.json in the code block above
├languages.py // Renamed from get_working_language_extensions.py in the code block above
├mkdirs.py // Create all the directories for each problem, may merge this into file_operations.py
├php_functions.php
├problem_descriptions.json // I created this to allow the chart I'm designing to pull down a subtitle, may change how I do this
├publish_to_wordpress.py
├publish.py // A lot of duplicate code between these two, but the idea is that this automates creating the blog posts
├pyfuncs.py
├README.md
├rust_functions.rs
├ts_functions.ts
├updategists.py // Functions for working with Gists
├wp_config.py // Much like github_config.py but for WordPress
└wp_credentials.json // Again, like github_credentials, gitignored as these are keys