A bit over a month ago, the Slicehost VPS on which I ran this site (along with all my other services) decided to stop responding to requests. Pings were still going through, but connections would just time out. Going into the web management interface, I discovered that something was causing the CPU to sit at 100% – not a particularly useful place for it to be. I tried using the AJAX web console to log in, but that was also unresponsive, so one quick hard restart later the server was back up and running.
A few days later, it happened again. By now I was starting to suspect that it was something in the SNAILBot scripts that was hanging (as the number of logged lines has increased, the page generation scripts have slowed to a crawl, with only the cache making things still viewable). So I hard-restarted the server again, and… nothing. Now nothing would even connect, which was rather a backwards step. Luckily the AJAX web console was working, so I went in and poked around, to discover that somehow, all the network configurations and scripts had disappeared. No /etc/resolv.conf and so on. In addition, nearly all of the services were dead, though the startup scripts for those were still present.
So, I decided to take this unexpected downtime as an opportunity to migrate over to the new Linode server I now have, which I got for two reasons – I could get a more powerful VPS for less cost, and Rackspace is in the process of absorbing Slicehost into its cloud services, which work out much more expensive for me. The migration was slowed a bit, however, by the fact that I became extremely busy for just over two weeks following the initial failure, mainly with packing up after my three-month stay in Belgium.
And that, dear reader, is my excuse for my website, and all of my services (SNAILBot, my personal Git repositories etc.) being offline for the best part of a month. I still have some final things to set up and packages to install, but during the migration I moved all the configuration files into a Git repository to prevent any sort of loss happening again. I should also probably use Puppet or something for the actual setup of the server, but I decided to go with good old Bash scripts instead (mainly for the purpose of recording exactly what I go through to set it up). And hopefully this is it for the downtime, though properly mitigating it likely means me finding time to fix the SNAILBot page generation scripts…