New release. More details.
The kernel logs on ramoth, the replica database server, were filling up the disk with spurious warning messages, and would have caused problems if they had been allowed to fill the disk. It was decided to reboot it at around 12:00 UTC. Although load was switched to the master during the reboot, when load was switched back to ramoth it was unable to handle the number of requests because none of the data was “warm” in memory, which meant it all had to be fetched from disk.
This caused intermittent outages and errors until almost all of the load was switched back to the master database at 13:15 UTC.
Attempts to rebalance load at around 20:00 UTC resulted in further intermittent outages until 19:40 UTC, and the load was only finally rebalanced at 01:20 UTC.
The new database server, karm, should alleviate these problems, as it will be both more stable, and quicker to “warm up” after a reboot.
Work has started on infrastructure to test the OSM Chef cookbooks. This will be a lot of work, so any help would be greatly appreciated! The tests will allow anyone using or contributing to the cookbooks to have greater confidence that they are working correctly and hopefully will result in fewer mistakes and more external contributions. More details.