Outage 2nd February
Around 01:30 GMT on 2nd February the slave database server, ramoth, became unresponsive, leading to a partial outage of the web site and API that was resolved around 07:00 GMT when the server was power cycled.
Planet file missing
The 2015-02-09 planet did not generate as usual. The issue was investigated and determined to be due to a race condition. The code was fixed and a new version was released. The 2015-02-16 planet appears to have generated normally.
Reduction of power usage at UCL
Due to construction work at UCL, the servers are in a temporary home with reduced access to power and cooling. To reduce the power load and increase resilience to site failure, we shut down or relocated several services:
The taginfo server (grindtooth) was shut down and its functions moved to a server in the US (stormfly-01). (ticket)
There were two Nominatim servers (pummelzacken, poldi) at UCL, which isn’t a great situation for resilience to site failures. One of them (poldi) was moved to IC so that a service can be maintained even when one site is down. (ticket)
Upgrade munin server hardware
The munin server (urmel) was pulling about 300W on old hardware. The disks were swapped into a newer chassis and the power draw dropped by approximately half. (ticket)
OOB access to wiki server
Render machine SSD upgrade
The rendering machine at IC (orm) had been getting close to full on its 512GB SSD. This was replaced with a 1TB SSD. (ticket)
New certificates were purchased and deployed, but had to be rolled back due to incompatibilities with JOSM’s use of the Oracle JVM certificate list. (ticket)