February 2015

Outage 2nd February

Around 01:30 GMT on 2nd February the slave database server (ramoth) became unresponsive, causing a partial outage of the web site and API. Service was restored around 07:00 GMT when the server was power cycled.

Planet file missing

The 2015-02-09 planet file was not generated as usual. Investigation traced the failure to a race condition; the code was fixed and a new version released. The 2015-02-16 planet appears to have generated normally.

Reduction of power usage at UCL

Due to construction work at UCL, the servers are in a temporary home with reduced access to power and cooling. To reduce the power load and increase resilience to site failure, we moved some services to other sites and onto more power-efficient hardware.

Taginfo server

The taginfo server (grindtooth) was shut down and its functions moved to a server in the US (stormfly-01). (ticket)

Nominatim server

Both Nominatim servers (pummelzacken, poldi) were at UCL, which left the service vulnerable to a failure of that site. One of them (poldi) was moved to IC so that service can be maintained even when one site is down. (ticket)

Upgrade munin server hardware

The munin server (urmel) was pulling about 300W on old hardware. The disks were swapped into a newer chassis and the power draw dropped by approximately half. (ticket)

OOB access to wiki server

The wiki server (ouroboros) was upgraded in October 2014, which changed the OOB port’s MAC address and left the OOB interface inaccessible. The MAC address on record has now been updated. (ticket)

Render machine SSD upgrade

The 512GB SSD in the rendering machine at IC (orm) was getting close to full. It was replaced with a 1TB SSD. (ticket)

SSL certificates

New certificates were purchased and deployed, but had to be rolled back due to incompatibilities with JOSM’s use of the Oracle JVM certificate list. (ticket)