September 2015

Disk error reporting

Some alerts were still firing after the earlier disk swap-out on the main DB server. Diagnosing this kind of problem is complicated by the poor support for various suppliers’ cards in disk monitoring software. In this case, the fix turned out to be unwedging a stuck chef-client.

SMART monitoring on DB disks

The monitoring software on the DB machine reports an error when it can’t read the SMART values for spun-down hot spares. In this case, the cause was two-fold: one disk had failed and the other was simply spun down. The controller has been reconfigured not to spin down spares, which should fix the SMART issue. More on the failed disk next month…

Failed disk on Fume

A failed disk on [Fume] was causing a lot of alerts. The RAID array was rebuilt without it, which has stopped the error reports.

New aerial imagery machines

OWG ordered two new aerial imagery machines with the following spec:

This is a slight departure from the budget, which only called for one new machine. However, getting two machines allows us to have some degree of redundancy. The old HP kit is pretty reliable and has good out-of-band management, so we’re not anticipating any issues due to the age of the hardware.

Fix cleanup of empty tile directories

The tile cleanup script was fixed and now hits its cleanup target most of the time.
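For readers unfamiliar with this kind of cleanup, here is a minimal sketch of removing empty directories under a tile cache. It is not the actual cleanup script: the function name `remove_empty_dirs` and the idea of passing a single cache root are assumptions made purely for illustration.

```python
import os


def remove_empty_dirs(root):
    """Remove empty directories beneath root, deepest first.

    Hypothetical helper for illustration; returns the number of
    directories removed. The root itself is never removed.
    """
    removed = 0
    # topdown=False visits children before their parents, so a directory
    # that only becomes empty once its subdirectories are gone is caught too.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # keep the cache root in place
        try:
            os.rmdir(dirpath)  # only succeeds if the directory is empty
            removed += 1
        except OSError:
            pass  # still holds tiles (or is not removable): leave it alone
    return removed
```

Relying on `os.rmdir` failing for non-empty directories avoids a separate emptiness check, which could race with tiles being written while the cleanup runs.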