[Triumf-linux-managers] Outage of the amanda backup system

Konstantin Olchanski olchansk at triumf.ca
Mon Apr 23 20:45:22 PDT 2007


The amanda backup system is presently down: the main 5 TB
raid array is inaccessible and no backups are happening.

If recovery of the main raid array proves unsuccessful,
note that the last tape backup was done around 2 April 2007.

Details:

The amanda backup system had a bad week. Last Wednesday,
around April 18th, its IP address was hijacked by evil
hackers (details from Andrew?). That happened around the time
of power outages in the ISAC2 computing room, confusing everybody.

These incidents induced a flurry of system administration
activity during which "something changed" and we started
seeing errors from the sata_mv driver of the raid controllers.

These errors caused raid array failures and the raid array
is presently wedged in an inaccessible state.

The changes leading to this situation could have been
the kernel upgrade (new sata_mv driver) or a bad
interaction with the new disk health SMART monitoring scripts.

I am presently following the procedures for recovering
the raid array... wish me luck.

-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada

