[Triumf-linux-managers] amanda failure
Kelvin Raywood
kray at triumf.ca
Thu Oct 28 15:51:46 PDT 2010
There has been a failure in the storage array of Amanda, the system used
to back up desktop Linux systems at TRIUMF. Backups have not run
since the weekend, and backups made before the weekend are not presently
available for restores.
We are working on recovering the array and estimate that backups from
before last weekend will be available by the middle of next week. In the
interim, the T2K group have kindly provided us with temporary access to one
of their storage servers so that we can run fresh backups starting this
evening. However, these will start from scratch with full backups of
all clients, so they may take up to a week to complete.
If you have files on a non-RAID disk that you were relying on amanda to
protect, we suggest that you consider copying them to another temporary
location to protect yourself from catastrophic failure during the period
while amanda catches up. Some possibilities might be using rsync to copy
files to a WestGrid account, or to /NOT_BACKED_UP/home/<username> on
trcomp01 or trcomp02. We request that you do not use trshare for this
purpose as there's not much space on that system.
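
As a rough illustration of the rsync suggestion above, the following
Python sketch builds an rsync command line for copying one directory to
/NOT_BACKED_UP/home/<username> on trcomp01 or trcomp02. It is only a
sketch under assumptions: the function name build_rsync_command, the
source directory, and the username "alice" are hypothetical, and we
assume that your login name is the same on both machines and that the
short hostname resolves from your desktop. It defaults to a dry run so
that you can review what would be copied first.

```python
import shlex


def build_rsync_command(source_dir, host, username, dry_run=True):
    """Return the rsync argument list for a temporary safety copy.

    A source_dir without a trailing slash makes rsync copy the directory
    itself (not only its contents) into the destination.
    """
    # Destination path as given in the announcement.
    destination = f"{username}@{host}:/NOT_BACKED_UP/home/{username}/"
    # -a preserves permissions and timestamps; -v lists each file copied.
    command = ["rsync", "-a", "-v"]
    if dry_run:
        # --dry-run reports what would be transferred without copying.
        command.append("--dry-run")
    command += [source_dir, destination]
    return command


if __name__ == "__main__":
    # Hypothetical user and directory; substitute your own.
    cmd = build_rsync_command("/home/alice/thesis", "trcomp01", "alice")
    print(shlex.join(cmd))
```

Once the printed command looks right, pass dry_run=False to get the
command that performs the actual copy, then run it from a shell.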
Our core Linux servers, on which you may have files, are backed up using
an alternate system and are not affected by the amanda failure. These
include trshare, ibm00, trcomp01, trcomp02, trmail, trserv, elog,
indico, and various web servers, including content-management systems
and the docushare server.
We will keep you informed of the status via this mailing list. The
status of backups that is usually available at
http://amanda.triumf.ca/~amanda will be unavailable or out of date until
the full cycle completes. That page is updated at the end of each
backup cycle.
Apologies to all for the delay in this announcement while we evaluated
the extent and impact of the failure and determined a plan for
recovery. We'd also like to thank Konstantin Olchanski of the DAQ
group for taking the lead in the recovery effort.
--
Kel Raywood
Core Computing and Networking