[Triumf-linux-managers] Recovering a failing disk drive

Andrew Daviel advax at triumf.ca
Wed Nov 11 18:12:56 PST 2009


FYI

Not so relevant for a rackmount server with RAID, Nagios, etc. It matters
more for desktops on a budget, home PCs and the like.


I recently had a new-ish (1 year old) 1 TB SATA drive go bad on me at home. I 
stupidly had not been checking the logs, and didn't realize what was going on 
because the driver would transparently retry failed operations. The system just 
got a bit slow sometimes, unlike what I recall of IDE drives, which would give 
I/O errors and fail fsck.
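
Since the only symptom was occasional slowness, a periodic scan of the kernel 
log might have caught the problem earlier. The following is a minimal sketch, 
not what I actually ran: the function names are made up for illustration, and 
the patterns are assumptions based on typical libata and block-layer messages, 
whose exact wording varies between kernel versions.

```python
import re

# Assumed patterns for typical libata / block-layer kernel messages.
# Wording differs between kernel versions; adjust to match your own logs.
SUSPECT_PATTERNS = [
    re.compile(r"ata\d+(\.\d+)?: (exception|failed command|error)", re.IGNORECASE),
    re.compile(r"I/O error, dev \w+", re.IGNORECASE),
    re.compile(r"(hard|soft) resetting link", re.IGNORECASE),
]


def find_disk_warnings(log_lines):
    """Return the lines that match any of the suspect patterns."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS)
    ]


def scan_log_file(path):
    """Scan one log file and return its suspect lines.

    The caller supplies the path; which file holds kernel messages
    depends on the distro and syslog configuration.
    """
    with open(path, errors="replace") as handle:
        return find_disk_warnings(handle)
```

Calling scan_log_file() on the kernel log from a daily cron job and mailing 
any hits would be one way to use it. It only sees what the kernel chose to 
log, so it complements rather than replaces the drive's own SMART data.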

I was able to copy /home onto a new disk fairly easily, but the system was a 
bit more trouble. It seemed like it ought to be possible to recover it rather 
than starting off from a distro DVD again. To cut a long story short, I finally 
managed it. Or at least, good enough for most things.

I'm not sure how to avoid this mess in future. RAID? It doubles the cost, and 
the machine can still be stolen or lost. Online backups are not bootable. 
Keeping track of what got installed is possible, assuming everything uses a 
package manager or downloader such as RPM, dpkg or yum (except that RPM 
doesn't remember which repository things came from). But for things built 
from source, where I lack the skill to roll an RPM from a complex makefile, 
it's harder.
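
For the packaged part at least, tracking what got installed could be as simple 
as saving the output of "rpm -qa" now and then and comparing snapshots. The 
sketch below assumes an RPM-based system; the function names are hypothetical, 
and only the rpm options (-qa, --queryformat and its NAME, VERSION, RELEASE 
and ARCH tags) are real. It does nothing for software installed with 
"make install" or a vendor script, and it does not recover the repository a 
package came from.

```python
import subprocess

# -qa lists every installed package; --queryformat sets the output layout.
RPM_QUERY = [
    "rpm", "-qa", "--queryformat",
    "%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n",
]


def parse_package_list(text):
    """Parse 'name version-release arch' lines into a set of 3-tuples.

    A set of full tuples (rather than a dict keyed on name) keeps
    packages that are installed in several versions, such as kernels.
    """
    packages = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            packages.add(tuple(fields))
    return packages


def snapshot_installed():
    """Query the local RPM database (requires rpm on the PATH)."""
    result = subprocess.run(
        RPM_QUERY, capture_output=True, text=True, check=True
    )
    return parse_package_list(result.stdout)


def diff_snapshots(old, new):
    """Return (added, removed) packages between two snapshots, sorted."""
    return sorted(new - old), sorted(old - new)
```

Keeping the saved snapshots on a different disk, or in the same external 
repository as the configs, would mean the list survives the drive it 
describes.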

Computing Services is aiming for a model where the configs are kept in an 
external repository and the system can be recovered from kickstart. This 
works pretty well for simple systems with only a few services, all from
a single distro. I'm not sure how to do it on a system like my desktop, 
with software from many sources, accumulated over the years and installed 
variously with RPM, "make install", and vendors' "run.sh".


The long story:
http://andrew.daviel.org/linux-sata-recovery.html

-- 
Andrew Daviel, TRIUMF, Canada

