[Triumf-linux-managers] Recovering a failing disk drive
Andrew Daviel
advax at triumf.ca
Wed Nov 11 18:12:56 PST 2009
FYI
Not so relevant for a rackmount server with RAID, Nagios, etc.; more for
desktops on a budget, home PCs, etc.
I recently had a newish (one-year-old) 1 TB SATA drive go bad on me at home. I
stupidly had not been checking the logs, and didn't realize what was going on
because the driver transparently retried failed operations; the machine just
got a bit slow sometimes. With IDE, as I recall, a failing disk gave I/O errors
and failed fsck.
I was able to copy /home onto a new disk fairly easily, but the system itself
was more trouble. It seemed like it ought to be possible to recover it rather
than starting over from a distro DVD. To cut a long story short, I finally
managed it. Or at least, well enough for most things.
I'm not sure how to avoid this mess in future. RAID? That doubles the cost,
and the machine can still be stolen, lost, etc. Online backups? Not bootable.
Keeping track of what got installed is possible, assuming everything comes
through a package manager/downloader such as RPM, dpkg, or yum (except that
RPM doesn't remember which repository things came from). But for things built
from source, where I lack the skill to roll an RPM from a complex makefile,
it's harder.
Computing Services is aiming for a model where the configs are kept in an
external repository and the system can be rebuilt from a kickstart install.
This works pretty well for simple systems with only a few services, all from
a single distro. I'm not sure how to do it on a system like my desktop, with
software from many sources and many years, installed variously with RPM,
"make install", and vendors' "run.sh" scripts.
The long story:
http://andrew.daviel.org/linux-sata-recovery.html
--
Andrew Daviel, TRIUMF, Canada