Preservation/Harddisks
Preserving harddisk contents
For two decades we have been preserving data media, including harddisks of all sorts, at Datamuseum.dk. This page is an attempt to share what we have learned with the rest of the world, and we welcome contributions.
Connecting to old harddisks
Diablo Model 30
We have read about two dozen of these, but we cheated: We restored a computer like the one the disks were written on, wrote a program on it, which read the entire disk sector by sector, and transferred the data over a serial port.
For any harddisk older than about 1980, this is by far the easiest way. Back then harddisks could not be moved between computers of different kinds: every company made their own disk-controller and their own mistakes, and they often deliberately made sure not to be compatible with anybody else, so they could sell expensive "pre-formatted" harddisks.
The alternative to reading the disks on their "native" system is "flux reading", which brings us to:
SMD disks &c
Modern chips are so fast that it is almost trivial to record the raw signals of old harddrives, which moves the problem from restoring an old computer to writing software which can decode what its disk-controller did.
Two of our volunteers went through that process with "SMD" disks, and they have documented both the process and the "gadget" they built to interface to the disk-drive:
The major challenges were:
A) Get the drive to work - same thing if you restore the original computer to working condition.
B) Build circuitry to interface to the disk-drive. Almost always TTL-Open-Collector, so modern chips need level-conversion and buffers.
C) Record the raw "flux" data.
D) Figure out how to decode the data from the "flux" files.
None of that is particularly easy - but it makes for a great hobby.
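To give a feel for the first part of step D, here is a minimal Python sketch which turns the measured times between flux transitions into a string of bit cells. It is not the code our volunteers wrote: the function name, the nanosecond units and the sample numbers are made up for illustration, and it assumes a fixed, known bit-cell time, where a real decoder has to track the drive's speed variations.

```python
def intervals_to_cells(intervals_ns, cell_ns):
    """Turn times between flux transitions into a string of bit cells.

    Each transition becomes a '1', preceded by one '0' for every
    empty cell which passed since the previous transition.
    """
    cells = []
    for dt in intervals_ns:
        # Number of whole cells this interval spans, never less than one.
        n = max(1, round(dt / cell_ns))
        cells.append("0" * (n - 1) + "1")
    return "".join(cells)


# Hypothetical capture: nominal 100 ns cells with a little jitter.
sample = intervals_to_cells([205, 310, 198, 395], 100)
```

The resulting cell string is what an FM, MFM or RLL decoder would then have to make sense of.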
ST-506 "MFM" hard disks
These are the "cheap" harddisks from early home computers and PC compatible computers.
The on-disk format depends on the disk-controller, and the formats have names such as "FM", "MFM", "RLL 2,7" and "Who knows what those people came up with?"
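As an illustration of what such a format means at the bit level, here is a minimal Python sketch of plain MFM: every data bit becomes a clock cell followed by a data cell, and the clock cell is 1 only between two 0 data bits. The function names are ours, invented for this example. The sketch assumes the cell stream is already aligned on a clock/data pair, and it ignores sync marks, sector headers and checksums, which is exactly where the controllers differ.

```python
def mfm_encode(bits, prev=0):
    """Encode data bits as MFM cells (clock cell, then data cell)."""
    cells = []
    for b in bits:
        # A clock pulse is only inserted between two consecutive 0 bits.
        clock = 1 if (prev == 0 and b == 0) else 0
        cells += [clock, b]
        prev = b
    return cells


def mfm_decode(cells):
    """Recover the data bits: with correct alignment, every second cell."""
    return cells[1::2]


data = [1, 0, 0, 1, 1, 0]
cells = mfm_encode(data)
decoded = mfm_decode(cells)
```

Note that the encoded stream never has two 1 cells next to each other, which is what lets the drive get away with fewer flux transitions than FM needs for the same data.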
In a few cases we have read the disk in the computer it came in, but we almost always use David Gesswein's MFM Hard Disk Reader/Emulator.
David's software has figured out practically all MFM-style disks we have tried, and it can record flux-files for anything else. We have managed to decode RLL 2,7 formatted disks from the flux files. (Our Python code for that is in contrib/ in David's software)
The most common problem is that the drive does not start spinning or the heads do not move, in both cases because the lubricant in the ball-bearings has become stuck.
If the drive does not spin up, we lift it a few inches above the table, power it up, and give it a good "twist" around the axis of the platters. That is almost always enough to get it spinning.
If the heads do not move, it is almost always necessary to void the warranty, unscrew the lid and poke around inside the drive while it tries to power up.
Don't try to hold your breath, but do not sneeze.
Try to spot if there is a brake mechanism, typically an electromagnet/solenoid on the PCB which pulls a small arm, and see if it moves. If it does not, get that to move first.
If the brake does what it should, gently push on the arm as it tries to move, but be very clean and very cautious about it: if it suddenly overcomes the stiction, it can whack your fingers real good.
Note that high-performance drives, like the CDC drives which became Imprimis, Micropolis and eventually Seagate, have very beefy head-mechanisms, which
A) can probably cut a finger in half
B) are not mechanically stable if the lid is not screwed on with all screws tight.
If you try to power these drives up without the lid, you are almost guaranteed to get a head-crash and not get any data from the drive.
ESDI harddisks
"Beware the ESDI of March" -- J.Cæsar (Quoted by M. Twain)
ESDI is a weird attempt to downscale the SMD interface to 5¼" disk-drives, and it was fortunately short-lived and only used by "serious" computer manufacturers for UNIX-level hardware.
So far we have only read a few of these disks, by moving them and their controller to an i486 PC running FreeBSD 3.5.1.
(Note to self: Extract more info from internal wiki)
IPI harddisks
Also a "we can do better than SMD" interface, but this time for companies who wanted to compete with IBM on buzzwords.
We have not read any of these yet. The plan, such as it is, is to try to do it on restored original computers.
SASI harddisks
SASI turned into SCSI; there are similarities, but not enough. Fortunately SASI was almost only used in a configuration with an ST-506 disk connected to an Adaptec disk-controller, which then talked SASI to the computer. Reading the ST-506 disk as above has worked for us.
SCSI harddisks
Pretty straightforward. We use a PC with FreeBSD, which has really good and robust support for Adaptec SCSI controllers.
IDE and ATA harddisks
Almost no PC disk-controllers are "hot-plug" capable, and because sick hard disks essentially hot-plug themselves, the computer almost always goes haywire along with the drive.
Our solution is to use a USB-IDE/(S)ATA converter we bought cheap. Unlike with IDE and (S)ATA interfaces, computers expect USB devices to do all sorts of crap, so this works really well.
On FreeBSD one can even set up a script which automatically resets the USB-gadget when things go haywire.
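We do not reproduce our script here, but the core of such a reset can be sketched in a few lines of Python around FreeBSD's usbconfig(8), whose "reset" command resets a single USB device. The function name and the device name "ugen0.2" are placeholders for illustration; the real device name has to be looked up on the machine in question, the command needs sufficient privileges, and deciding when things have "gone haywire" is left out entirely.

```python
import subprocess


def reset_usb_device(ugen, run=subprocess.run):
    """Ask usbconfig(8) to reset one USB device, for instance 'ugen0.2'.

    The 'run' argument only exists so the command can be inspected
    without actually executing it.
    """
    cmd = ["usbconfig", "-d", ugen, "reset"]
    run(cmd, check=True)
    return cmd


# Hypothetical use, on a FreeBSD machine with the converter at ugen0.2:
#     reset_usb_device("ugen0.2")
```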
Optical Drives
So far we have not found any optical drives which still worked well enough.
Floppy disks
These will get their own page here: Preservation/Floppydisks
Harddisks connected to a PC
When we can connect the harddisk to a PC, we do that, and then we use FreeBSD's recoverdisk program.
Recoverdisk is a quite simple program, which just keeps trying to read until everything has been read from the harddisk.
"Just keep trying" is a surprisingly effective strategy with harddisks.
As I write this, recoverdisk(1) is slowly but surely pulling the last 1.3 MByte out of an unwilling SCSI disk. With the drive retrying and giving up, each attempt to read a sector takes about a second, and 2% of the reads succeed, so it manages to pull out 30 kByte per hour. Some people have reported running recoverdisk(1) for months and years and finally getting the last few crucial sectors.
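The numbers above can be checked on the back of an envelope. The Python below assumes 512-byte sectors and exactly one second per attempt, neither of which is stated above, so treat the result as a ballpark only.

```python
SECONDS_PER_ATTEMPT = 1.0   # "about a second" per read attempt
SUCCESS_RATE = 0.02         # 2% of the reads succeed
SECTOR_BYTES = 512          # assumption: the usual sector size
REMAINING_BYTES = 1.3e6     # the last 1.3 MByte

attempts_per_hour = 3600 / SECONDS_PER_ATTEMPT
bytes_per_hour = attempts_per_hour * SUCCESS_RATE * SECTOR_BYTES
hours_left = REMAINING_BYTES / bytes_per_hour

print(f"{bytes_per_hour / 1000:.0f} kByte/hour, {hours_left:.0f} hours to go")
```

That lands at roughly 37 kByte per hour, in the same ballpark as the observed 30 kByte per hour if each attempt really takes a bit more than a second, and it puts the remaining 1.3 MByte at about a day and a half of retrying, provided the success rate holds.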
Recoverdisk keeps a "todo-list" in a separate file, so that it can continue from where it got to, if and when the hardware craps out along the way, or just because the process needs to be stopped and resumed later, for whatever reason.
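The idea can be sketched in a few lines of Python. This is not recoverdisk's actual algorithm or file format, just an illustration of "keep trying, and remember what is left": the function name, the JSON todo-file and the read_block callback are all invented for the example.

```python
import json
import os


def recover(read_block, nblocks, block_size, out_path, todo_path,
            max_passes=1000):
    """Retry unread blocks until none are left or max_passes is reached.

    read_block(i) must return the bytes of block i, or raise OSError.
    The list of still-missing blocks is saved after every pass, so an
    interrupted run can be resumed. Returns the blocks still missing.
    """
    if os.path.exists(todo_path):
        with open(todo_path) as f:
            todo = json.load(f)          # resume an earlier run
    else:
        todo = list(range(nblocks))      # fresh start: everything is missing
    mode = "r+b" if os.path.exists(out_path) else "w+b"
    with open(out_path, mode) as out:
        for _ in range(max_passes):
            if not todo:
                break
            still_bad = []
            for i in todo:
                try:
                    data = read_block(i)
                except OSError:
                    still_bad.append(i)  # try again in the next pass
                    continue
                out.seek(i * block_size)
                out.write(data)
            todo = still_bad
            with open(todo_path, "w") as f:
                json.dump(todo, f)
    return todo
```

Here read_block would typically seek and read on the raw disk device. Keeping the todo-list outside the image file is what makes it safe to kill and restart the process at any point.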
Recoverdisk was born in FreeBSD and lives in the FreeBSD source tree, but it is deliberately written to make it easy to compile it on other UNIX-like systems.
Decoding flux files
This will get its own page too: Preservation/Flux-decoding
Things we know to work
Recovering data from defective or marginal harddisks is not black magic, but some of it sure looks that way.
Here are some things we have noticed and experimented with which work:
Keep the drive in its usual orientation
If you know how the drive was mounted originally, try to have it in the same orientation when you extract data. It seems to work better that way.
We do not know why this is, but we suspect it might have something to do with how the worn bearings wiggle as they move.
Keep the drive cool
Find a good big fan and mount it next to the drive. Cool electronics work better than hot electronics (see also: Thermal noise).
Avoid vibration
Try to avoid external vibration reaching the drive. We have seen this make a huge difference. Could be worn bearings sloshing around.
Try a clean power-supply
A couple of times we have seen drives which could get almost nothing off the platters work almost perfectly when we powered them from a lab power-supply.
One reason might be dried-out electrolytic capacitors on the drive electronics not filtering well enough.
Things we are not quite sure worked
In a couple of cases, changing the orientation of the drive to something different from its usual orientation seemed to improve the read-rate a lot.
One drive read pretty well when we started a session, but the read-rate dropped slowly and steadily. Some days later when we started a new session, the read-rate started at the same "pretty well" rate and dropped steadily, and this pattern seemed to hold all the way till we had read everything.
In a couple of cases we have tried to swap the drive's PCB with the PCB from an identical drive. It may have worked once or twice on old drives. More modern drives seem to have parameters calibrated to the specific mechanism in non-volatile memories, and with wrong parameters the drive never gets online.
Things that do not work
Cooling spray. We tried to lower the thermal noise on the read-amplifier, but it did not help, and worse: The flash-chip next to it forgot its firmware.