Sunday, December 06, 2020

A lucky hdd recovery adventure

In all my years of home computing, I never had a catastrophic hard drive failure. Saturday's excitement felt like one, with I/O errors on the journal.

The usual graphical display on my computer was replaced by large white typewriter-style characters on a dark background: dire warnings about I/O errors, a nonexistent device on remount, and other words I don't recall. I found a USB drive that still had Knoppix 8.6 on it and booted from that. Diagnostics (CPU, memory) found no problems, so I tried "e2fsck -y" on the failed partitions (both / and /home, which made me think "electronics" rather than "bad spot on the disk"). That ended with "Filesystem still contains errors!"

I tried to remove the journal with tune2fs, which insisted on first replaying the journal: that's right, the very journal I wanted to ignore or remove. After a few hours of futility, I had an idea. There were I/O errors on the disk drive, but all of them were happening at rather high block numbers (at least 9 digits, if I recall correctly). So I thought, what if I just did a block-for-block copy onto a new drive?
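
For anyone wanting to rehearse a block-for-block copy before touching real hardware, here is a minimal sketch that runs against throwaway image files. The file names SRC and DST are hypothetical stand-ins for the old and new partitions. The copy later in this post used plain "dd bs=1M"; the conv=noerror,sync options below are an addition, not what was actually run. They tell dd to keep going after a read error and to pad the failed block with zeros so that everything after it stays at the same offset. A smaller bs loses less data per bad spot, and GNU ddrescue is a purpose-built alternative for drives that really are dying.

```shell
#!/bin/sh
# Sketch only: SRC and DST are stand-in image files, not real devices.
set -eu

workdir=$(mktemp -d)
SRC="$workdir/old-partition.img"   # hypothetical stand-in for the failing partition
DST="$workdir/new-partition.img"   # hypothetical stand-in for the new partition

# Build a 4 MiB source image (a whole number of 1 MiB blocks) of random data.
head -c 4194304 /dev/urandom > "$SRC"

# conv=noerror: continue after a read error instead of stopping.
# conv=sync:    pad short or failed reads with zeros up to the block size,
#               so data after a bad spot lands at the right offset.
dd if="$SRC" of="$DST" bs=1M conv=noerror,sync 2>/dev/null

# With no read errors, the copy should be byte-for-byte identical.
if cmp -s "$SRC" "$DST"; then
    copy_status=identical
else
    copy_status=different
fi
dst_size=$(($(wc -c < "$DST")))

echo "copy is $copy_status ($dst_size bytes)"
rm -rf "$workdir"
```

On a real drive, if= and of= would be the partition devices, and getting them backwards overwrites the data you are trying to save, so check them twice.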

Off to the store for a 1TB internal SSD with SATA connectors: about $100, or a dime per gigabyte! This is an amazing time we live in. "fdisk -l" gave me the bad drive's layout; I partitioned the new drive with the same-size partitions... I think the new drive is slightly larger, but who cares? I copied the filesystem partitions over, and was pleasantly surprised to note that the filesystem UUIDs matched. (D'oh!) The root partition (just 60GB) was done in less than 10 minutes:

knoppix@Microknoppix:~$ dd if=/dev/sda1 of=/dev/sdb1 bs=1M
61440+0 records in
61440+0 records out
64424509440 bytes (64 GB, 60 GiB) copied, 408.774 s, 158 MB/s
knoppix@Microknoppix:~$ 
Muttering "Let fortune favor the foolish," I typed:
root@Microknoppix:/home/knoppix# e2fsck -y /dev/sdb1
e2fsck 1.44.5 (15-Dec-2018)
/dev/sdb1: clean, 309900/3909120 files, 5462752/15728640 blocks
root@Microknoppix:/home/knoppix# mount /dev/sdb1 /mnt
root@Microknoppix:/home/knoppix# ls /mnt
bin   etc    home            lib         media  proc  sbin  tmp  vmlinuz
boot  extra  initrd.img      lib64       mnt    root  srv   usr  vmlinuz.old
dev   foo    initrd.img.old  lost+found  opt    run   sys   var
root@Microknoppix:/home/knoppix# 
That may have been a little foolhardy, but I went to bed hoping for similar grace to befall /home (over 800GB). The next morning, the copy was done. The news was happy:
root@Microknoppix:/home/knoppix# fdisk -l /dev/sdb
Disk /dev/sdb: 931.5 GiB, 1000207286272 bytes, 1953529856 sectors
Disk model: SanDisk SSD PLUS
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x5fa89795

Device     Boot     Start        End    Sectors   Size Id Type
/dev/sdb1  *         2048  125831167  125829120    60G 83 Linux                 ← root partition
/dev/sdb2       125831168 1953529855 1827698688 871.5G  5 Extended
/dev/sdb5       125833216  159387647   33554432    16G 82 Linux swap / Solaris
/dev/sdb6       159389696 1953529855 1794140160 855.5G 83 Linux                 ← /home
root@Microknoppix:/home/knoppix# e2fsck -fy /dev/sdb6
e2fsck 1.44.5 (15-Dec-2018)
Pass 1: Checking inodes, blocks, and sizes
Inode 33692836 extent tree (at level 1) could be shorter.  Optimize? yes

Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdb6: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb6: 299071/56074240 files (1.4% non-contiguous), 54502379/224266934 blocks
root@Microknoppix:/home/knoppix# e2fsck -fy /dev/sdb1
e2fsck 1.44.5 (15-Dec-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb1: 309900/3909120 files (0.4% non-contiguous), 5462752/15728640 blocks
root@Microknoppix:/home/knoppix# 
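
The numbers in the transcripts above can be cross-checked with a little shell arithmetic. This is only a sanity-check sketch: every figure is copied from the dd, fdisk and e2fsck output, and the 4096-byte filesystem block size is an assumption (it fits, since 15728640 blocks of 4096 bytes is exactly the 60 GiB root partition). The time estimate for /home assumes the copy ran at the same rate as the root partition, which a failing drive may well not manage.

```shell
#!/bin/sh
set -eu

sector_bytes=512            # "Units: sectors of 1 * 512 = 512 bytes"
fs_block_bytes=4096         # assumption: 4 KiB filesystem blocks

root_sectors=125829120      # /dev/sdb1, from fdisk
home_sectors=1794140160     # /dev/sdb6, from fdisk
home_fs_blocks=224266934    # /dev/sdb6, from e2fsck

dd_bytes=64424509440        # from the dd output for the root partition
dd_seconds=408.774

# Partition sizes in bytes, GiB and filesystem blocks.
root_bytes=$((root_sectors * sector_bytes))
root_gib=$((root_bytes / 1024 / 1024 / 1024))
root_part_blocks=$((root_bytes / fs_block_bytes))
home_bytes=$((home_sectors * sector_bytes))
home_part_blocks=$((home_bytes / fs_block_bytes))

# Blocks in the new /home partition that the copied filesystem does not use.
home_spare_blocks=$((home_part_blocks - home_fs_blocks))

# dd throughput in MB/s, and a rough copy time for /home at that rate.
rate_mb=$(awk -v b="$dd_bytes" -v s="$dd_seconds" \
    'BEGIN { printf "%.0f", b / s / 1000000 }')
home_minutes=$(awk -v hb="$home_bytes" -v b="$dd_bytes" -v s="$dd_seconds" \
    'BEGIN { printf "%.0f", hb / (b / s) / 60 }')

echo "root: $root_bytes bytes = $root_gib GiB = $root_part_blocks blocks"
echo "/home: partition $home_part_blocks blocks, filesystem $home_fs_blocks blocks, spare $home_spare_blocks"
echo "dd rate: $rate_mb MB/s; /home at that rate: about $home_minutes minutes"
```

The root partition comes out at exactly the 64424509440 bytes dd reported, and the copied /home filesystem is 586 blocks (a bit over 2 MiB) smaller than the partition that now holds it, which is consistent with the new partition being slightly larger than the old one.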
Initial checks suggest that nothing important was lost. If something important had been lost, it would have been mightily inconvenient, but not really catastrophic. That said, I decided to add this device to my CrashPlan subscription.