Raining and Pouring
It’s been almost a month since my last post, about our lightning damage. Between that and other goings-on, I’ve just been too busy to sit and bang anything out. (In fact, I’ve got another post percolating in my head about being busy, which seems like the start of a vicious circle.) We got all the important computer damage repaired, and I thought things were going to be back to normal. Then the other shoe dropped. (By the way, if you ever have a problem with Comcast, don’t waste your time with the call center in Chicago, unless you enjoy being lied to, insulted, and ignored. Just go down to the local office in Quincy, where the people are extremely courteous and helpful.)
(The rest of this will be painfully technical, but I’m writing it up to help others who run into the same problem, because it wasn’t easy for me to track down the answer.)
My system has two identical SATA hard drives, and when I had FreeBSD on it, they were mirrored so a hard drive failure wouldn’t take me out of action for any serious amount of time. When I installed Xubuntu, I left one of them untouched, in case I changed my mind and wanted to switch back right away, or needed to get files from it. So that second drive has been plugged in, but unused. I’ll call the Xubuntu drive A and the FreeBSD drive B.
My separate backup server wasn’t damaged by the lightning, since it wasn’t on the physical network with the cable modem. But by the time I had robbed parts from it to test the other systems and get them back in order, it was non-functional, and there were enough other things going on that I didn’t get it running again right away. To give myself a backup in the meantime, I used ‘dd’ (a low-level data copying utility) to make an exact copy of drive A on drive B. That way, if drive A failed before I got my backup server running again, I could swap the drives and run off drive B. So far so good.
About a week later, I needed to reboot so some security updates could take effect, so I did it as the last thing of the evening. I think it was even a Friday evening, so I didn’t discover for a couple of days that my filesystem had been rolled back to the state it was in when I did the dd copy! Everything I did during that week disappeared: files I created were gone, and files I deleted were back.
This was disconcerting, to say the least. The kernel does do a certain amount of filesystem caching in memory, so for a moment I wondered if it had somehow stopped writing from the cache to the disk, but the missing changes added up to many more gigabytes than my system’s RAM and swap space could ever hold. I thought maybe the drives had gotten switched in the BIOS (the motherboard does have that as an option, and maybe the lightning made it flaky), but when I mounted the ‘second’ drive, it didn’t show the recent files either.
In fact, when I did a little testing, I discovered that all changes I made now took effect on both drives, whether the second one was mounted or not. It was as if the system recognized that the two drives were identical, and decided to make a RAID1 mirror out of them without consulting me, then rebuilt the first drive based on the contents of the second one, restoring back to the state of my dd copy. But that’s insane; just because someone has two drives identically partitioned doesn’t mean he wants a mirror. So I was really scratching my head at this point.
Someone at Ubuntu Forums gave me the answer. See, you used to be able to count on drives coming up in the same order every time. So whichever drive came up as drive A would always come up as drive A. But now, with removable drives and bootable USB devices and so on, that’s no longer always the case with some hardware. Drive A might become drive B because you plugged in something that jumped ahead of it in the order. So to make sure the same drive is always recognized as the boot drive, they’ve started storing a UUID in the filesystem on each partition, and then the boot loader (grub) decides what partition to boot based on the UUID. (A UUID looks like this: 95db600e-46bb-4878-b193-7899c8fb95da.) So instead of booting the first partition marked bootable on drive A, it boots the partition marked 95db600e-46bb-4878-b193-7899c8fb95da (or whatever UUID it was given at installation) no matter where it is.
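To make the idea concrete, here is a small Python simulation of finding the root filesystem by UUID instead of by position. It is only a sketch of the principle, not how grub is actually implemented; the Volume class, the find_by_uuid function, the device lists, and every UUID other than the example above are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Volume:
    device: str  # kernel-assigned name such as /dev/sda1; can change between boots
    uuid: str    # identifier stored on the filesystem itself

ROOT_UUID = "95db600e-46bb-4878-b193-7899c8fb95da"

def find_by_uuid(volumes: list[Volume], uuid: str) -> Optional[Volume]:
    """Return the first volume, in detection order, whose UUID matches."""
    for vol in volumes:
        if vol.uuid == uuid:
            return vol
    return None

# Boot 1: the system drive is detected first.
boot1 = [
    Volume("/dev/sda1", ROOT_UUID),
    Volume("/dev/sdb1", "0b6c2f4e-1a2b-4c3d-8e9f-a1b2c3d4e5f6"),
]
# Boot 2: a USB stick is detected first, so every device name shifts by one.
boot2 = [
    Volume("/dev/sda1", "c0ffee00-0000-4000-8000-000000000001"),
    Volume("/dev/sdb1", ROOT_UUID),
    Volume("/dev/sdc1", "0b6c2f4e-1a2b-4c3d-8e9f-a1b2c3d4e5f6"),
]

for boot in (boot1, boot2):
    root = find_by_uuid(boot, ROOT_UUID)
    print(root.device if root else "root not found")
```

The device name changes from one boot to the next, but the lookup still lands on the same filesystem. The catch is that "first match wins" quietly assumes no two volumes ever share a UUID.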
So when I copied drive A to drive B bit-for-bit, I was copying the UUIDs too. Since both drives now had the same UUIDs, when I rebooted, the system picked the first one with the matching UUID, which unluckily happened to be drive B, the one with the week-old copy. Then, it appears, every time the kernel wrote to the drive, it wrote to both of them, because it decided where to write based on the UUID, and within a couple of days drive A’s index blocks got overwritten by drive B’s, effectively wiping out all changes on drive A since the copy. Had that not happened, I could have simply mounted drive A, which still had the current files, as the second drive and recovered the missing files from it.
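Why does ‘dd’ copy the UUID along with everything else? On ext2/ext3/ext4 the UUID is just 16 bytes inside the superblock, which starts 1024 bytes into the partition. In the ext4 on-disk layout the magic number 0xEF53 sits at offset 0x38 within the superblock and the UUID at offset 0x68. The sketch below builds a tiny synthetic image in memory (not a real, mountable filesystem), copies it byte for byte the way dd would, and reads the UUID back from both. The helper names are hypothetical; only the offsets come from the ext format.

```python
import struct
import uuid

SUPERBLOCK_OFFSET = 1024  # ext2/3/4 superblock begins 1 KiB into the partition
MAGIC_OFFSET = 0x38       # s_magic: little-endian 16-bit value, expected 0xEF53
UUID_OFFSET = 0x68        # s_uuid: 16 raw bytes
EXT_MAGIC = 0xEF53

def make_fake_ext_image(fs_uuid: uuid.UUID, size: int = 4096) -> bytearray:
    """Zero-filled buffer with only the magic number and UUID filled in.
    This is NOT a mountable filesystem; it just mimics where the fields live."""
    image = bytearray(size)
    struct.pack_into("<H", image, SUPERBLOCK_OFFSET + MAGIC_OFFSET, EXT_MAGIC)
    start = SUPERBLOCK_OFFSET + UUID_OFFSET
    image[start:start + 16] = fs_uuid.bytes
    return image

def read_ext_uuid(image: bytes) -> uuid.UUID:
    """Return the filesystem UUID stored in an ext2/3/4 superblock."""
    (magic,) = struct.unpack_from("<H", image, SUPERBLOCK_OFFSET + MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError(f"not an ext2/3/4 superblock (magic {magic:#06x})")
    start = SUPERBLOCK_OFFSET + UUID_OFFSET
    return uuid.UUID(bytes=bytes(image[start:start + 16]))

drive_a = make_fake_ext_image(uuid.UUID("95db600e-46bb-4878-b193-7899c8fb95da"))
drive_b = bytearray(drive_a)  # a bit-for-bit copy, which is all dd does

print(read_ext_uuid(drive_a))
print(read_ext_uuid(drive_b))
print(read_ext_uuid(drive_a) == read_ext_uuid(drive_b))  # True
```

Nothing in the copy marks it as a copy, so anything that identifies the filesystem by UUID sees two identical candidates.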
The fix was much simpler than the explanation: make sure both drives are current, make a good backup (by this point my backup server was up again), then assign a new UUID to one of the drives and reboot. The command was ‘tune2fs -U `uuidgen` /dev/sdb1’, which assigns a new UUID to the filesystem on the first partition of the second drive. I couldn’t assign a new UUID to the swap partition on the second drive this way, because tune2fs only works on ext2/ext3/ext4 filesystems and a swap partition has no such superblock; but the system doesn’t seem to mind, so I guess I don’t either.
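Before and after running tune2fs, it’s worth confirming that no two filesystems share a UUID. The ‘blkid’ command prints one line per device, in the form /dev/sda1: UUID="..." TYPE="ext3". The Python sketch below scans text in that format and reports any UUID that appears on more than one device. The sample output, device names, the swap UUID, and the find_duplicate_uuids function are hypothetical; real blkid lines may carry extra fields such as LABEL or PARTUUID, and the pattern deliberately skips PARTUUID and UUID_SUB.

```python
import re
from collections import defaultdict

# Match UUID="..." but not PARTUUID="..." or UUID_SUB="..."
UUID_FIELD = re.compile(r'(?<![A-Z_])UUID="([^"]+)"')

def find_duplicate_uuids(blkid_output: str) -> dict[str, list[str]]:
    """Map each UUID found on more than one device to the list of those devices."""
    devices_by_uuid: dict[str, list[str]] = defaultdict(list)
    for line in blkid_output.splitlines():
        device, sep, fields = line.partition(":")
        if not sep:
            continue  # not a "device: fields" line
        match = UUID_FIELD.search(fields)
        if match:
            devices_by_uuid[match.group(1)].append(device.strip())
    return {u: devs for u, devs in devices_by_uuid.items() if len(devs) > 1}

# Hypothetical blkid output after dd-copying one drive onto another
sample = '''\
/dev/sda1: UUID="95db600e-46bb-4878-b193-7899c8fb95da" TYPE="ext3"
/dev/sda5: UUID="3f1c9a2e-7b4d-4e8a-9c61-2d5e8f0a4b17" TYPE="swap"
/dev/sdb1: UUID="95db600e-46bb-4878-b193-7899c8fb95da" TYPE="ext3"
/dev/sdb5: UUID="3f1c9a2e-7b4d-4e8a-9c61-2d5e8f0a4b17" TYPE="swap"
'''

for fs_uuid, devices in find_duplicate_uuids(sample).items():
    print(f"UUID {fs_uuid} is shared by: {', '.join(devices)}")
```

In this made-up sample both the ext3 partitions and the swap partitions come up as duplicates, since dd copies the swap header along with everything else.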
That’s the thing about technology: there’s always something new that you need to know. A few days ago I knew nothing about UUIDs; now I know what they are and how to use them. Tomorrow it’ll be something else. I suppose the silver lining is that I rebooted so soon instead of going months like I often would, so I only lost about a week of email and other files, and nothing too critical. It just shows the importance of having regular backups. If I’d done the dd copy daily, I would have only lost one day’s worth at most, and ditto if I’d gotten my backup server running sooner.