Reassociating old Time Machine backups

In an attempt to get myself cheap remote backups over the internet, I bought a Raspberry Pi kit and set it up as a hackintosh Time Capsule by attaching my USB backup disk to the Pi. However, I wanted to keep my existing backup history, so instead of using a fresh Linux-formatted partition (like a clever boy) I tried to get the Pi to use my existing HFS+ filesystem. Anyone interested in trying this should probably read about Linux’s flaky HFS+ user mapping and lack of journaling support first, and then back away very slowly. I blame this for all my subsequent problems.

After some effort I did get my aging Macbook to write a new backup on the Pi, but I couldn’t get it to see the existing backups on the drive. Apple uses hard links for deduplication of backups, and because remote filesystems can’t be guaranteed to support them it uses a trick. Remote backups are written not directly on the remote drive, but into a sparse disk image inside it. Thinking that it would be a relatively simple matter to move the old backups from the outer filesystem into the sparsebundle, I remounted the USB drive on the Mac (as Linux doesn’t understand sparsebundles, fair enough).

The Macbook first denied me the move, saying that the case sensitivity of the target filesystem was not correct for a backup – strange, because it had just created the sparsebundle itself moments before. Remembering the journaling hack, I performed “repair disk” first on the sparsebundle and then on the physical disk itself. At this point Disk Utility complained that the filesystem was unrecoverable (“invalid key length”) and the physical disk would no longer mount. In an attempt to get better debug information from the repair, I ran fsck_hfs -drfy on the filesystem in a terminal. This didn’t help much with the source of the error, but I did notice that at the end it said “filesystem modified =1”. Running it again produced slightly different output, but again “filesystem modified =1”. It was doing something, so I kept going.
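For anyone tempted to repeat this by hand, the "keep running it until it stops complaining" approach can be sketched as a small retry loop. This is only an illustration: the function name repeat_until_ok is made up, the device path in the usage comment is hypothetical, and it assumes that fsck_hfs exits with status zero once the volume is clean, which I have not verified for every failure mode.

```shell
# Run a command repeatedly until it exits successfully, up to a maximum
# number of passes. Prints which pass succeeded, or fails after the limit.
repeat_until_ok() {
  ruo_max=$1
  shift
  ruo_n=1
  while [ "$ruo_n" -le "$ruo_max" ]; do
    if "$@"; then
      echo "succeeded on pass $ruo_n"
      return 0
    fi
    ruo_n=$((ruo_n + 1))
  done
  echo "still failing after $ruo_max passes" >&2
  return 1
}

# Hypothetical usage (the device path is an assumption, check yours first):
# repeat_until_ok 10 sudo fsck_hfs -drfy /dev/disk2s2
```

Capping the number of passes matters: a disk that never converges is a disk to stop writing to, not one to keep hammering.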

In the meantime, I had been looking into ways of improving the backup transfer speed over the internet. I originally planned to use a tunnel over openvpn, but this would involve channeling all backup traffic through my rented virtual server, which might not be so good for my bank account. I did some research into NAT traversal, and although the technology exists to allow direct connections between two NATed clients (libnice), I would have to write my own application around it and at this point I was getting nervous about having no backups for an extended period. I had also been working from home and getting frustrated with the bulk transfer speed between home and work, and came to the conclusion that my domestic internet connection couldn’t satisfy Time Machine’s aggressive and inflexible hourly backup schedule.

Six iterations of fsck_hfs -drfy later, the disk repair finally succeeded and the backup disk mounted cleanly. At this point, I decided a strategic retreat was in order. I went to set up Time Machine on the old disk, but it insisted that there were no existing backups, saying “last backup: none”. Alt-clicking on the TM icon in the tray and choosing “Browse Other Backup Disks” showed, however, that the backups were intact. While I could make new backups and browse old ones, they would not deduplicate. As I have a large number of RAW photographs to back up, this was far from ideal. There is a way to get a Mac to recognise another computer’s backups as its own (after upgrading your hardware, for example). However, it threw “unexpectedly found no machine directories” when attempting the first step. It appeared that not only did it not recognise its own backup, it didn’t recognise it as a backup at all.

After a lot of googling at 2am, it emerged that local Time Machine backups use extended attributes on the backup folders to store information relating to (amongst other things) the identity of the computer that had made the backup. In my earlier orgy of fscking, the extended attributes on my Mac’s top backup folder had been erased. Luckily, I still had the abandoned sparsebundle backup in the trash. Inside a sparsebundle backup, the equivalent metadata is stored not as extended attributes, but in a plist file. In my case, this was in /Volumes/Backups3TB/.Trashes/501/galactica.sparsebundle/com.apple.TimeMachine.MachineID.plist, and contained amongst other bits and bobs the following nuggets:

<key>com.apple.backupd.HostUUID</key>
<string>XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX</string>
<key>com.apple.backupd.ModelID</key>
<string>MacBookPro5,1</string>

These key names were a similar format to the extended attributes on the daily subdirectories in the backup, so I applied them directly to the containing folder:

$ sudo xattr -w com.apple.backupd.HostUUID XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX /Volumes/Backups3TB/Backups.backupdb/galactica
$ sudo xattr -w com.apple.backupd.ModelID MacBookPro5,1 /Volumes/Backups3TB/Backups.backupdb/galactica
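Rather than copying the values out of the plist by eye, they can be pulled out with a few lines of shell. The following is a rough sketch, not a general plist parser: the helper name plist_string is made up, and it assumes the plist is in XML form with each key and its string value on consecutive lines, as in the excerpt above.

```shell
# Print the <string> value that follows a given <key> in an XML plist.
# Arguments: $1 = plist file, $2 = key name.
# Assumes the key and its value are on separate lines, value after key.
plist_string() {
  awk -v key="$2" '
    found && /<string>/ {
      sub(/.*<string>/, "")      # strip everything up to the opening tag
      sub(/<\/string>.*/, "")    # strip the closing tag and anything after
      print
      exit
    }
    index($0, "<key>" key "</key>") { found = 1 }
  ' "$1"
}

# Hypothetical usage on the Mac, with the paths from this post:
# plist=/Volumes/Backups3TB/.Trashes/501/galactica.sparsebundle/com.apple.TimeMachine.MachineID.plist
# target=/Volumes/Backups3TB/Backups.backupdb/galactica
# for attr in com.apple.backupd.HostUUID com.apple.backupd.ModelID; do
#   sudo xattr -w "$attr" "$(plist_string "$plist" "$attr")" "$target"
# done
# xattr -l "$target"   # list the attributes to confirm they were written
```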

After that was fixed, I could inherit the old backups and reassociate each of the backed up volumes to their master copies:

$ sudo tmutil inheritbackup /Volumes/Backups3TB/Backups.backupdb/galactica/
$ sudo tmutil associatedisk -a / /Volumes/Backups3TB/Backups.backupdb/galactica/Latest/Macintosh\ HD/
$ sudo tmutil associatedisk -a /Volumes/WD\ 1 /Volumes/Backups3TB/Backups.backupdb/galactica/Latest/WD\ 1/

The only problem arose when I tried to reassociate the volume containing my photographs. Turns out they had never been backed up at all. They bloody well are now.


 

So what happened to my plan to run offsite backups? I bought a second Time Machine drive and will keep one plugged in at home and one asleep in my drawer in work, swapping them once a week. This is known as the bandwidth of FedEx.

Jack vs USB headphones on Mac

A work colleague recently resigned and left his USB headset on his desk. Foolishly thinking that one plug is better than two, I swapped my old jack-connector skype headset for his. Unlike the headphone jack, USB-connected headphones do not automatically disable the internal speakers on my macbook when plugged in, leading to a blast of noise across the cube farm when I tried to test them.

The problem is twofold – firstly, Mac OS treats the headphone jack and internal speakers as the same device, whereas USB headphones are naturally treated as an extra device. There is therefore no configuration option in the software to alter the behaviour – it’s buried in the BIOS for all I can tell. I got around this by downloading SoundSource – it remembers speaker and microphone configurations and restores your earlier sound settings as you plug devices in and out.

But then I came across the second problem – it only worked 50% of the time. Turns out that a single physical device appears as different virtual devices depending on what USB port it’s plugged into. I had to configure the headset to be the default mic and speakers while it was plugged into one port, then plug it into the other and repeat the process.

It’s at times like this that I appreciate how my mother feels.

Do what I mean, dammit. Or, why being silently “helpful” is evil.

For the last three years (i.e. before Time Machine), I have been using rsync to make incremental backups to an external FireWire disk from my trusty iBook. Now, rsync does this by backing up into a fresh location each time and referring to the previous backup to check if any data can be de-duplicated, which it does by creating hard links.

rsync -a --link-dest="$PREVIOUS_BACKUP" "$SOURCE" "$NEW_BACKUP"

A wrapper script is needed because rsync doesn’t do any rotation of the backup paths (mine preserves six previous backups, which is quite enough), but overall the solution is elegant – restoration from any given backup is just a cp -pR away, due to the transparent nature of hard links. However (and this is the important bit) rsync only de-duplicates when the source file is identical to its previous backup in every way, including metadata.
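For the curious, a wrapper of this kind might look roughly like the sketch below. It is not my actual script: the backup.N directory naming and the function names rotate_backups and run_backup are invented for illustration, and it assumes the backup root is given as an absolute path (rsync resolves a relative --link-dest against the destination directory, which is an easy way to silently lose de-duplication).

```shell
# Shift backup.0 -> backup.1, backup.1 -> backup.2, and so on, discarding
# the oldest. Arguments: $1 = backup root, $2 = previous backups to keep.
rotate_backups() {
  rb_root=$1
  rb_keep=$2
  rm -rf "$rb_root/backup.$rb_keep"
  rb_i=$rb_keep
  while [ "$rb_i" -gt 0 ]; do
    rb_prev=$((rb_i - 1))
    if [ -d "$rb_root/backup.$rb_prev" ]; then
      mv "$rb_root/backup.$rb_prev" "$rb_root/backup.$rb_i"
    fi
    rb_i=$rb_prev
  done
}

# Rotate, then back up into a fresh backup.0, hard-linking unchanged files
# against the previous backup if there is one.
# Arguments: $1 = source directory, $2 = backup root (absolute path).
run_backup() {
  rotate_backups "$2" 6
  if [ -d "$2/backup.1" ]; then
    rsync -a --link-dest="$2/backup.1" "$1/" "$2/backup.0"
  else
    rsync -a "$1/" "$2/backup.0"
  fi
}

# Hypothetical usage:
# run_backup /Users/andrewg /Volumes/Backup/ibook
```

The trailing slash on the source makes rsync copy the directory's contents into backup.0 rather than nesting the directory inside it.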

As time went on and my laptop drive started getting full, I found that the backup window was getting suspiciously long for an 80GB disk. But it was the lack of space that finally drove me to buy a bigger external drive (admittedly, backups aren’t the only thing taking up GB on my FireWire farm).

Finally I ripped apart the wrapper script and dug through the previous backups. Turns out that rsync wasn’t preserving all the metadata, specifically file ownership. Google, as ever, gave me the answer:

Official Google Mac Blog: User 99, Unknown

By default, OS X silently maps all file ownership on external HFS+ disks to a special user “unknown”, while pretending to the user that he still owns the files. This is a “feature” to prevent weird permissions problems when swapping external drives between machines. It is so low-level that not even root can override it – if you try as root to chown a file on an external HFS+ filesystem, it will silently do nothing.

The upshot is that rsync didn’t believe that any backed-up files were identical to their originals, and so didn’t de-duplicate anything. Instead of six lean incrementals, I have six wasteful bulk backups, some of them incomplete because I ran out of backup window. And all the file ownerships have been lost, so a system restore would have failed spectacularly – if I had been foolish enough to try.
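Had I checked earlier, the problem would have been obvious: files that rsync has de-duplicated share an inode with their counterpart in the previous backup. Something along these lines would have shown it. This is a sketch only: the function name count_shared is made up, directories should be passed without a trailing slash, and file names containing newlines will confuse it.

```shell
# Count how many regular files in a new backup are hard links to the file
# at the same relative path in the previous backup.
# Arguments: $1 = previous backup dir, $2 = new backup dir.
# Prints "<shared> <total>".
count_shared() {
  cs_prev=$1
  cs_new=$2
  find "$cs_new" -type f | {
    cs_shared=0
    cs_total=0
    while IFS= read -r cs_f; do
      cs_rel=${cs_f#"$cs_new"/}
      cs_total=$((cs_total + 1))
      # -ef is true when both paths refer to the same device and inode
      if [ "$cs_f" -ef "$cs_prev/$cs_rel" ]; then
        cs_shared=$((cs_shared + 1))
      fi
    done
    echo "$cs_shared $cs_total"
  }
}

# Hypothetical usage:
# count_shared /Volumes/Backup/ibook/backup.1 /Volumes/Backup/ibook/backup.0
```

On a healthy incremental backup the first number should be close to the second; in my case it would have been zero.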

You can turn off this behaviour for any given disk (Finder>Drive>right-click>Get Info>uncheck “Ignore ownership…”), but it’s a bit late now.

The history meme

Spreading the meme

serenity:~ andrewg$ history | awk '{a[$2]++} END {for(i in a)print a[i] " " i}' | sort -rn | head -10
108 telnet
63 rscreen
43 ping
24 sudo
24 ssh
20 host
16 scp
16 more
14 ls
13 xdvi

Hm. I seem to be using the command line mainly as a gateway into remote systems – which reflects my average working day. The stray ‘xdvi’ is due to my recent heavy use of TextMate to write a paper in LaTeX. Not sure why I’ve been using sudo so much on my Mac though.

Similarly, on my work Linux laptop:

andgal@nbgal185:~$ history | awk '{a[$2]++} END {for(i in a)print a[i] " " i}' | sort -rn | head -10
68 rscreen
52 ping
45 host
36 sudo
35 rdesktop
32 xrandr
21 startmenu
20 ssh
18 ifconfig
15 telnet

rscreen is merely a wrapper for ssh:

function rscreen() { /usr/bin/ssh -t "$1" 'screen -dr || /usr/bin/screen || /bin/bash'; }

and startmenu is a cool but dodgy hack to get into my windows virtual machine:

alias startmenu='nohup rdesktop -A -s "c:\program files\seamlessrdp\seamlessrdpshell.exe explorer.exe" 192.168.185.128 -u andgal -p xxxxxxxx &'

xrandr reflects the fact that I have to configure my dual-screen setup by hand after each boot under Ubuntu 7.10, as the GUI configurator just Doesn’t Work. I had to test this many many times. Apparently the latest Ubuntu beta fixes most of these problems.