Do what I mean, dammit. Or, why being silently “helpful” is evil.

For the last three years (i.e. before Time Machine), I have been using rsync to make incremental backups from my trusty iBook to an external FireWire disk. rsync does this by backing up into a fresh location each time and comparing each file against the previous backup. Any file that hasn't changed is de-duplicated by hard-linking it to the existing copy instead of copying it again.
rsync -a --link-dest="$PREVIOUS_BACKUP" "$SOURCE" "$NEW_BACKUP"
A wrapper script is needed because rsync doesn’t do any rotation of the backup paths (mine preserves six previous backups, which is quite enough), but overall the solution is elegant – restoration from any given backup is just a cp -pR away, due to the transparent nature of hard links. However (and this is the important bit) rsync only de-duplicates when the source file is identical to its previous backup in every way, including metadata.
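A rotating wrapper along these lines might look like the following. This is a minimal POSIX sh sketch, not the author's actual script. The layout (`backup.0` as the newest snapshot, `backup.1` to `backup.6` as the six previous ones, all under one backup root) and the function names `rotate` and `backup` are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical rotating wrapper around rsync --link-dest (a sketch, not the
# author's script). Assumed layout: $root/backup.0 is the newest snapshot,
# $root/backup.1 .. $root/backup.$KEEP are the previous ones.
set -eu

KEEP=6  # number of previous snapshots to retain

# Shift every snapshot up by one slot, discarding the oldest.
rotate() {
    root=$1
    rm -rf "$root/backup.$KEEP"
    i=$((KEEP - 1))
    while [ "$i" -ge 0 ]; do
        if [ -d "$root/backup.$i" ]; then
            mv "$root/backup.$i" "$root/backup.$((i + 1))"
        fi
        i=$((i - 1))
    done
}

# Take a new snapshot of the contents of $1 into $2/backup.0, hard-linking
# every file rsync considers unchanged since the previous snapshot.
backup() {
    src=$1
    mkdir -p "$2"
    # A relative --link-dest is resolved against the destination directory,
    # so make the backup root absolute first.
    root=$(cd "$2" && pwd)
    rotate "$root"
    # The trailing slash on "$src/" copies the directory's contents.
    if [ -d "$root/backup.1" ]; then
        rsync -a --link-dest="$root/backup.1" "$src/" "$root/backup.0"
    else
        rsync -a "$src/" "$root/backup.0"
    fi
}

# Run only when invoked with exactly two arguments: SOURCE BACKUP_ROOT.
if [ "$#" -eq 2 ]; then
    backup "$1" "$2"
fi
```

It would be invoked as something like `sudo sh backup.sh /Users/me /Volumes/Backup/ibook` (hypothetical paths). rsync only preserves file ownership (`-o`, implied by `-a`) when run as the super-user, so running it without root would lose ownership even on a volume that honours it.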

As time went on and my laptop drive started getting full, I found that the backup window was getting suspiciously long for an 80GB disk. But it was the lack of space that finally drove me to buy a bigger external drive (admittedly, backups aren’t the only thing taking up gigabytes on my FireWire farm).

Finally I ripped apart the wrapper script and dug through the previous backups. Turns out that rsync wasn’t preserving all the metadata, specifically file ownership. Google, as ever, gave me the answer:

Official Google Mac Blog: User 99, Unknown

By default, OS X silently maps all file ownership on external HFS+ disks to a special user “unknown”, while pretending to the user that he still owns the files. This is a “feature” to prevent weird permissions problems when swapping external drives between machines. It is so low-level that not even root can override it – if you try as root to chown a file on an external HFS+ filesystem, it will silently do nothing.
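Because a chown on such a volume reports success, the only way to notice the failure is to read the ownership back afterwards. The following is a minimal POSIX sh sketch. The function name `chown_verified` is hypothetical, it expects a numeric UID, and it compares that UID against the owner shown by `ls -lnd`. It only catches the problem when the reported owner differs from the one requested. Since the volume pretends that the current user owns everything, a chown to yourself would still pass.

```shell
#!/bin/sh
# Hypothetical helper: chown a file to a numeric UID, then read the owner
# back and fail loudly if the change did not stick.
chown_verified() {
    uid=$1
    file=$2
    chown "$uid" "$file" || return 1
    # Third field of `ls -lnd` is the numeric owner ID; -d stops ls from
    # listing a directory's contents instead of the directory itself.
    actual=$(ls -lnd "$file" | awk '{print $3}')
    if [ "$actual" != "$uid" ]; then
        echo "chown_verified: $file is owned by $actual, not $uid" >&2
        return 1
    fi
}
```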

The upshot is that rsync didn’t believe that any backed-up files were identical to their originals, and so didn’t de-duplicate anything. Instead of six lean incrementals, I have six wasteful bulk backups, some of them incomplete because I ran out of backup window. And all the file ownerships have been lost, so a system restore would have failed spectacularly – if I had been foolish enough to try.
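Hard-link counts make this kind of failure easy to spot. In a snapshot made with `--link-dest`, every file rsync considered unchanged shares an inode with the previous snapshot, so its link count is at least 2. A minimal sketch follows. The function name `count_unshared` is hypothetical, and it counts `find` output lines, so file names containing newlines would be over-counted.

```shell
#!/bin/sh
# Hypothetical diagnostic: count regular files in a snapshot directory whose
# link count is 1, i.e. files rsync copied afresh instead of hard-linking.
# In a healthy incremental this should be close to the number of files that
# really changed since the previous snapshot, not the total file count.
count_unshared() {
    find "$1" -type f -links 1 | wc -l | tr -d ' '
}
```

If `count_unshared` on the newest snapshot comes out roughly equal to `find SNAPSHOT -type f | wc -l`, nothing was de-duplicated.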

You can turn off this behaviour for any given disk (Finder>Drive>right-click>Get Info>uncheck “Ignore ownership…”), but it’s a bit late now.


3 thoughts on “Do what I mean, dammit. Or, why being silently “helpful” is evil.”

  1. Ignoring permissions is a setting, yes. It’s the default, yes. But then the default is meant for the average bod.

    And it’s hardly the end of the world? You ran out of space? Did you report it?

    See, you’re coming at this from the point of view of a Linux head. This stuff was obvious to us Mac heads 🙂

  2. I understand perfectly why they made it so that ownership was by default ignored on removable disks. It’s a simple, elegant solution to a common problem.

    What makes me mad is that they went out of their way to hide it. If I sudo to root and specifically tell it to do something that (for whatever reason) it doesn’t want to do, it should have the common decency to tell me that it hasn’t been done.

  3. Good post. That illustrates the necessity of checking the backups from time to time. I know it’s boring and can take some time (especially if it’s a whole fs), and that’s probably why people don’t do it.
