It was time for me to figure out a new backup system. To date, I’ve tried:

System                 Result

No backups             Disaster
tar with tape drive    Could never quite get used to tapes
rsync + hardlinks      Worked well [1]
Bacula                 Configuration became an issue
Time Machine           Filesystem corruption

There are three boxes in the house that need to be backed up, two running Debian and one Macbook. This time around I wanted a solution that might be able to back data up to something “cloud”, so I started looking around. I was about to install Duplicity when I stumbled onto a completely different kind of tool, called ‘Ugarit’.

What is Ugarit?

Ugarit is an outlier among backup systems that adopts a rather novel approach to backups. It appears to be almost entirely unknown outside of the Chicken Scheme world. There is very little information about it out there, and I almost passed it by because it seemed so obscure. Well, it ‘is’ obscure really. There is no chance of finding people on Stack Overflow who have already asked the same question you want to ask. I was about to move on to the next option in my list of backup software, but the utter coolness of Ugarit got to me.

The coolness of Ugarit is its conceptual clarity. The basic pattern seems perfectly adapted to the problem domain, in such a way that many “features” are just the natural consequences of the initial conceptual choices.

Did you say obscure?

From a sort of aesthetic point of view, Ugarit is intriguing. It seems to be the work of someone (smart) who wants to do things ‘their’ way.

  • It is written in Chicken Scheme. This alone could be frightening for some. I’ve never programmed in Scheme, but the fact that this was written in Lisp was certainly not going to scare me away. Chicken Scheme compiles to C and runs on all the major OSes, so it is actually a fairly nice choice of platform.

  • The licence appears to be very “Free as in speech”, but it is neither the GPL, nor the BSD licence, nor any other licence I could put a name to.

  • The author uses a version control system I had never heard of, Fossil, which is apparently Git-like but with added features (bug-tracking and wiki are built in).

But…​ the documentation is fairly clear and well written, and provides enough details to get going.

Content Addressed Storage

Ugarit is a backup system based on the notion of Content Addressed Storage. I won’t try to give a deep, technical explanation; the project documentation gives a pretty good overview, and I’ll come back to that. Basically, though, with Content Addressed Storage each of your files gets hashed and then stored at a location derived from its hash. If your file hashes to “xyz123”, you would find it in the “123” subfolder of the “xyz” directory. The win for backup is that if a file doesn’t change, its hash doesn’t change either, so there is no longer any point in storing a fresh copy of the same file every month, or however often you decide to do a full backup.
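
You can see the pattern with nothing more than a shell hash tool (this illustrates the idea, not Ugarit’s actual on-disk layout):

  $ echo -n "hello world" | sha1sum
  2aae6c35c94fcfb415dbe95f408b9ce91ee846ed  -

A content-addressed store files that blob under a path derived from the digest, something like 2aa/e6c/35c94…, and identical content always lands on the same path.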

That is one of the problems with the traditional backup plan: full backups need to happen every so often, because chaining together too many incremental backups is inherently fragile. As the Ugarit docs point out, most of our current backup concepts come from the world of magnetic tapes. Content Addressed Storage takes advantage of the low cost and easy random access of live, spinning disks.

Another advantage of the Content Addressed Storage pattern is that duplicate data has the same hash. In other words, two (or more) identical files at different places in the filesystem hash to the same thing, and so get stored in exactly the same place. If multiple hosts are being backed up to the same store, they all benefit from this: if the same config file is present on 10 different machines, Ugarit only stores it once.
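
The deduplication is easy to demonstrate from the shell: the same content under two names yields the same digest, so a content-addressed store keeps exactly one copy.

  $ cp /etc/hostname hostname-copy
  $ sha1sum /etc/hostname hostname-copy   # prints the same digest twice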

The difference between this kind of backup and traditional backups is similar to the difference between Git and Subversion (or Mercurial and CVS, or whatever). In Git, there is no privileged “complete file set” with patches layered on top: every object is stored under the hash of its content, and the system knows how to put your data back together when you ask it to.

libchop: a similar system

There is another tool in the same space, called libchop, or, more specifically, chop-backup. I almost chose libchop over Ugarit at one point, but ultimately decided against it, partly because the project seems more or less dormant, and partly because the backup process seems slightly more complicated. (You need to store a key for each snapshot, which makes the encryption more secure, but you have the hassle of keeping track of snapshot keys.)

Installing

The documentation is quite good, and there is also a tutorial that walks you through the relatively simple process, which I won’t go over here in detail. Essentially, there are four steps:

  • Install Chicken Scheme

  • Install, via the Chicken Scheme installer, a couple of hashing libs and Ugarit itself.

  • Create a directory somewhere for the “vault”

  • Make a ugarit.conf file (one for each client) with its own salt and key.

And that’s all.
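
Concretely, on a Debian box the first three steps came down to something like this (the package name is from memory, so check it against the tutorial; the hashing eggs may also need installing explicitly if chicken-install doesn’t pull them in):

  $ sudo apt-get install chicken-bin    # the Chicken Scheme compiler and tools
  $ chicken-install ugarit              # Ugarit and its dependencies, as Chicken "eggs"
  $ mkdir -p /backup/vault              # the vault is just an empty directory

The ugarit.conf file is covered below.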

The vault

The vault is just a directory. Once the data starts pouring in, it will end up containing a lot of directories whose names are hex numbers, from 0 to FFF. They contain similarly named subdirectories, and eventually we find some data. This is how it works: the hash of the file is converted into a path to that file.
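
In a tree view, the first few levels look something like this (the hashes are shortened and invented for the example):

  /backup/vault/
    a3f/
      91b/
        ...        <- data whose hash begins a3f91b...
    a40/
      ...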

About that key and that salt

The salt the user provides in the ugarit.conf file is what Ugarit will use to hash the files it is backing up. That is straightforward enough. But what is that key for?

Encryption, of course. This was one of the things I was looking for in a backup system: simple encryption so that I could put my data in uncontrolled places, like the “cloud”. Ugarit doesn’t force you to encrypt your data, but it does offer the possibility right out of the box.
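
So a complete client config ends up being just a handful of s-expressions. A sketch (the storage line reappears below; the salt and key values are obviously placeholders, and the exact shape of the hash and encryption lines should be double-checked against the Ugarit docs):

  (storage "backend-fs fs /backup/vault")   ; where the vault lives
  (hash tiger "some-long-random-salt")      ; salt mixed into the content hashes
  (encryption aes "some-long-random-key")   ; optional encryption of everything in the vault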

So far, I have just used this setup for storing to a local server, but my plan is to have a second data store. At this point, something like Dropbox might actually be a perfect match for Ugarit: that would just mean having a “vault” inside a local Dropbox (or whatever) directory. Ugarit is efficient in terms of both network traffic (only changes get uploaded) and total storage space (nothing is duplicated), so it seems like a natural fit, though I haven’t tried it yet.
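
In configuration terms, it would amount to pointing the storage line at the synced directory (the path here is hypothetical, and again, untested):

  (storage "backend-fs fs /home/me/Dropbox/vault")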

Alaric Snell-Pym has mentioned a future S3 backend. That would be even nicer.

Over the wire

As I mentioned, my current plan with Ugarit is to just get it saving files on my local backup server. This does imply getting data from the laptops up to the server upstairs.

I wanted to avoid having to set up NFS shares just for this. In my early rsync + hardlinks setup, the clients had scripts, run through cron, that tested for network connectivity, mounted the NFS share, rsynced, and unmounted. It worked, but it was fragile. Part of the reason I switched to Bacula was to solve this problem.

Ugarit is designed to work over ssh and does it very well. Here is some example config to show what I mean:

This is a line in the ugarit.conf config file on a client that defines the vault it is going to use, which in this case would be a directory on the same machine.

  (storage "backend-fs fs /backup/vault")

For a remote vault, all you do is this:

  (storage "ssh remote-host 'backend-fs fs /backup/vault'")

Ugarit still needs to be installed on the remote host, but you don’t even need a remote config file. And that is all!
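
From there, taking a snapshot is a one-liner on the client. As I recall from the docs, the form is ugarit snapshot <config> <tag> <path>; the tag name and path here are mine:

  $ ugarit snapshot ugarit.conf my-laptop /home/me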

Ugarit has an ‘explore’ feature that lets you browse through the snapshots in your vault, and through the files in each snapshot. With a remote backup server, this feature still works completely seamlessly over ssh. It’s very nice, considering that all it takes is a single line of configuration.
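
For reference, launching it is just (command form as I recall it from the docs):

  $ ugarit explore ugarit.conf

which drops you into a small interactive shell for wandering around tags, snapshots and files.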

So far so good…​

I’ve been using this setup for a couple of weeks now, backing up two laptops, one Debian, one OS X, to a local server. Incremental backups are fairly fast, though not instantaneous, since Ugarit does have to look at a lot of files to decide which ones to back up.

My next step will be to automate the process, taking into account the fact that laptops are constantly going to sleep or offline. I’ll also have to figure out OS X’s equivalent of cron…​
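
On the Debian machines, the automated version will probably start life as little more than a cron entry, wrapped in a check that the backup server is actually reachable. Roughly (the hostname, paths, tag and schedule are all illustrative):

  # crontab: nightly Ugarit snapshot at 03:00, skipped if the server is down
  0 3 * * * ping -c1 -W5 backup-server >/dev/null 2>&1 && ugarit snapshot /etc/ugarit.conf laptop-home /home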

After that I will take a look at the alternate backend that ships with Ugarit, the logfile backend, which stores data in fixed-size log files (1 GB, say) and keeps track of the blocks in a separate SQLite database. This might be an interesting way to store data on something like S3 while we wait for Ugarit’s author to write that S3 backend.
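
If I read the docs right, switching would again mostly be a change to the storage line, along these lines (the backend name is from the docs; the trailing argument, a maximum log-file size in bytes, is my guess at the form):

  (storage "backend-fs splitlog /backup/vault/log 1000000000")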

And, like I mentioned, I would like to try Ugarit on Dropbox or something Dropbox-ish. So anyway, lots of fun so far, but more to do.


1 I can’t really remember why I didn’t just go back to the rsync and hardlinks method, which really is a rather nice, efficient system. When I set that up, it was entirely homespun, just based on some examples I’d seen. I think I probably wanted something more enterprisey.
