Zippers, in Clojure, are one of those things you end up running into that can be rather confusing. A little googling of the topic seems to reveal a universe divided between people who understand zippers perfectly and those who don’t get them at all. Zippers are a core library, so they get mentioned in the Clojure books, but they are also a special case that would take several pages to really explain in detail.

Most of all, they are, for a lot of us, slightly strange, a solution to the problem of “traversing” a tree structure without any implied state. In this post, I’m going to try to help those who don’t immediately understand zippers to at least avoid some of the subtleties that tripped me up.

Why “zipper”?

I can’t use zip files because I don’t have a Zip drive''. Why does everything have to be called zip”? What is a zipper, anyway?

With their usual terseness, the docs simply say:

Functional hierarchical zipper, with navigation, editing, and enumeration. See Huet
— Rich Hickey
API for clojure.zip

Huet turns out to be Gérard Huet, who published Functional pearl: the Zipper in 1997. Huet insists on how simple the concept is, and almost asks forgiveness for writing a paper about something that really should have been documented long ago:

This simple idea must have been invented on numerous occasions by creative programmers, and the only justification for presenting what ought to be folklore is that it does not appear to have been published, or even well-known.
— Gérard Huet
Functional Pearl: the Zipper

Huet is unsurprisingly one of those people who just “gets it”. Here is how he explains the Zipper:

The basic idea is simple: the tree is turned inside-out like a returned glove, pointers from the root to the current position being reversed in a path structure. The current location holds both the downward current subtree and the upward path.

Now maybe this kind of spatial and logical torsion is intuitive for you. Reading this now, I actually do understand what he means, and it might be `simple'' in a Rich Hickey sense, as in `uncomplected”, but it certainly isn’t that easy to understand.

A few basic things

So here are a few things I would have liked to have understood to begin with, and that I didn’t fully grasp before going through lots of trial and error and frustration.

Zipper vs. tree vs. node

We know what a tree is. In Clojure we would represent it as something like this, with nested vectors for example:

[1 [:a :b] 2 3 [40 50 60]]

This is just a tree, not a zipper, because a zipper is not a tree: it is a tree plus your position in the tree. Or to be more precise: a tree seen from a particular location in the tree.

So to make a zipper out of that tree, we do this:

user> (require '(clojure [zip :as zip]))
nil
user> (zip/vector-zip [1 [:a :b] 2 3 [40 50 60]])
[[1 [:a :b] 2 3 [40 50 60]] nil]

Notice how the result is just the same tree wrapped in a two-element vector? No big deal, right? Can it be that simple?

Well…​ this is Huet’s glove, but still right side out. If you want to see the lining, just do this:

user> (->  [1 [:a :b] 2 3 [40 50 60]]
	   zip/vector-zip
	   zip/down
	   zip/right)
[[:a :b] {:l [1], :pnodes [[1 [:a :b] 2 3 [40 50 60]]], :ppath nil, :r (2 3 [40 50 60])}]

There’s the lining, or at least the beginning of it. All that extra stuff is the zipper looking back up toward the root of the tree, to the left and to the right. Moving around the tree means calling a function like zip/down on the current zipper, and getting back a new zipper with some of the nodes shifted around to represent your new position in the tree. This is functional programming. We are always creating new values, not modifying old values. Position in a tree is typically a form of implicit state: we just are somewhere. Here our location becomes explicit, and the zipper infrastructure makes it so we don’t have to think too much about how all of this happens. We don’t ever need to know about :pnodes or :ppath, we just deal with the zipper interface. (Imagine having to do this “by hand”, associng the left nodes over to the right nodes, dissocing them from the left, etc.)

Two forms of movement

While we are talking about movement, it’s worth pointing out that there are two ways of moving through the zipper. The first way is what we’ve just seen: zip/down, zip/up, zip/left, zip/right and functions like zip/root that take us all the way back to the top.

The second way of getting around the tree, zipper-style, is to walk, as in tree-walk. One of the added benefits of using zippers is that you get, for free, a depth-first tree traversal, just by calling zip/next as many times as you need.

Going back to our example tree, with zip/next you can do stuff like this:

;; let's make a var to be easier to read.
 user> (def zzz (zip/vector-zip [1 [:a :b] 2 3 [40 50 60]]))
#'user/zzz
user> (-> zzz zip/next zip/node)
1
user> (-> zzz zip/next zip/next zip/node)
[:a :b]
user> (-> zzz zip/next zip/next zip/next zip/node)
:a
user> (-> zzz zip/next zip/next zip/next zip/next zip/node)
:b
user> (-> zzz zip/next zip/next zip/next zip/next zip/next zip/node)

(Notice that we call zip/node at the end each time, so that we get the value and not the whole zipper. More on that in a second.)

Each time we add a call to zip/next, we get one more node. Without even trying very hard, we get a complete walk of the tree just by calling zip/next a bunch of times. The function just “knows” where to go to hit each node in the tree exactly once.

End games

Now, when walking a zipper, like in any recursive situation, we wouldn’t want to be in an infinite loop. zip/next needs to know when to quit. Which is why the Clojure zip library provides an end? function that tells us if we are there yet.

user> (defn fast-forward [loc]
	(if (zip/end? loc)
	  (zip/node loc)
	  (recur (zip/next loc))))
#'user/fast-forward
user> (fast-forward zzz)
[1 [:a :b] 2 3 [40 50 60]]

Here we’ve zipped all the way to the end, and we end up with the whole tree, that is the root node.

Check out this slighly different version of our fast-forward function:

user> (defn fast-forward-z [loc]
	(if (zip/end? loc)
	  loc
	  (recur (zip/next loc))))

#'user/fast-forward-z
user> (fast-forward-z zzz)
[[1 [:a :b] 2 3 [40 50 60]] :end]

This time we get the zipper, and not the just the node, as in the previous. And we have this strange little :end keyword as the second part of the zipper vector. The zipper has been “consumed”. We can’t walk anymore. zip/next will just keep returning the same thing, and zip/prev will just return nil. The glove has been turned so far inside out that it can’t go any more (yet, strangely, it looks right side out again).

So not only does the zipper represent our position in the tree, it also “knows” where we’ve been, or at least whether we can keep walking or not.

Nodes and locs

So what about nodes and this zip/node function? Really, it is actually fairly simple: a node is of course a tree, a subtree, a branch or a leaf. The node is your data. But it doesn’t know what it’s parents or siblings are, or how far it is from the root.

On the other hand, a location (usually pronounced loc) contains a node (the node is the first element in the zipper vector), but a location is actually the entire zipper, zipped to a certain point. Sometimes you need the node, and sometimes you need the loc. Somehow, figuring out when to look at a node and when to look at a location was one of the major conceptual hurdles that I encountered.

What tripped me up, I think, looking back, was that zippers require you to go back and forth between the zipper view of the world (that is: the whole tree seen from a particular position) and a node view of the world, where we are back in a good old tree structure looking at individual values.

Say we wanted to find the first keyword (in preorder depth-first traversal) in our tired zzz vector. We could do this:

user> (defn first-keyword [loc]
	(if (zip/end? loc)
	  nil
	  (if (keyword? (zip/node loc))
	    (zip/node loc)
	    (recur (zip/next loc)))))
#'user/first-keyword
user> (first-keyword zzz)
:a

The “keywordiness” of a particular location in the tree is distinct from its location. To see if the current node is a keyword, we have to use the zip/node function to slip into nodespace. If, on the other hand, we want to know whether our node has a right or left sibling, for example, we need to be in zipper space. If we want to find nodes of a certain kind that have parents or children of another kind, our tests have to dance between nodes and zippers. This is the kind of thing that seems completely obvious once you understand it, but, at least for me, kept me scratching my head early on.

Zippers are cheap

One last thing that took a while to sink in: zippers are cheap. Huet insists on how zippers are supposed to be fast since, in a very clojurey way, only the differences in path and position need to be taken into account, leaving most of the original data intact.

In a more concrete way, just looking at a zipper, we see that there isn’t much to them. Here is your data:

user> (def your-data [:a :b :c])
#'user/your-data

Here is your data in a zipper:

[[:a :b :c] nil]

It’s just a two-item vector. Well, almost: there is some metadata attached, so it doesn’t work to just stuff some data in a raw vector followed by nil. You have to use functions like vector-zip or xml-zip to create them.

But still, the data structure is very lightweight, especially when the zipper is at the root position of the tree. As a result, you can be creative with when you decide to break out a zipper. Suppose for instance that you know you only want to walk one subtree within a larger tree. You need a zipper to navigate down to the top of the subtree that interests you. Once you’ve found it, rather than figuring out how to constrain your search to just that subtree, it is simpler to just take the node containing the subtree and plunk it into a new zipper. Then you can zip/next to your heart’s content, knowing that you’ll never end up in some weird place you didn’t expect.

I’ll talk about “editing” zippers some other time, but I recently came onto a problem where I wanted to walk a tree several times with different functions each time. I couldn’t just hold onto a reference to the original zipper because I was modifying as I went. I need to be able to chain my zipper-walking functions together in a sequence. The problem was that after each walk, I was at the :end. My functions worked individually but couldn’t be chained together. The original zipper had been walked too far and just couldn’t take it anymore. What to do? hiredman helped me out on IRC with this: just make a new zipper. In my case that meant just calling zip/xml-zip again on the exhausted zipper.

There are some more gotchas that I’ve run into with zippers, especially when trying to edit trees, so I’ll take that on later. My feeling about zippers is that they are an extremely powerful tool that can be used in lots of unusual situations, not just parsing XML, but anywhere you have a tree that starts to get a little too complicated.

Comments