Clojure Zippers: Structure Editing With Your Mind
This is a follow-up to my Getting acquainted with Clojure zippers post. That was more of an introduction, this article is going to more about some of the difficulties we run into when using zippers to modify tree structures. We are going to try to write a function to modify a tree, and along we will run into lots of problems that, once we deal with them, will eventually lead to zipper enlightenment. The code here isn’t meant as recipes or a perfect example, but just a way to work through some issues that come up when working with Clojure zippers.
The neat thing about zippers in Clojure is that they allow tree
manipulation to be purely functional, Zipper manipulation functions
can be pure : given the same zipper they will always return the same
thing. This is true of movement functions like
zip/next, since position inside the tree is part of what we are
feeding into the function: given the same tree, and the same position
inside the tree, the result will alway be the same. The tree itself,
obviously, doesn’t change just because we’ve moved. The zipper here is
an abstraction that separates the tree itself from the notation of our
location in the tree, and thus allows us to deal with both explicitly.
What about modifying the tree, though? What happens then?
It turns out that simple edits are simple. There are several functions in clojure.zip :
zip/edit is interesting because its arguments are a location and a
function to modify the current node. This kind of editing could go
rather far, since the current node can be the top of
subtree. In a simple case, though, it could look something like this:
(require '(clojure [zip :as zip])) (def zzz (zip/vector-zip [1 [:a :b] 2 3 [40 50 60]])) ; #'user/zzz (-> zzz zip/down (zip/edit inc) zip/root) ; [2 [:a :b] 2 3 [40 50 60]]
The first node in the tree, the
2, has been
inced up from
1. Editing is possible!
In an XML based tree we might have more complex editing needs. XML
trees are map-based. Each node is a map with a
:tag key, an
key and a
:content key. If we wanted to change a
<ul> to an
in an HTML document, the call to
zip/edit would look something like
(zip/edit loc #(assoc % :tag :ol))
We’ll stick to vector trees here, in the interest of simplicity and short codes snippets.
Let’s try inserting a node now. We’ll try to put a
:c after the
in the first subvector.
(-> zzz zip/down ;; we are at 1 zip/right ;; this is the subvector (zip/append-child :c) ;; add as the last child zip/root) ;; back to the top to appreciate our work ; [1 [:a :b :c] 2 3 [40 50 60]]
zip/append-child is a good choice here because we are appending to
the end of the vector, and that is what it always does. What about
prepending? This time we’ll put a
:aa at the beginning.
(-> zzz zip/down zip/right zip/down ;; down to the :a node (zip/insert-left :aa) zip/root) [1 [:aa :a :b] 2 3 [40 50 60]]
insert-left: does just what it says. Let’s delete any numbers
greater than 10:
(defn death-to-big-numbers [loc] (if (zip/end? loc) loc (if (or (not (integer? (zip/node loc))) (< (zip/node loc) 11)) (recur (zip/next loc)) ;; base case: keep moving (recur (zip/next (zip/remove loc)))))) ;; evil big numbers (zip/root (death-to-big-numbers zzz)) ; [1 [:a :b] 2 3 ] ;; they're gone!
This is our standard zipper-walking format. Most of the work is just
figuring out which locations need to be removed and passing them off
Immutable gotchas : let’s wrap
Now let’s try something more complicated. We’re going to start over with a new vector tree.
(def z2 (zip/vector-zip [1 2 :a :b [3 4 :c :d 5] :e]))
Our goal this time is to make sure that all keyword nodes are wrapped
together in their own subtrees, with adjacent keywords wrapped
together. We want to be able to take a tree like
z2 and produce
[1 2 [:a :b] [3 4 [:c :d] 5] [:e]]
(I encountered this problem in an HTML document, trying to wrap
<p> elements inside
Here we go. Strategy number one. We’ll look for keywords and, if we find one, we’ll wrap it and then start appending any right siblings.
First, let’s just try wrapping each keyword individually:
(defn strategy-one-a [loc] (if (zip/end? loc) loc (if (keyword? (zip/node loc)) (recur (zip/next (zip/edit loc #(vector %)))) (recur (zip/next loc))))) ;; wait, don't do this: infinite loop
What? Why is this infinite? Well, when we edit the node, we create a
[:a], and then we say
zip/next. Except that the
node in this case is down… to that
:a again! Which we wrap again,
and then go find again and again and again.
next is relative to the
new zipper, not the old one from before the edit. This is one of
those immutable/pure-function mind warp things: you want to be able to
grab the new state and hold on to the old one too. But here, the old
zip/next was going to logically lead to the right
sibling, is no longer available to us. Or, to be more precise, it is
still available, but it is now a parallel universe where the edit we
want to do never really happened. We can’t have the new edit and the
old position at the same time, because the old position only exists in
a world where the edit never happened.
Back to reality: let’s introduce some logic to make sure we don’t try
to wrap keywords that are already inside vectors containing only
keywords. In other words, if we run into
[:x :y :z] we leave it alone.
First, we write a little function that looks up at a keyword’s parent to see if all the siblings are already keywords:
(defn parent-ok? [loc] (when (zip/up loc) (every? keyword? (zip/children (zip/up loc)))))
(Note here that
zip/children returns nodes, not
locs, so we can
feed the results directly to
keyword? without having to call
And then we plug it into our previous tree walking function:
(defn strategy-one-b [loc] (if (zip/end? loc) loc (if (and (keyword? (zip/node loc)) (not (parent-ok? loc))) (recur (zip/next (zip/edit loc #(vector %)))) (recur (zip/next loc))))) ; [[1 2 [:a] [:b] [3 4 [:c] [:d] 5] [:e]] :end] ;; here we see the ended-out zipper
Good: no infinite recursion, and all of our keywords are wrapped, but individually, not grouped together like we want.
Let’s try grabbing them from the left-most keyword. As soon as we hit a keyword, we’ll try to gobble up any keywords to the right before moving on.
(defn strategy-one-c [loc] (if (zip/end? loc) loc (if (and (keyword? (zip/node loc)) (not (parent-ok? loc))) ;; up to here, same as before (let [right-keys (take-while #(keyword? %) (zip/rights loc))] (recur (zip/next (reduce (fn [z n] (zip/append-child z n)) (zip/edit loc #(vector %)) right-keys)))) (recur (zip/next loc)))))
The basic structure is the same as before, except that now we try to
incorporate keyword nodes to the right of the locations that need to
be dealt with (ie. keyword nodes that are not in a parent node
containing only keyword children, as established by the
function we wrote earlier). Conveniently, the
zip library provides
zip/rights, which gives us a sequence of nodes (not
you) that we grab with
So far nothing extraordinary.
If we do have interesting siblings, we need to bring them into the
node we are going to create. That is what we do with
and it is probably worth breaking down the logic there.
(reduce (fn [z n] (zip/append-child z n)) (zip/edit loc #(vector %)) right-keys)
This is the three argument form of reduce, where the second argument
provides an initial value. Here our starting value is going to be the
original keyword node edited to be wrapped inside a new vector. Then,
for each of the keyword siblings to the right, we are going to keep
It would be the equivalent of this, if we could know ahead of time
(zip/append (zip/append (zip/append (zip/edit loc #(vector %)) :a) :b) :c)
So, we gather up some more courage and we run the thing:
(strategy-one-c z2) ; [[1 2 [:a :b] [:b] [3 4 [:c :d] [:d] 5] [:e]] :end]
Well, this kind of works, in that we are getting our keywords grouped together. Obviously, we haven’t done anything to remove the nodes that we’ve copied into our aggregating nodes. And look what happens if we have more successive keyword nodes in our input:
(strategy-one-c (zip/vector-zip [1 2 :a :b :c :d :e 3])) ; [[1 2 [:a :b :c :d :e] [:b :c :d :e] [:c :d :e] [:d :e] [:e] 3] :end]
Oh dear. We can’t just leave those siblings out there, because they
end up getting siblingized themselves. Before we send our big
zip/edited zipper back to
zip/next, we need to prune the original keyword siblings from the
zipper. Okay, one last try.
(defn strategy-one-d [loc] (if (zip/end? loc) loc (if (and (keyword? (zip/node loc)) (not (parent-ok? loc))) (let [right-keys (take-while #(keyword? %) (zip/rights loc)) loc-no-rights (reduce (fn [z _] (zip/remove (zip/right z))) loc right-keys) loc-w-incorporated-keys (reduce (fn [z n] (zip/append-child z n)) (zip/edit loc-no-rights #(vector %)) right-keys)] (recur (zip/next loc-w-incorporated-keys))) (recur (zip/next loc)))))
I’ve shifted things around a little here for clarity, but it really
isn’t too different from the previous rendition. Inside the
loc-w-incorporated-keys is the result of the same
reduce call. The
difference here is that before building our final zipper, we remove
zip/remove, after blowing away the node, automatically
moves to whatever the previous location would be, as determined by
zip/prev. This could be a problem if we didn’t know that we
were working on right siblings, because if we removed the leftmost
node, we would end up at the parent node suddenly, which could confuse
things considerably. That won’t be a problem here though. This
behaviour also allows us to use the combination of
zip/remove for as many times as we like, because we just keep
removing the next node over each time.
loc-no-rights is the intermediary version of the zipper. And because
right-keys is holding nodes and not zipper locations, we can come
back to them and append them before moving on with
(strategy-one-d z2) ; [[1 2 [:a :b] [3 4 [:c :d] 5] [:e]] :end] (strategy-one-d (zip/vector-zip [1 2 :a :b :c :d :e 3])) ; [[1 2 [:a :b :c :d :e] 3] :end]
I confess that at this point I’m just glad to have something that works. I think, however, that you’ll agree that this is just too complicated, too messy and too painful for what should be a relatively simple operation. Going through all these contorsions was meant to show some of the problems that crop up when learning about zippers, and how we have to bend our thinking a little (or a lot) when using them. Now let’s try to do this right…
This time we are going to do everything by editing parent nodes. Most of what we do will be in what I’ve been calling “node space” and should be simpler. Each time we hit a branch node, we will look at its children, wrap any keyword nodes and then keep moving.
The first step is to write a generic wrapping function. We want it to take a list of things and group together adjacent keywords. Here is what I came up with:
(defn wrap-keywords [lst] (into  (mapcat #(if (keyword? (first %)) (list (apply vector %)) %) (partition-by keyword? lst)))) (wrap-keywords [:a :b :c 1 2 [:d :e] 3 4 :f]) ; [[:a :b :c] 1 2 [:d :e] 3 4 [:f]])
The trick here is that
partition-by just breaks up the initial
sequence into segments depending on how the predicate reacts to each
item. With the sequence in the example, it produces this:
(partition-by keyword? [:a :b :c 1 2 [:d :e] 3 4 :f]) ; ((:a :b :c) (1 2 [:d :e] 3 4) (:f))
The rest of our function, the
mapcat part, just turns the keyword
segments into vectors (
apply vector), wraps each vector in a list to
put it back on the same level as its siblings, and then concats the
whole thing back together, essentially erasing the lists but leaving
the vectors. And finally, we use
into to box everything up in a
(Note that we can use vectors here because we are using vector
trees. In more serious code it would be wiser to use
Now it’s time to try this out on a tree.
(defn strategy-two [loc] (if (zip/end? loc) loc (if (and (zip/branch? loc) (some keyword? (zip/children loc)) (not (every? keyword? (zip/children loc)))) (recur (zip/next (zip/edit loc wrap-keywords))) (recur (zip/next loc)))))
(strategy-two z2) ; [[1 2 [:a :b] [3 4 [:c :d] 5] [:e]] :end] (strategy-two zzz) ; [[1 [:a :b] 2 3 [40 50 60]] :end]
In the interest of full disclosure, I messed up the first time through
this and fell into the same infinite recursion trap as before. We
still need to make sure that the keyword nodes aren’t already wrapped,
since otherwise we’ll end up wrapping them forever. That’s what the
(not (every? keyword?…)) stuff is all about.
The rest should be pretty self-explanatory by now. We check to see if
we are on a parent node, using
zip/branch?. We peek at the node’s
children to see if any of them are keywords, and then, when the
planets are correctly aligned, we edit the node as though it were a
zip/edit modifies the node, not the zipper.
This, in my opinion, is one of the tricky parts to zippers: knowing when to be in zipper-space or in node-space, knowing how to leverage one or the other depending on what you need to do. At the same time, having these two options, and being able to quickly switch from one to the other, is what gives Clojure zippers all their power.