This is a follow-up to my Getting acquainted with Clojure zippers post. That was more of an introduction, this article is going to more about some of the difficulties we run into when using zippers to modify tree structures. We are going to try to write a function to modify a tree, and along we will run into lots of problems that, once we deal with them, will eventually lead to zipper enlightenment. The code here isn’t meant as recipes or a perfect example, but just a way to work through some issues that come up when working with Clojure zippers.

Immutable…​

The neat thing about zippers in Clojure is that they allow tree manipulation to be purely functional, Zipper manipulation functions can be pure : given the same zipper they will always return the same thing. This is true of movement functions like zip/down and zip/next, since position inside the tree is part of what we are feeding into the function: given the same tree, and the same position inside the tree, the result will alway be the same. The tree itself, obviously, doesn’t change just because we’ve moved. The zipper here is an abstraction that separates the tree itself from the notation of our location in the tree, and thus allows us to deal with both explicitly.

…​but editable?

What about modifying the tree, though? What happens then?

It turns out that simple edits are simple. There are several functions in clojure.zip :

  • edit

  • append-child

  • insert-child

  • insert-left

  • insert-right

  • remove

  • replace

zip/edit is interesting because its arguments are a location and a function to modify the current node. This kind of editing could go rather far, since the current node can be the top of subtree. In a simple case, though, it could look something like this:

(require '(clojure [zip :as zip]))
(def zzz (zip/vector-zip [1 [:a :b] 2 3 [40 50 60]]))
; #'user/zzz
(-> zzz zip/down (zip/edit inc) zip/root)
; [2 [:a :b] 2 3 [40 50 60]]

The first node in the tree, the 2, has been inced up from 1. Editing is possible!

In an XML based tree we might have more complex editing needs. XML trees are map-based. Each node is a map with a :tag key, an :attrs key and a :content key. If we wanted to change a <ul> to an <ol> in an HTML document, the call to zip/edit would look something like this:

(zip/edit loc #(assoc % :tag :ol))

We’ll stick to vector trees here, in the interest of simplicity and short codes snippets.

Let’s try inserting a node now. We’ll try to put a :c after the :b in the first subvector.

(-> zzz zip/down ;; we are at 1
              zip/right ;; this is the subvector
          (zip/append-child :c) ;; add as the last child
          zip/root) ;; back to the top to appreciate our work
; [1 [:a :b :c] 2 3 [40 50 60]]

zip/append-child is a good choice here because we are appending to the end of the vector, and that is what it always does. What about prepending? This time we’ll put a :aa at the beginning.

(-> zzz zip/down zip/right zip/down ;; down to the :a node
       (zip/insert-left :aa)
               zip/root)
[1 [:aa :a :b] 2 3 [40 50 60]]

insert-left: does just what it says. Let’s delete any numbers greater than 10:

(defn death-to-big-numbers
        [loc]
        (if (zip/end? loc)
          loc
          (if (or (not (integer? (zip/node loc)))
                  (< (zip/node loc) 11))
            (recur (zip/next loc)) ;; base case: keep moving
            (recur (zip/next (zip/remove loc)))))) ;; evil big numbers
(zip/root (death-to-big-numbers zzz))
; [1 [:a :b] 2 3 []] ;; they're gone!

This is our standard zipper-walking format. Most of the work is just figuring out which locations need to be removed and passing them off to zip/remove.

Immutable gotchas : let’s wrap

Now let’s try something more complicated. We’re going to start over with a new vector tree.

(def z2 (zip/vector-zip [1 2 :a :b [3 4 :c :d 5] :e]))

Our goal this time is to make sure that all keyword nodes are wrapped together in their own subtrees, with adjacent keywords wrapped together. We want to be able to take a tree like z2 and produce this:

[1 2 [:a :b] [3 4 [:c :d] 5] [:e]]

(I encountered this problem in an HTML document, trying to wrap adjacent, indented <p> elements inside <blockquote>s.)

Here we go. Strategy number one. We’ll look for keywords and, if we find one, we’ll wrap it and then start appending any right siblings.

First, let’s just try wrapping each keyword individually:

Strategy 1 - a: wrap and go…​ down
(defn strategy-one-a [loc]
        (if (zip/end? loc)
          loc
          (if (keyword? (zip/node loc))
            (recur (zip/next (zip/edit loc #(vector %))))
            (recur (zip/next loc)))))
;; wait, don't do this: infinite loop

What? Why is this infinite? Well, when we edit the node, we create a new node, [:a], and then we say zip/next. Except that the next node in this case is down…​ to that :a again! Which we wrap again, and then go find again and again and again. next is relative to the new zipper, not the old one from before the edit. This is one of those immutable/pure-function mind warp things: you want to be able to grab the new state and hold on to the old one too. But here, the old one, where zip/next was going to logically lead to the right sibling, is no longer available to us. Or, to be more precise, it is still available, but it is now a parallel universe where the edit we want to do never really happened. We can’t have the new edit and the old position at the same time, because the old position only exists in a world where the edit never happened.

Back to reality: let’s introduce some logic to make sure we don’t try to wrap keywords that are already inside vectors containing only keywords. In other words, if we run into [:x :y :z] we leave it alone.

First, we write a little function that looks up at a keyword’s parent to see if all the siblings are already keywords:

(defn parent-ok? [loc]
        (when (zip/up loc)
          (every? keyword? (zip/children (zip/up loc)))))

(Note here that zip/children returns nodes, not locs, so we can feed the results directly to keyword? without having to call zip/node.)

And then we plug it into our previous tree walking function:

Strategy 1 b
(defn strategy-one-b [loc]
        (if (zip/end? loc)
          loc
          (if (and (keyword? (zip/node loc))
                   (not (parent-ok? loc)))
            (recur (zip/next (zip/edit loc #(vector %))))
            (recur (zip/next loc)))))

; [[1 2 [:a] [:b] [3 4 [:c] [:d] 5] [:e]] :end] ;; here we see the ended-out zipper

Good: no infinite recursion, and all of our keywords are wrapped, but individually, not grouped together like we want.

Let’s try grabbing them from the left-most keyword. As soon as we hit a keyword, we’ll try to gobble up any keywords to the right before moving on.

Strategy 1 c (starting to get ugly)
(defn strategy-one-c [loc]
        (if (zip/end? loc)
          loc
          (if (and (keyword? (zip/node loc))
                   (not (parent-ok? loc))) ;; up to here, same as before
            (let [right-keys (take-while #(keyword? %) (zip/rights loc))]
                (recur (zip/next
                               (reduce (fn [z n]
                                          (zip/append-child z n)) (zip/edit loc #(vector %)) right-keys))))
            (recur (zip/next loc)))))

The basic structure is the same as before, except that now we try to incorporate keyword nodes to the right of the locations that need to be dealt with (ie. keyword nodes that are not in a parent node containing only keyword children, as established by the parent-ok? function we wrote earlier). Conveniently, the zip library provides zip/rights, which gives us a sequence of nodes (not locs, mind you) that we grab with take-while.

So far nothing extraordinary.

If we do have interesting siblings, we need to bring them into the node we are going to create. That is what we do with reduce here, and it is probably worth breaking down the logic there.

(reduce (fn [z n]
           (zip/append-child z n)) (zip/edit loc #(vector %)) right-keys)

This is the three argument form of reduce, where the second argument provides an initial value. Here our starting value is going to be the original keyword node edited to be wrapped inside a new vector. Then, for each of the keyword siblings to the right, we are going to keep calling zip/append.

It would be the equivalent of this, if we could know ahead of time that right-keys contained :a, :b and :c:

(zip/append (zip/append (zip/append (zip/edit loc #(vector %)) :a) :b) :c)

So, we gather up some more courage and we run the thing:

(strategy-one-c z2)
; [[1 2 [:a :b] [:b] [3 4 [:c :d] [:d] 5] [:e]] :end]

Well, this kind of works, in that we are getting our keywords grouped together. Obviously, we haven’t done anything to remove the nodes that we’ve copied into our aggregating nodes. And look what happens if we have more successive keyword nodes in our input:

(strategy-one-c (zip/vector-zip [1 2 :a :b :c :d :e 3]))
; [[1 2 [:a :b :c :d :e] [:b :c :d :e] [:c :d :e] [:d :e] [:e] 3] :end]

Oh dear. We can’t just leave those siblings out there, because they end up getting siblingized themselves. Before we send our big reduced, zip/appended, zip/edited zipper back to zip/next, we need to prune the original keyword siblings from the zipper. Okay, one last try.

Works but…​ (Strategy 1 d)
(defn strategy-one-d [loc]
        (if (zip/end? loc)
          loc
          (if (and (keyword? (zip/node loc))
                   (not (parent-ok? loc)))
            (let [right-keys (take-while #(keyword? %) (zip/rights loc))
                  loc-no-rights (reduce (fn [z _] (zip/remove (zip/right z))) loc right-keys)
                  loc-w-incorporated-keys (reduce (fn [z n]
                                                                (zip/append-child z n))
                                                            (zip/edit loc-no-rights #(vector %))
                                                            right-keys)]
                 (recur (zip/next loc-w-incorporated-keys)))
            (recur (zip/next loc)))))

I’ve shifted things around a little here for clarity, but it really isn’t too different from the previous rendition. Inside the let, loc-w-incorporated-keys is the result of the same reduce call. The difference here is that before building our final zipper, we remove the siblings.

zip/remove, after blowing away the node, automatically moves to whatever the previous location would be, as determined by calling zip/prev. This could be a problem if we didn’t know that we were working on right siblings, because if we removed the leftmost node, we would end up at the parent node suddenly, which could confuse things considerably. That won’t be a problem here though. This behaviour also allows us to use the combination of zip/right and zip/remove for as many times as we like, because we just keep removing the next node over each time.

loc-no-rights is the intermediary version of the zipper. And because right-keys is holding nodes and not zipper locations, we can come back to them and append them before moving on with zip/next.

Deep breath…​

(strategy-one-d z2)
; [[1 2 [:a :b] [3 4 [:c :d] 5] [:e]] :end]
(strategy-one-d (zip/vector-zip [1 2 :a :b :c :d :e 3]))
; [[1 2 [:a :b :c :d :e] 3] :end]

I confess that at this point I’m just glad to have something that works. I think, however, that you’ll agree that this is just too complicated, too messy and too painful for what should be a relatively simple operation. Going through all these contorsions was meant to show some of the problems that crop up when learning about zippers, and how we have to bend our thinking a little (or a lot) when using them. Now let’s try to do this right…​

Strategy 2

This time we are going to do everything by editing parent nodes. Most of what we do will be in what I’ve been calling “node space” and should be simpler. Each time we hit a branch node, we will look at its children, wrap any keyword nodes and then keep moving.

The first step is to write a generic wrapping function. We want it to take a list of things and group together adjacent keywords. Here is what I came up with:

(defn wrap-keywords [lst]
        (into []
              (mapcat
               #(if (keyword? (first %))
                  (list (apply vector %))
                  %)
                (partition-by keyword? lst))))
(wrap-keywords [:a :b :c 1 2 [:d :e] 3 4 :f])
; [[:a :b :c] 1 2 [:d :e] 3 4 [:f]])

The trick here is that partition-by just breaks up the initial sequence into segments depending on how the predicate reacts to each item. With the sequence in the example, it produces this:

(partition-by keyword?  [:a :b :c 1 2 [:d :e] 3 4 :f])
; ((:a :b :c) (1 2 [:d :e] 3 4) (:f))

The rest of our function, the mapcat part, just turns the keyword segments into vectors (apply vector), wraps each vector in a list to put it back on the same level as its siblings, and then concats the whole thing back together, essentially erasing the lists but leaving the vectors. And finally, we use into to box everything up in a vector.

(Note that we can use vectors here because we are using vector trees. In more serious code it would be wiser to use zip/make-node.)

Now it’s time to try this out on a tree.

Strategy 2: go after the parents
(defn strategy-two [loc]
        (if (zip/end? loc)
          loc
          (if (and (zip/branch? loc)
                      (some keyword? (zip/children loc))
                      (not (every? keyword? (zip/children loc))))
            (recur (zip/next (zip/edit loc wrap-keywords)))
            (recur (zip/next loc)))))

And…​

(strategy-two z2)
; [[1 2 [:a :b] [3 4 [:c :d] 5] [:e]] :end]
(strategy-two zzz)
; [[1 [:a :b] 2 3 [40 50 60]] :end]

In the interest of full disclosure, I messed up the first time through this and fell into the same infinite recursion trap as before. We still need to make sure that the keyword nodes aren’t already wrapped, since otherwise we’ll end up wrapping them forever. That’s what the (not (every? keyword?…)) stuff is all about.

The rest should be pretty self-explanatory by now. We check to see if we are on a parent node, using zip/branch?. We peek at the node’s children to see if any of them are keywords, and then, when the planets are correctly aligned, we edit the node as though it were a simple seq. Because zip/edit modifies the node, not the zipper.

This, in my opinion, is one of the tricky parts to zippers: knowing when to be in zipper-space or in node-space, knowing how to leverage one or the other depending on what you need to do. At the same time, having these two options, and being able to quickly switch from one to the other, is what gives Clojure zippers all their power.

Comments