
Given a namespaced XML document (namespaces are omitted in this example):

<foo>
    <name>John</name>
    <address>1 hacker way</address>
    <phone></phone>
    <school>
        <name></name>
        <state></state>
        <type></type>
    </school>
    <college>
        <name>mit</name>
        <address></address>
        <state></state>
    </college>
</foo>

How would you write a function, remove-empty-tags, using clojure.data.xml that returns the following?

<foo>
  <name>John</name>
  <address>1 hacker way</address>
  <college> 
    <name>mit</name>
  </college>
</foo>

My solution so far is incomplete, and it looks like some recursion might help:

(require '[clojure.data.xml :as xml]
         '[clojure.string :refer [blank?]])

(defn- child-element? [e]
  (let [content (:content e)]
    (and (= (count content)
            (count (filter #(instance? clojure.data.xml.node.Element %) content))))))


(defn remove-empty-tags
  [xml-data]
  (let [empty-tags? #(or (empty? %) (-> % .toString blank?))]
    (reduce (fn [col e]
              (if-not (empty-tags? (:content e))
                (merge col e)
                col))
            xml-data)))

(def body (slurp "sample.xml")) ;; the above xml
(def xml-data (-> (xml/parse (java.io.StringReader. body)) :content))

(remove-empty-tags xml-data)

After converting back to XML, this returns:

<foo>
    <name>John</name>
    <address>1 hacker way</address>
    <school>
        <name/>
        <state/>
    </school>
    <college>
        <name>mit</name>
        <address/>
        <state/>
    </college>
</foo>

Clearly, the function needs to recurse (using child-element?) so that it also removes empty child nodes.

Suggestions?
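One way to make that recursion concrete is sketched below. It is a minimal sketch, not the asker's code: prune-empty-tags and blank-text? are hypothetical names, and it assumes element nodes are map-like values with :tag and :content keys (as clojure.data.xml elements are) that support assoc. It is exercised on plain maps of that shape, so it does not depend on a particular data.xml version.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: nil (a pruned child) or an empty/whitespace-only string.
(defn- blank-text? [x]
  (or (nil? x) (and (string? x) (str/blank? x))))

;; Hypothetical name; assumes element nodes look like {:tag .. :attrs .. :content [..]}.
(defn prune-empty-tags
  "Returns node with empty descendants removed, or nil when nothing
  non-blank remains under it. Text leaves are returned unchanged."
  [node]
  (if (:tag node)                           ; element; (:tag "text") and (:tag nil) are nil
    (let [kids (->> (:content node)
                    (map prune-empty-tags)  ; recurse into children first
                    (remove blank-text?)    ; drop pruned children and blank text
                    vec)]
      (when (seq kids)
        (assoc node :content kids)))
    node))
```

With clojure.data.xml you would presumably pass the parsed root element (for example from xml/parse-str) rather than its :content seq, and emit the result with xml/emit-str.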

  • Can you specify how you call the function? What is in xml-data, etc.? Commented Dec 5, 2018 at 3:11
  • Sure, I added the calling code above. Commented Dec 5, 2018 at 3:39

3 Answers


Here's a pretty simple solution using clojure.walk/postwalk:

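(require '[clojure.data.xml :as xml]
         '[clojure.walk])

(defn remove-empty-elements [xml-data]
  (clojure.walk/postwalk
   (fn [v]
     (cond
       ;; clojure.data.xml.Element is the record class in data.xml 0.0.x;
       ;; the question's code uses clojure.data.xml.node.Element, so match
       ;; this class to the data.xml version you are running
       (and (instance? clojure.data.xml.Element v)
            (every? empty? (:content v)))
       nil ;; nil-out elements with no content
       (instance? clojure.data.xml.Element v)
       (update v :content #(filter some? %)) ;; filter nils from contents
       :else v))
   xml-data))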

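This works by traversing the XML data depth-first, replacing each element whose :content items are all empty (or that has no items at all) with nil, and filtering those nils out of other elements' :content collections. Because children are visited before their parents, an element whose children were all replaced by nil is itself empty by the time it is visited, so it is removed as well.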
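To make the depth-first reasoning concrete, the REPL checks below (plain clojure.core and clojure.string, no data.xml needed) show how the every? empty? test behaves on the kinds of :content values the walk can encounter. Whether whitespace-only text between tags actually shows up in :content depends on the parser version and options, so the last two checks illustrate an assumption about the input rather than a claim about data.xml.

```clojure
(require '[clojure.string :as str])

;; No children at all: every? is vacuously true, so <phone></phone> is nil-ed.
(every? empty? [])                  ;; => true

;; A parent whose children were all nil-ed on the way up is empty as well.
(every? empty? [nil nil nil])       ;; => true

;; Whitespace-only text between tags is not empty? ...
(every? empty? ["\n    " nil])      ;; => false

;; ...but clojure.string/blank? treats both nil and whitespace-only text as blank.
(every? str/blank? ["\n    " nil])  ;; => true
```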

Note: the second (instance? clojure.data.xml.Element v) clause in the cond can be omitted if you're only emitting strings, because xml/emit-str ignores nils in :content collections; i.e., it emits the same string either way.

(println (xml/emit-str (remove-empty-elements xml-data)))

Formatted output:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
    <name>John</name>
    <address>1 hacker way</address>
    <college>
        <name>mit</name>
    </college>
</foo>

4 Comments

Thanks for sharing this code. However, when I run it, it outputs the same input XML with no changes. Can you please confirm that this works? I have attempted to use postwalk, but it walks each keyword in the Element defrecord rather than the whole element, so the logic was not straightforward to arrive at.
Note that (empty? (filter some? x)) can be better written as (every? nil? x)
Nil checks alone won't work, because empty strings also need to be stripped; this predicate is ideal: #(or (empty? %) (-> % .toString blank?))
@pri I used the XML from your question verbatim. The walk functions will walk every element in the input, down to the individual primitives. That’s why the cond has clauses to look at specific values, and leave everything else untouched.
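Both points raised in the comments above (the class check and blank strings) can be handled in a variant of the same postwalk approach. This is a sketch under stated assumptions: element-node?, blank-node? and prune-empty-elements are hypothetical names; an element is recognized as any map with a :tag key instead of via instance? on a specific data.xml class; and nil plus empty or whitespace-only strings count as blank. It is tested only on plain maps shaped like data.xml elements.

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

;; Hypothetical helper: any map carrying a :tag key is treated as an element,
;; so no data.xml class name (which differs between versions) is needed.
(defn- element-node? [v]
  (and (map? v) (contains? v :tag)))

;; Hypothetical helper: nil (a pruned child) and empty or whitespace-only strings.
(defn- blank-node? [v]
  (or (nil? v) (and (string? v) (str/blank? v))))

(defn prune-empty-elements
  "Depth-first (postwalk) removal of elements with no non-blank content."
  [tree]
  (walk/postwalk
   (fn [v]
     (if (element-node? v)
       ;; children were visited first, so emptied ones are already nil here
       (let [kept (vec (remove blank-node? (:content v)))]
         (when (seq kept)
           (assoc v :content kept)))
       v))
   tree))
```

Dropping blank strings also removes the indentation whitespace between surviving tags, so the emitted XML loses its original layout.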

You can easily manipulate tree-like data structures using the Tupelo Forest library. Here is a video from the 2017 Clojure Conj giving an introduction. For your problem:

  (let [xml-data "<foo>
                  <name>John</name>
                  <address>1 hacker way</address>
                  <phone></phone>
                  <school>
                      <name></name>
                      <state></state>
                      <type></type>
                  </school>
                  <college>
                      <name>mit</name>
                      <address></address>
                      <state></state>
                  </college>
                </foo> "]

We add the XML data to a new forest and remove any whitespace nodes:

  (with-forest (new-forest)
    (let [root-hid (add-tree-xml xml-data)]
      (remove-whitespace-leaves)

with result:

(hid->hiccup root-hid) => 

    [:foo
     [:name "John"]
     [:address "1 hacker way"]
     [:phone]
     [:school [:name] [:state] [:type]]
     [:college [:name "mit"] [:address] [:state]]]

We can walk the tree and remove empty nodes like so:

      (walk-tree root-hid {:leave (fn [hid]
                                    (when (empty-leaf-hid? hid)
                                      (remove-hid hid)))})

with result:

(hid->hiccup root-hid) =>

     [:foo 
       [:name "John"]
       [:address "1 hacker way"]
       [:college 
        [:name "mit"]]]
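The same pruning can also be applied to the hiccup form shown above in plain Clojure, without the Tupelo dependency. This is not Tupelo's API and is a different technique: prune-hiccup is a hypothetical helper, and it assumes hiccup vectors of the form [tag & children] without attribute maps (an attribute map would count as non-empty content). The test input and expected output are the two hiccup results from this answer.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper; assumes [tag & children] vectors with no attrs map.
(defn prune-hiccup
  "Removes empty elements from a hiccup tree.
  Returns nil when the node itself ends up empty."
  [node]
  (if (vector? node)
    (let [[tag & children] node
          kept (->> children
                    (map prune-hiccup)                 ; prune subtrees first
                    (remove #(or (nil? %)              ; emptied subtree
                                 (and (string? %)      ; blank text leaf
                                      (str/blank? %)))))]
      (when (seq kept)
        (into [tag] kept)))
    node))                                             ; text leaf: unchanged
```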

Update

The live code can be seen here.


Update #2

If you want to run the code, you'll need something like the following in the ns form (see live code example above):

(ns tst.tupelo.forest-examples
  (:use tupelo.core tupelo.forest tupelo.test)
  ...)

9 Comments

Thanks for pointing this out. I looked at this library in the past, but it has a dependency nightmare, with more than a dozen unrelated libs including reagent, re-frame, schema, and datomic, and the docs were not clear. That said, could you please edit the post with the code in a single function so I can test it out? It's hard to read code spliced with comments, and it breaks on copy/paste.
Got it. tupelo-forest could be an excellent standalone library with no deps.
I'm unable to resolve the symbol walk-tree in spite of (:use tupelo.core tupelo.forest).
I'm unable to download 0.9.111.
Last note: the function doesn't work on production data; it explodes with a long stack trace on enlive: No matching method found: getBytes for class clojure.lang.PersistentArrayMap Reflector.java: 53 clojure.lang.Reflector/invokeMatchingMethod Reflector.java: 28 clojure.lang.Reflector/invokeInstanceMethod string.cljc: 279 tupelo.string$eval23128$string__GT_stream__23133$fn__23134/invoke string.cljc: 276 tupelo.string$eval23128$string__GT_stream__23133/invoke forest.cljc: 538

I was able to get this working with a combination of recursion and reduce (a completed version of my original partial answer). The key was to pass the head of each node through the recursion, so that reduce can attach the transformed child nodes to that head.

(require '[clojure.data.xml :as xml]
         '[clojure.string :refer [blank?]])

(defn- child-element? [e]
  (let [content (:content e)]
    (and (= (count content)
            (count (filter #(instance? clojure.data.xml.node.Element %) content))))))

(defn- empty-element? [e]
  (or (empty? e) (-> e .toString blank?)))

(defn element? [e]
  (and (instance? clojure.lang.LazySeq e)
       (instance? clojure.data.xml.node.Element (first e))))

(defn remove-empty-elements!
  "Remove empty elements (and child elements) in an xml"
  [head xml-data]
  (let [data (if (seq? xml-data) xml-data (:content xml-data))
        rs (reduce (fn [col e]
                     (let [content (:content e)]
                       (cond
                         (empty-element? content)
                         col

                         (and (not (element? content)) (not (every? empty-element? content)))
                         (merge col e)

                         (and (element? content) (every? true? (map #(empty-element? (:content %)) content)))
                         col

                         (and (child-element? content))
                         (let [_head (xml/element (:tag e) {})]
                           (merge col (remove-empty-elements! _head content)))

                         :else col)))
                   []
                   data)]
    (assoc head :content rs)))


;; test (assumes xml-data is the parsed root element, e.g. (xml/parse-str body))
(remove-empty-elements! (xml/element (:tag xml-data) {}) xml-data)
