I have a lot of XML generated by .NET serializers for nested classes with values, arrays, etc. Only element tags and content are used.
To me, the default Clojure XML structure with :tag, :content, etc. is rather big and messy to work with, and it is easy to get confused when the object is deeply nested.
I use this function to create a simple intermediate representation, which I can then refine further depending on the type of the attributes.
I first parse the input with clojure.data.xml (parse-str for a string, parse for a stream or reader wrapping a byte[]) and then call keep-tag-and-contents-prepare-leafs:
    (defn keep-tag-and-contents-prepare-leafs
      "Simplify the clj-xml structure; I am only interested in :tag and :content"
      [xml]
      (if (map? xml)
        ;; an element: keep its tag and recurse into its content
        [(:tag xml) (keep-tag-and-contents-prepare-leafs (:content xml))]
        (if (seq? xml)
          (if (map? (first xml))
            ;; a seq of child elements
            (for [x xml] (keep-tag-and-contents-prepare-leafs x))
            (do
              ;; we are at the bottom of the xml
              (assert (<= (count xml) 1) "Leafs should be empty or single value")
              (if (empty? xml) nil (first xml))))
          ;; we should never end up here, since we do a lookahead on the level above the leaves
          (assert false))))
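The parsing step mentioned above could be sketched roughly as follows. This is only an illustration: it assumes org.clojure/data.xml is on the classpath, and the namespace, the helper names parse-xml-string and parse-xml-bytes, and the sample document are all made up for the example.

```clojure
(ns example.parse-xml
  ;; Assumption: org.clojure/data.xml is available as a dependency.
  (:require [clojure.data.xml :as xml])
  (:import (java.io ByteArrayInputStream)))

(defn parse-xml-string
  "Hypothetical helper: parse XML held in a String."
  [^String s]
  (xml/parse-str s))

(defn parse-xml-bytes
  "Hypothetical helper: parse XML held in a byte[] by wrapping it in a stream."
  [^bytes bs]
  (xml/parse (ByteArrayInputStream. bs)))

;; Made-up sample document, written without whitespace between elements.
(def sample "<root><a>1</a><b></b></root>")

;; Both helpers return the usual :tag/:content element structure.
(:tag (parse-xml-string sample))
(map :tag (:content (parse-xml-bytes (.getBytes sample "UTF-8"))))
```

Either result can then be handed to keep-tag-and-contents-prepare-leafs. Note that the function relies on :content being a seq, which is what clojure.data.xml produces.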
On my test data this gives a structure like this:
    ;; (pp/pprint (mutils/keep-tag-and-contents-prepare-leafs xlmeta-testdata-small))
    ;; [:defaultFormattings
    ;;  ([:_columnMeta
    ;;    ([:XLMetaColumn
    ;;      ([:_name "Paris"]
    ;;       [:_caption "Paris"]
    ;;       [:_width "100"]
    ;;       [:_hide "false"]
    ;;       [:_input "false"]
    ;;       [:_hideExport "false"]
    ;;       [:_textAreaRows "0"])]
    ;;     [:XLMetaColumn
    ;;      ([:_name "footbill40"]
    ;;       [:_caption "footbill40"]
    ;;       [:_width "100"]
    ;;       [:_hide "false"]
    ;;       [:_input "false"]
    ;;       [:_hideExport "false"]
    ;;       [:_textAreaRows "0"])])]
    ;;   [:_fmtStrings nil]
    ;;   [:_maxHtmlColumns "50"])]
This can easily be processed further: just switch on vector? (a [tag content] pair) and seq? (a list of child elements).
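As a rough sketch of that kind of further processing, the snippet below walks the intermediate representation and converts some leaf strings by tag. The function name refine and the coercions table are hypothetical and only illustrate the vector?/seq? dispatch; which tags get converted, and to what, is an assumption for the example.

```clojure
(def coercions
  "Hypothetical per-tag leaf converters; tags not listed keep their string value."
  {:_width #(Long/parseLong %)
   :_hide  #(Boolean/parseBoolean %)})

(defn refine
  "Walk the [tag content] representation, converting leaf strings by tag."
  [node]
  (cond
    ;; a seq is a list of child [tag content] pairs
    (seq? node)
    (map refine node)

    ;; a vector is a [tag content] pair; its content is either children or a leaf
    (vector? node)
    (let [[tag content] node]
      (if (seq? content)
        [tag (refine content)]
        [tag (if (nil? content)
               nil
               ((get coercions tag identity) content))]))

    ;; anything else is returned unchanged
    :else node))

(refine '[:XLMetaColumn ([:_name "Paris"] [:_width "100"] [:_hide "false"])])
;; => [:XLMetaColumn ([:_name "Paris"] [:_width 100] [:_hide false])]
```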
This intermediate representation is also very compact, so it is easy to pprint the data, put a quote in front of the output, and use it in unit tests.
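A unit test built that way might look like the sketch below. It uses clojure.test, a condensed copy of the function (repeated only so the snippet runs on its own, and intended to behave the same as the original), and a small hand-built input in the :tag/:content shape instead of real parsed XML. The namespace, test name, and data are made up for the example.

```clojure
(ns example.simplify-test
  (:require [clojure.test :refer [deftest is run-tests]]))

;; Condensed copy of the function from this answer, repeated so the snippet is standalone.
(defn keep-tag-and-contents-prepare-leafs
  [xml]
  (cond
    (map? xml)
    [(:tag xml) (keep-tag-and-contents-prepare-leafs (:content xml))]

    (and (seq? xml) (map? (first xml)))
    (map keep-tag-and-contents-prepare-leafs xml)

    (seq? xml)
    (do (assert (<= (count xml) 1) "Leafs should be empty or single value")
        (first xml))

    :else (assert false)))

;; Hypothetical hand-built input; :content values are lists, as a parser would give seqs.
(def parsed
  {:tag :root
   :content (list {:tag :a :content (list "1")}
                  {:tag :b :content (list)})})

(deftest simplifies-to-quoted-literal
  ;; The expected value is just the pprinted output with a quote in front.
  (is (= '[:root ([:a "1"] [:b nil])]
         (keep-tag-and-contents-prepare-leafs parsed))))

(run-tests)
```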