According to the README of data.xml, the data structure returned by parse contains lazy sequences. So the XML tree returned by

(with-open [r (io/input-stream (io/file "data.xml"))]
  (xml/parse r))

may throw an exception when the tree is traversed outside this code block, because the input stream has already been closed.
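The same failure mode can be reproduced without data.xml. The sketch below is only an analogy: it uses line-seq from clojure.core over an in-memory reader, and the function names are made up for illustration. The first line is read inside with-open, but the rest of the sequence is realized only after the reader has been closed.

```clojure
(import '(java.io BufferedReader StringReader IOException))

(defn lazy-lines-escaping-with-open
  "Hypothetical example: returns a lazy seq whose tail still depends on
  a reader that with-open has already closed."
  []
  (with-open [r (BufferedReader. (StringReader. "first\nsecond\nthird"))]
    (line-seq r)))

(defn realize-outside
  "Forces the escaped seq outside with-open and reports what happened."
  []
  (try
    (doall (lazy-lines-escaping-with-open))
    :ok
    (catch IOException _
      ;; Reading the remaining lines hits the closed reader.
      :stream-closed)))
```

Calling (realize-outside) should take the catch branch, which is the kind of error the question is trying to avoid with the XML tree.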

What is the most elegant way to force evaluation of the whole tree before returning? I tried the following but I am wondering if there is anything simpler.

(with-open [r (io/input-stream (io/file "data.xml"))]
  (doto (xml/parse r)
    (->> (tree-seq map? :content) (dorun))))

2 Answers

3

Use doall to force realisation of the lazy sequences, then keep just the first element: the first item of the node sequence is the root node, which is the whole tree. You can also use xml-seq (from clojure.core) rather than tree-seq:

(with-open [r (io/input-stream (io/file "data.xml"))]
  (-> r xml/parse xml-seq doall first))
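A possible variation, sketched under the assumption that org.clojure/data.xml is on the classpath: dorun walks the node sequence without holding on to it, and the root is returned explicitly. The name parse-eagerly is hypothetical and not part of data.xml.

```clojure
;; Assumes the org.clojure/data.xml dependency is available.
(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])

(defn parse-eagerly
  "Hypothetical helper: parses the XML file at `path` and realizes the
  whole tree before the input stream is closed."
  [path]
  (with-open [r (io/input-stream (io/file path))]
    (let [root (xml/parse r)]
      ;; Visit every node so each lazy :content seq is realized,
      ;; discarding the intermediate node sequence as we go.
      (dorun (xml-seq root))
      root)))
```

The returned tree should then be safe to traverse after the stream is closed, for example (map :tag (:content (parse-eagerly "data.xml"))).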

3 Comments

This version keeps the whole, otherwise unused sequence in memory until the function returns.
This is a necessary cost of non-lazy sequences. Note the sequence shares its data structure with the desired tree, so the cost may not be that high. You will need to measure the costs in your application. If the memory costs are too high, you'd need to change strategy and embrace laziness rather than seek to avoid it.
@erdos If you want to take advantage of the laziness, you have to move the work inside your with-open.
-1

For general XML parsing, I might suggest looking at tupelo.parse.xml. It is not lazy, and simplifies loading of XML data. There are similar namespaces for parsing HTML via TagSoup, and also parsing YAML.

For generic use, you can always fall back to tupelo.core/unlazy. It recursively walks a data structure and converts every item into a plain map, vector, set, etc. It also realizes an InputStream via slurp (very handy for testing HTTP endpoints that return an InputStream instead of a string) and converts any java.lang.Iterable to a vector. Overall, unlazy is like doall on steroids.
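To illustrate the general idea of a recursive realizing walk without any extra dependency, here is a minimal sketch built on clojure.walk/postwalk. It is not tupelo's implementation and does not convert types or handle InputStreams; realize-all and the sample data are hypothetical names used only for this example.

```clojure
(require '[clojure.walk :as walk])

(defn realize-all
  "Hypothetical helper: walks nested data with an identity transform.
  clojure.walk realizes every lazy seq it passes through."
  [x]
  (walk/postwalk identity x))

;; Counts how many items of the lazy :content seq have been computed.
(def realized-count (atom 0))

(def nested
  {:tag :root
   :content (map (fn [n]
                   (swap! realized-count inc)
                   {:tag :item :content [(str n)]})
                 (range 3))})

;; Nothing has been computed until the walk touches the lazy seq.
(def result (realize-all nested))
```

After the walk, @realized-count is 3, showing that the nested lazy sequence was forced even though nothing consumed it explicitly.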


Aside:

If you are wondering about the map & set stuff, it is because Datomic return values behave like a "lazy map" and do not realize all of their key-value pairs unless you specifically extract each key or dump the object into a regular Clojure map via into or similar.
