
Let's consider a clojure.spec regex spec for hiccup syntax:

(require '[clojure.spec :as spec])

(spec/def ::hiccup
  (spec/cat :tag        keyword?
            :attributes (spec/? map?)
            :content    (spec/* (spec/or :terminal string?
                                         :element  ::hiccup))))

which works splendidly:

(spec/conform ::hiccup [:div#app [:h5 {:id "loading-message"} "Connecting..."]])
; => {:tag :div#app, :content [[:element {:tag :h5, :attributes {:id "loading-message"}, :content [[:terminal "Connecting..."]]}]]}

until you try to generate some example data for your functions from the spec:

(require '[clojure.spec.gen :as gen])
(gen/generate (spec/gen ::hiccup))
; No return value but:
; 1. Unhandled java.lang.OutOfMemoryError
;    GC overhead limit exceeded

Is there a way to rewrite the spec so that it produces a working generator? Or do we have to attach some simplified generator to the spec?

1 Answer


The intent of spec/*recursion-limit* (default 4) is to limit recursive generation such that this should work. So either that's not working properly in one of the spec impls (* or or), or you are seeing rapid growth in something else (like map? or the strings). Without doing some tinkering, it's hard to know which is the problem.

This does generate (a very large example) for me:

(binding [spec/*recursion-limit* 1] (gen/generate (spec/gen ::hiccup)))

I do see several areas where the cardinalities are large even in that one example - the * repetition and the size of the map generated for the map? attributes. Both of those could be further constrained. It would be easiest to break these parts up further into more fine-grained specs and supply override generators where necessary (the attribute map could just be handled with map-of and :gen-max).
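
For example, a minimal sketch of that refactoring might look like the following (the ::attributes spec name and the keyword?/string? value types are assumptions, not something the answer specifies):

(spec/def ::attributes
  ;; bound the size of generated attribute maps
  (spec/map-of keyword? string? :gen-max 2))

(spec/def ::hiccup
  (spec/cat :tag        keyword?
            :attributes (spec/? ::attributes)
            :content    (spec/* (spec/or :terminal string?
                                         :element  ::hiccup))))

;; generate with a lowered recursion limit, as above
(binding [spec/*recursion-limit* 1]
  (gen/generate (spec/gen ::hiccup)))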


Comments

So I limited :attributes to be a (spec/? (spec/map-of keyword? string? :gen-max 1)). With that, setting spec/*recursion-limit* to 3 brings generation time down to 1-5 s. But if we increase the limit to 4 we're back to endless evaluation until the GC overhead limit is exceeded. How would you recommend limiting how large the generated data of * grows? Assuming I'm correct in believing that's the issue.
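
As for bounding what the * repetition produces, one option is the one the question itself raises: attach a simplified generator to the spec with spec/with-gen. A minimal sketch, assuming a hand-rolled, depth-bounded helper (hiccup-gen is hypothetical, not from this thread):

(defn hiccup-gen
  "Generator for hiccup vectors at most depth levels deep (hypothetical helper)."
  [depth]
  (gen/fmap
   (fn [[tag attrs content]] (into [tag attrs] content))
   (gen/tuple (gen/keyword)
              (gen/map (gen/keyword) (gen/string-alphanumeric) {:max-elements 2})
              ;; at most three content items per element
              (gen/vector (if (pos? depth)
                            (gen/one-of [(gen/string-alphanumeric)
                                         (hiccup-gen (dec depth))])
                            (gen/string-alphanumeric))
                          0 3))))

(spec/def ::hiccup
  (spec/with-gen
    (spec/cat :tag        keyword?
              :attributes (spec/? map?)
              :content    (spec/* (spec/or :terminal string?
                                           :element  ::hiccup)))
    #(hiccup-gen 3)))

;; no recursion-limit binding needed; the generator bounds itself
(gen/generate (spec/gen ::hiccup))

This trades spec-derived generation for explicit bounds on depth and branching, at the cost of keeping the generator in sync with the spec by hand.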
