I have this function that reproduces my problem:

(defn my-problem
  [preprocess count print-freq]
  (doseq [x (preprocess (range 0 count))] 
    (when (= 0 (mod x print-freq)) 
      (println x))))

Everything works fine when I call it with the identity function:

(my-problem identity 10000000 200000)
;it prints 200000,400000 ... 9800000 just as it should

When I call it with the seque function, I get an OutOfMemoryError:

(my-problem #(seque 5 %) 10000000 200000)
;it prints numbers up to 2000000 and then it throws OutOfMemoryError

My understanding is that seque should just split the processing across two threads, using a BlockingQueue with a maximum size of 5 (in this case). I don't understand where the memory leak is.

1 Answer

The way seque is implemented, if you consume elements much more quickly than you can produce them, a large number of agent tasks will pile up in the queue used internally by seque (up to one task per element in the sequence). In theory what you're doing should be fine, but in practice it doesn't really work out. You should be able to see the same effect just by running (dorun (seque (range))).
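To make the pile-up concrete, here is a minimal sketch of the agent behaviour involved. It is not seque's source: slow-agent, slow-inc and enqueue-actions! are hypothetical names used only for illustration. It shows that send-off returns immediately and that pending actions simply accumulate on the agent, with no bound, until the agent gets around to running them.

```clojure
;; Illustration only: this is NOT seque's implementation, just the agent
;; behaviour that lets pending actions accumulate without a bound.
(def slow-agent (agent 0))

(defn slow-inc
  "Hypothetical slow action standing in for work done on the agent."
  [n]
  (Thread/sleep 2)
  (inc n))

(defn enqueue-actions!
  "Sends k slow actions without waiting for any of them, then returns
  the agent's value as seen right after the last send."
  [k]
  (dotimes [_ k]
    (send-off slow-agent slow-inc))
  @slow-agent)

;; The sends return at once, so most of the 200 actions are typically
;; still pending when this value is read.
(def seen-immediately (enqueue-actions! 200))

;; Only after waiting have all queued actions run.
(await slow-agent)
(def seen-after-await @slow-agent)
```

Each pending action is an object on the heap, so if actions are queued faster than they are executed, memory use grows with the number of outstanding actions.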

You can also use the function sequeue in flatland/useful, which makes tradeoffs that are different from the ones in clojure.core. Read the docstring carefully, but I think it would work well for your situation.
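If adding a library is not an option, the behaviour the question expects (one producer thread feeding a bounded queue) can be sketched by hand. This is only a sketch under stated assumptions: bounded-seq is a hypothetical helper, not part of clojure.core or flatland/useful, and it does not handle exceptions thrown while producing or a consumer that stops reading early (the daemon producer thread would then stay blocked on the full queue).

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

(defn bounded-seq
  "Hypothetical helper: walks coll on a single background thread and hands
  its items to the caller through a queue holding at most n items.
  Returns a lazy seq of the items in their original order."
  [n coll]
  (let [q        (LinkedBlockingQueue. (int n))
        end      (Object.)                  ; sentinel: end of input
        nil-item (Object.)                  ; stand-in, the queue rejects nil
        it       (.iterator ^Iterable coll) ; iterate without keeping the head
        producer (fn []
                   (while (.hasNext it)
                     (let [x (.next it)]
                       ;; .put blocks while the queue is full
                       (.put q (if (nil? x) nil-item x))))
                   (.put q end))
        consume  (fn consume []
                   (lazy-seq
                     ;; .take blocks while the queue is empty
                     (let [x (.take q)]
                       (when-not (identical? x end)
                         (cons (if (identical? x nil-item) nil x)
                               (consume))))))]
    (doto (Thread. ^Runnable producer)
      (.setDaemon true)
      (.start))
    (consume)))
```

With this in place, the call from the question could be tried as (my-problem #(bounded-seq 5 %) 10000000 200000). Because the producer blocks in .put whenever five items are waiting, the amount of buffered work is capped by the queue size rather than by the length of the input.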

6 Comments

I'm puzzled by this answer. I've never had much cause to look into seque, but I thought the idea was that the consumer would be blocked if it got ahead of the producer, rather than heaping up requests. To me it seems more plausible that the memory error is due to the huge range being realized (with a reference held in a closure in seque).
That's the idea, but if you read through the implementation (and I've spent a lot of time doing so, to track down related bugs), you'll see that this does happen: drain adds an agent task every time an item is consumed, and fill allows a single agent task to add many items, if the queue is not full. Eventually, there can be as many agent tasks pending as there were items in the input sequence.
@A.Webb, if you want more convincing evidence, try running (dorun (seque (range))) and then using jmap -histo <pid> to see what's taking up all the heap space. It's all hash-maps and agent actions - very little of the space is lazy seq objects.
I'll take your word, just surprised.
@lnostdal There have been no relevant changes made to the source since I wrote this answer, and (dorun (seque (range))) still OOMs. Feel free to file a bug and possibly offer a patch yourself; I don't participate in Clojure's contribution process anymore.