I've been reading about how lazy sequences can cause OutOfMemoryErrors when using, say, loop/recur on large sequences. I'm trying to load a 3MB file into memory to process it, and I think this is happening to me, but I don't know if there's an idiomatic way to fix it. I tried putting in doalls, but then my program didn't seem to terminate. Small inputs work:
Small input (contents of file): AAABBBCCC
Correct output: ((65 65) (65 66) (66 66) (67 67) (67 67))
Code:
    (def file-path "/Users/me/Desktop/temp/bob.txt")
    ;(def file-path "/Users/me/Downloads/3MB_song.m4a")

    (def group-by-twos
      (fn [a-list]
        (let [first-two (fn [a-list] (list (take 2 a-list)))
              the-rest-after-two (fn [a-list] (rest (rest a-list)))
              only-two-left? (fn [a-list] (if (= (count a-list) 2) true false))]
          (loop [result '() rest-of-list a-list]
            (if (nil? rest-of-list)
              result
              (if (only-two-left? rest-of-list)
                (concat result (list rest-of-list))
                (recur (concat result (first-two rest-of-list))
                       (the-rest-after-two rest-of-list))))))))

    (def get-the-file
      (fn [file-name-and-path]
        (let [the-file-pointer
              (new java.io.RandomAccessFile (new java.io.File file-name-and-path) "r")
              intermediate-array (byte-array (.length the-file-pointer))] ;reserve space for final length
          (.readFully the-file-pointer intermediate-array)
          (group-by-twos (seq intermediate-array)))))

    (get-the-file file-path)
As I said above, when I put in doalls in a bunch of places, it didn't seem to finish. How can I get this to run for large files, and is there a way to get rid of the cognitive burden of doing whatever I need to do? Some rule?
group-by-twos is really big, but it doesn't really do that much. Also, (if (= (count a-list) 2) true false) is a verbose way of saying (= (count a-list) 2).
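
To illustrate how little group-by-twos needs to do, here is a minimal sketch built on the core function partition-all, which lazily splits a collection into groups of n and keeps a shorter final group. This is one possible rewrite rather than a drop-in replacement: a leftover single element comes back as a one-element group. The helper name file-byte-pairs is hypothetical, and reading the whole file with Files/readAllBytes assumes the file fits in memory, which should hold for a 3MB file.

```clojure
(import '(java.nio.file Files))

;; Pair up consecutive elements. partition-all is lazy, so it avoids the
;; repeated `count` calls and the nested `concat` calls of a hand-written loop.
(defn group-by-twos [coll]
  (partition-all 2 coll))

;; Hypothetical helper: read every byte of the file, then pair the bytes up.
;; Assumes the file is small enough to hold in memory as one byte array.
(defn file-byte-pairs [path]
  (group-by-twos (Files/readAllBytes (.toPath (java.io.File. path)))))

(group-by-twos [65 65 65 66 66 66 67 67 67])
;; => ((65 65) (65 66) (66 66) (67 67) (67))
```

Because the result is lazy, pairs are only built as they are consumed. Holding on to the head of the sequence while walking it will still keep every realized pair in memory, so for large files it may help to consume the result with reduce or doseq instead of binding it to a var.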