I could use some help optimizing some Common Lisp code. I am trying to query data out of a log file. Pulling the first 50 lines out of more than 14.5k lines takes over a second. Extrapolating, it would take roughly 5 minutes just to read the data from the log file. Additionally, processing the first 50 lines with my current implementation allocates ~50MB, when the entire file is only 14MB. What I want is to make a single pass through the data and parse it with the minimum number of memory allocations.
I know the performance hit I am seeing is due to my code. What I am having a hard time wrapping my brain around is how to refactor my code to minimize the issues I am seeing. I have tried accessing the string as a stream using WITH-INPUT-FROM-STRING and the performance didn't change noticeably.
This is an IIS log, so it will have a consistent structure. The first 2 fields are date and time, which I would like parsed into a number so I can constrain the range of data when needed. After that, most of the fields will be variable in size, but all are separated by a space.
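To make the date/time requirement concrete, here is a minimal sketch of turning the leading timestamp into a number without creating intermediate substrings. It assumes every line begins with a fixed-width `YYYY-MM-DD HH:MM:SS` prefix and that the timestamps are in UTC; both are assumptions about the log, and `parse-timestamp` is a hypothetical helper name, not part of any library.

```lisp
;; Assumption: LINE starts with "YYYY-MM-DD HH:MM:SS" (19 characters).
(defun parse-timestamp (line)
  "Return the universal time encoded in the first 19 characters of LINE."
  (flet ((field (start end)
           ;; PARSE-INTEGER reads the digits in place via :START/:END,
           ;; so no substring is created for each component.
           (parse-integer line :start start :end end)))
    (encode-universal-time (field 17 19)   ; second
                           (field 14 16)   ; minute
                           (field 11 13)   ; hour
                           (field 8 10)    ; day
                           (field 5 7)     ; month
                           (field 0 4)     ; year
                           0)))            ; time zone 0 = UTC (assumption)
```

The resulting integers compare with `<` and `>`, which is all that is needed to constrain a range of log entries. Whether this is measurably faster than a general-purpose date parser on this data is something only profiling can confirm.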
With My Code:
took 1,138,000 microseconds (1.138000 seconds) to run with 8 available CPU cores.
During that period, 1,138,807 microseconds (1.138807 seconds) were spent in user mode
0 microseconds (0.000000 seconds) were spent in system mode
19,004 microseconds (0.019004 seconds) was spent in GC.
49,249,040 bytes of memory allocated.

Without My Code:
took 64,000 microseconds (0.064000 seconds) to run with 8 available CPU cores.
During that period, 62,401 microseconds (0.062401 seconds) were spent in user mode
0 microseconds (0.000000 seconds) were spent in system mode
834,512 bytes of memory allocated.
(defun read-date-time (hit)
  (let ((date-time (chronicity:parse (subseq hit 0 20))))
    (encode-universal-time (chronicity:sec-of date-time)
                           (chronicity:minute-of date-time)
                           (chronicity:hour-of date-time)
                           (chronicity:day-of date-time)
                           (chronicity:month-of date-time)
                           (chronicity:year-of date-time))))

(defun parse-hit (hit)
  (unless (eq hit :eof)
    (cons (read-date-time hit)
          (split-sequence:split-sequence #\Space (subseq hit 20)))))
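
(time (gzip-stream:with-open-gzip-file (ins "C:\\temp\\test.log.gz")
        ;; Read and discard the first line.
        (read-line ins nil :eof)
        ;; REPEAT 50 parses exactly 50 lines; FOR I UPTO 50 would run 51 times.
        (loop repeat 50
              do (parse-hit (read-line ins nil :eof)))))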
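
One possible direction for the field splitting, as a sketch only: walk the line with `position` from a given offset, so the remainder of the line is not copied before it is split. `split-fields` is a hypothetical helper written in plain Common Lisp, and the sample line in the usage note is made up for illustration. Each returned field is still a fresh string, so this removes only one copy per line.

```lisp
(defun split-fields (line &key (start 0))
  "Return the space-separated fields of LINE from index START onward."
  (loop with end = (length line)
        ;; Each field begins one past the space that ended the previous one.
        for field-start = start then (1+ field-end)
        ;; Find the next space, or stop at the end of the line.
        for field-end = (or (position #\Space line :start field-start) end)
        collect (subseq line field-start field-end)
        while (< field-end end)))
```

For example, `(split-fields "2012-01-01 00:00:00 GET /index.html 200" :start 20)` returns `("GET" "/index.html" "200")`. Consecutive spaces produce empty strings, just as they would with a splitter that keeps empty subsequences.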
My first attempt is a very naive approach, and I recognize that my code could use some improvement, so I am asking for some direction. If a tutorial is a more appropriate way to answer this question, please post a link. I enjoy