I am fetching rows from a PostgreSQL database using Ruby. This is done in single-row mode, as described in the pg gem documentation (https://deveiate.org/code/pg/PG/Result.html):

conn.send_query( "first SQL query" )
conn.set_single_row_mode
conn.get_result.stream_each do |row|
    # do something with the received row of the first query
end

I get all rows of the result set one by one, as expected. However, for large result sets, Ruby seems to keep all of them in memory, and the program eventually terminates because it runs out of memory.

Is there a way to free the memory of already processed rows? I think I should use clear() or autoclear?, but I am not sure how to use them or what exactly to clear.

  • Maybe it's not postgres that keeps rows in memory, but your code? Commented Jun 29, 2017 at 20:25
  • I am pretty sure it isn't postgres. I think it is either my code or the pg gem Commented Jun 29, 2017 at 21:12
  • I guess the question is whether it is the pg gem (and I can use some alternative) or whether the gem is fine and my code is wrong Commented Jun 29, 2017 at 21:14
  • Be aware that the pg gem is used a lot, so I would be surprised if it's actually that one Commented Jun 30, 2017 at 5:11
  • @maax: yeah, that's what I meant. Why do you think it's the pg gem? Commented Jun 30, 2017 at 5:43

2 Answers

Try to use the find_each approach (add LIMIT and OFFSET):

limit  = 1000
offset = 0

loop do
  conn.send_query("SELECT * FROM users ORDER BY id LIMIT #{limit} OFFSET #{offset}")
  conn.set_single_row_mode
  records = conn.get_result

  count = 0
  records.stream_each do |row|
    count += 1
    # do something with the received row
  end

  break if count.zero?  # no rows left, we are done
  offset += limit       # keep the page size fixed; advance to the next page
  sleep(5)              # optional: throttle between batches
end
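
Note that the page size stays fixed and only the offset advances, so each iteration holds at most limit rows in memory at a time.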

Comments

This would work; the only problem I am facing is that Postgres reorders rows, which will give me duplicate rows and make me miss some rows
I guess my question is whether I can use Ruby to do it without OFFSET and LIMIT (not necessarily with the pg gem - maybe there is another one I don't know of?) or whether I have to change my Postgres setup to work with LIMIT and OFFSET
@maax You can order your records by adding ORDER BY id. The memory growth happens in your Ruby code, not in the Postgres server, because of how Ruby's garbage collector works. Batching is the approach suggested by the Rails team for working with potentially infinite record collections; you can see how they solved it here: github.com/rails/rails/blob/master/activerecord/lib/…
Does ORDER BY id not carry a huge overhead for a large table, or am I mistaken?
Alright, this got me on the right track. I now use an indexed id field in Postgres to always get the same order of elements. Then I restart the Ruby script for each round of your loop, because your solution still had the same memory leak as my original one. I have no idea why, though; comments on why this still leaks are very welcome.
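
Following up on the ORDER BY id discussion in the comments above, here is a keyset-pagination sketch: instead of a growing OFFSET, each batch is selected by an indexed range condition, which avoids both the offset scan cost and the duplicate/missing-row problem. The users table, the integer id column, and the connection parameters are assumptions for illustration.

require 'pg'

conn = PG.connect(dbname: 'mydb')  # hypothetical connection parameters
last_id    = 0
batch_size = 1000

loop do
  # Fetch the next batch strictly after the last id we have seen;
  # the WHERE clause uses the index, so no offset scan is needed.
  result = conn.exec_params(
    'SELECT * FROM users WHERE id > $1 ORDER BY id LIMIT $2',
    [last_id, batch_size]
  )
  break if result.ntuples.zero?

  result.each do |row|
    # do something with the row
    last_id = row['id'].to_i
  end
  result.clear  # free this batch before fetching the next one
end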

You have to call clear() unless autoclear? returns true; otherwise the result's memory is not released until the garbage collector gets to it, which looks like a leak. You may also want to clear manually as soon as you are done with a large result set.

Have a look at the documentation:

https://deveiate.org/code/pg/PG/Result.html
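
For example, in single-row mode each PG::Result returned by get_result holds just one tuple, so clearing it frees only that row and does not discard rows that have not been fetched yet. A minimal sketch of such a loop, assuming the same kind of query as in the question:

conn.send_query( "first SQL query" )
conn.set_single_row_mode

# In single-row mode get_result returns one single-tuple PG::Result
# per row, then a final zero-row result, then nil.
while (result = conn.get_result)
  result.check                            # raise if the server reported an error
  result.each do |row|
    # do something with the received row
  end
  result.clear unless result.autoclear?   # free this row's memory right away
end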

Good Luck!

2 Comments

Won't that clear the whole result? Past and future rows?
It clears everything as far as I can tell, and that is exactly my problem: my result is too big to fit in memory and I can't split it into separate queries.
