
I am trying to read a file (tab-separated or CSV) with roughly 3 million rows in Java, and I have increased the JVM heap size to -Xmx6g. The code works fine with up to 400K rows for a tab-separated file and slightly fewer for a CSV file. It uses many LinkedHashMaps and Vectors, and I call System.gc() every few hundred rows to try to free memory. However, the code fails with the following error after about 400K rows:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

at java.util.Vector.<init>(Vector.java:111)
at java.util.Vector.<init>(Vector.java:124)
at java.util.Vector.<init>(Vector.java:133)
at cleaning.Capture.main(Capture.java:110)
  • 7
    System.gc() calls are wasted effort. You may freely remove them. Commented Nov 6, 2013 at 19:35
  • 4
    Is it time to use a database? Commented Nov 6, 2013 at 19:35
  • 1
    You may want to rethink your approach for processing this amount of data, don't try to load everything in memory. You could try to process it chunk wise (down to line by line). - What you have implemented seems to be everything but scalable. Commented Nov 6, 2013 at 19:40
  • 1
    stackoverflow.com/questions/14037404/… stackoverflow.com/questions/2356137/read-large-files-in-java Commented Nov 6, 2013 at 19:50

1 Answer


Your attempt to load the whole file is fundamentally ill-fated. You may optimize all you want, but you'll just be pushing the upper limit slightly higher. What you need is to eradicate the limit itself.

It is very unlikely that you actually need the whole contents in memory at once. You probably need to calculate something from that data, so you should work out a way to perform that calculation chunk by chunk, discarding each chunk once it has been processed.
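As a rough illustration of the chunk-by-chunk idea, here is a minimal sketch that reads a file one line at a time and keeps only a running aggregate. The class name `StreamingAggregate`, the method `countByFirstColumn`, and the choice of "count rows per first column" as the calculation are all made up for the example; your actual calculation will differ. It also assumes a plain delimiter with no quoted fields, which real CSV files may violate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StreamingAggregate {

    // Hypothetical calculation: count how many rows share each first-column value.
    // Only the running counts stay in memory; each line becomes garbage as soon
    // as the loop moves on, so heap use depends on the number of distinct keys,
    // not on the number of rows.
    public static Map<String, Long> countByFirstColumn(Path file, String delimiter)
            throws IOException {
        Map<String, Long> counts = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Assumption: fields are never quoted, so the first delimiter
                // marks the end of the first column.
                int end = line.indexOf(delimiter);
                String key = end < 0 ? line : line.substring(0, end);
                counts.merge(key, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Tiny sample file standing in for the real 3-million-row input.
        Path sample = Files.createTempFile("sample", ".tsv");
        Files.write(sample, Arrays.asList("a\t1", "b\t2", "a\t3"), StandardCharsets.UTF_8);
        System.out.println(countByFirstColumn(sample, "\t"));
        Files.delete(sample);
    }
}
```

The same loop works for a comma-separated file by passing "," as the delimiter, subject to the no-quoting assumption above. No System.gc() calls are needed, because nothing holds a reference to a line once it has been processed.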

If your data is so deeply intertwined that you cannot process it sequentially, then the reasonable recourse is, as HovercraftFOE mentions above, to transfer the data into a database and work from there, indexing everything you need, normalizing it, etc.
