I'm trying to get the data from a text file into a HashMap. The text-file has the following format:

(image: sample lines from the text file, with the key fields highlighted in green and the value field in red)

It has about 7 million lines (size: 700 MB).

So what I do is: I read each line, take the fields in green, and concatenate them into a string that will be the HashMap key. The value will be the field in red.

Every time I read a line, I have to check whether the HashMap already has an entry with that key. If so, I update the entry by adding the red value to the stored value; if not, I add a new entry to the HashMap.
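
A minimal sketch of the loop described above, not the original code. The file layout is only visible in the image, so the whitespace-separated columns, the column indices, and the names `FlowAggregator` and `aggregate` are all assumptions made for illustration. It relies on `Map.merge` (Java 8 or later), which covers both the "new entry" and the "sum with existing value" cases in one call.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlowAggregator {
    // Hypothetical layout: whitespace-separated columns. Which columns are
    // "green" (key) and which one is "red" (value) is an assumption.
    static Map<String, Long> aggregate(Iterable<String> lines, int[] keyCols, int valueCol) {
        Map<String, Long> totals = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            StringBuilder key = new StringBuilder();
            for (int col : keyCols) {
                if (key.length() > 0) {
                    key.append('|'); // separator so "ab"+"c" and "a"+"bc" stay distinct
                }
                key.append(fields[col]);
            }
            long value = Long.parseLong(fields[valueCol]);
            // Inserts the value for a new key, or adds it to the stored sum.
            totals.merge(key.toString(), value, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "10.0.0.1 10.0.0.2 80 100",
                "10.0.0.1 10.0.0.2 80 50",
                "10.0.0.3 10.0.0.4 22 7");
        System.out.println(aggregate(sample, new int[] {0, 1, 2}, 3));
    }
}
```

For the real 700 MB file, the lines would be streamed from a `BufferedReader` one at a time rather than held in a list, so only the map itself stays in memory.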

I tried this with text files of 70,000 lines, and it works quite well.

But now with the 7-million-line text file I get a "java heap space" error, as in the image:

(image: screenshot of the "java heap space" error)

Is this due to the HashMap? Is it possible to optimize my algorithm?

  • To store 700 MB of text you will need at least 1.4 GB, possibly closer to 3 GB with overhead and the HashMap. How much memory do you have? Commented Oct 25, 2012 at 19:51
  • Note that you don't need to check whether there is an entry first: if the entry is already in the HashMap, it will be replaced by the new value. Commented Oct 25, 2012 at 19:54
  • I just changed my netbeans.conf to -J-Xms500M -J-XX:PermSize=1500M. I'll try it like this and check it out... By the way, I have 4 GB of RAM. Commented Oct 25, 2012 at 20:02
  • So I did this: -J-Xms600M -J-XX:PermSize=1600M... but I still get the heap overflow, and if I add more to the Xms, NetBeans cannot start... :( Commented Oct 25, 2012 at 20:37
  • What do you need this large amount of data for? Huge data is best managed by reading it in parts and storing only the small relevant parts in memory. You could create an API that reads only the required small parts of the huge data while keeping the rest on disk. You can have the API return HashMaps for these small parts, then use the HashMap. If you want to read and process the whole file, repeatedly ask the API for a HashMap per file section, discarding already processed sections from memory. Commented Oct 25, 2012 at 20:44

2 Answers

You should increase your heap space:

-Xms<size>        set initial Java heap size
-Xmx<size>        set maximum Java heap size

java -Xms1024m -Xmx2048m
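
Note that -Xms only sets the starting size; the ceiling is -Xmx. It can help to confirm which limit the JVM running your program actually received, because options placed in netbeans.conf size the IDE's own JVM, which may not be the JVM your program runs in. A small sketch using the standard `Runtime.maxMemory()` call (the class name `HeapCheck` is just for illustration):

```java
class HeapCheck {
    /** Maximum heap the current JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx2048m HeapCheck` and again without the flag shows whether the setting is reaching the program.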

A nice read: "From Java code to Java heap"

Table 3. Attributes of a HashMap
Default capacity                    16 entries
Empty size                          128 bytes
Overhead                            64 bytes plus 36 bytes per entry
Overhead for a 10K collection       ~360K
Search/insert/delete performance    O(1): time taken is constant, regardless of the number of elements (assuming no hash collisions)

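If you apply the table above, the HashMap overhead alone for 7 million entries comes to roughly 250 MB (36 bytes × 7,000,000), before counting the key strings and the values themselves, so your minimum heap size should be around 1000 MB.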
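
The arithmetic behind that estimate, as a small sketch. It uses only the two figures from the table above (64 bytes fixed, 36 bytes per entry); those figures depend on the JVM, and the class and method names here are made up for illustration.

```java
class HashMapOverhead {
    // Figures taken from the table: 64 bytes fixed plus 36 bytes per entry.
    // This is map bookkeeping only; keys and values are extra.
    static long overheadBytes(long entries) {
        return 64L + 36L * entries;
    }

    public static void main(String[] args) {
        System.out.println(overheadBytes(10_000));     // 360064, the "~360K" row
        System.out.println(overheadBytes(7_000_000));  // 252000064, about 240 MiB
    }
}
```

If many lines share the same key, the map ends up with fewer than 7 million entries and the overhead shrinks accordingly.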

6 Comments

I just changed my netbeans.conf to -J-Xms500M -J-XX:PermSize=1500M I'll try like this and check it out...
So, should I try another structure besides HashMap? For each line I need to check whether such an entry already exists, so with a HashMap I avoid writing search algorithms...
@javardo You don't need to check whether the entry exists. HashMap will replace the entry with the new value if it exists.
Yes AmitD, but I also need to add the 'red' value of the new line to the one already in the HashMap, if there is an entry. If not, I just add a new entry with the 'red' value.
@javardo Something like this? pastebin.com/b3GB6PS0 In the end you should have a map with the green strings as keys and the sums of the red integers as values.

As well as changing the heap size, consider 'compressing' (encoding) the keys by storing them as packed binary, not String.

Each IP address can be stored as 4 bytes. The port numbers (if that's what they are) are 2 bytes each. The protocol can probably be stored as a byte or less.

That's 13 bytes (4 + 4 + 2 + 2 + 1), rather than maybe 70 bytes stored as a UTF-16 String, reducing the memory for keys by a factor of about 5, if my maths is correct at this time of night...
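
One way this could look, as a sketch under assumptions: the fields are taken to be source and destination IPv4 addresses, two port numbers, and a protocol number, as guessed above, and the class name `FlowKey` and its layout are hypothetical. The 13 bytes are packed into two `long` fields, and `equals`/`hashCode` are defined so the object works as a HashMap key.

```java
final class FlowKey {
    private final long addresses;      // source IPv4 in the high 32 bits, destination in the low 32
    private final long portsAndProto;  // srcPort << 24 | dstPort << 8 | protocol

    FlowKey(int srcIp, int dstIp, int srcPort, int dstPort, int protocol) {
        this.addresses = ((long) srcIp << 32) | (dstIp & 0xFFFFFFFFL);
        this.portsAndProto = ((long) (srcPort & 0xFFFF) << 24)
                | ((long) (dstPort & 0xFFFF) << 8)
                | (protocol & 0xFF);
    }

    /** Parses dotted-quad IPv4 text such as "10.0.0.1" into a 32-bit int. */
    static int parseIpv4(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + text);
        }
        int result = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + text);
            }
            result = (result << 8) | octet;
        }
        return result;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof FlowKey)) {
            return false;
        }
        FlowKey that = (FlowKey) other;
        return addresses == that.addresses && portsAndProto == that.portsAndProto;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(addresses) + Long.hashCode(portsAndProto);
    }
}
```

A key would be built with something like `new FlowKey(FlowKey.parseIpv4("10.0.0.1"), FlowKey.parseIpv4("10.0.0.2"), 80, 443, 6)`. Each object still carries a JVM object header, so the real footprint per key is larger than 13 bytes, but it avoids the separate String and character array.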
