
So far, I've tried:

StringBuilder sb = new StringBuilder();
for(String bufferItem: buffer){
    sb.append(bufferItem);
}

and I've also tried:

String.join("\n", buffer)

I am joining large files (under 10 GB) in memory on a system with more than 100 GB of RAM. The stack trace is below. How can I solve this problem?

Exception in thread "main" java.lang.OutOfMemoryError
    at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
    at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at java.lang.StringBuilder.append(StringBuilder.java:76)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:484)
    at java.lang.StringBuilder.append(StringBuilder.java:166)
    at java.util.StringJoiner.add(StringJoiner.java:185)
    at java.lang.String.join(String.java:2504)
  • A 9GB file sounds rather large to me actually. How much memory is being allocated to the Java heap? You should check your JVM memory settings. Commented Oct 9, 2018 at 2:03
  • This doesn't sound like a good idea in general. What are you trying to accomplish? Commented Oct 9, 2018 at 2:05
  • Can you increase the max heap size in the JVM and try again? There's a good chance you'll need to break this process into chunks in order to remain within the allotted memory during execution. You might check out Hadoop MapReduce for this sort of memory-intensive job. Commented Oct 9, 2018 at 2:05
  • "I am joining a large files": then maybe you should actually be writing to a file instead of trying to do this in memory. Commented Oct 9, 2018 at 2:10
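To answer the first comment's question, the heap limit the JVM is actually running with can be read from Runtime.maxMemory(). A minimal sketch (the class name HeapInfo is hypothetical); note that this reports the limit set by -Xmx, which is separate from the per-array length limit on a single String:

```java
// Hypothetical helper: reports the maximum heap size the running JVM will try to use.
final class HeapInfo {
    private HeapInfo() {}

    /** Maximum heap in bytes, as set by -Xmx (Long.MAX_VALUE if there is no limit). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long max = maxHeapBytes();
        System.out.printf("Max heap: %.1f GiB%n", max / (1024.0 * 1024 * 1024));
    }
}
```

Running it with, for example, java -Xmx64g HeapInfo shows whether the setting was picked up.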

2 Answers


You cannot create strings with that many characters. The OutOfMemoryError is not because the heap was full, but because you're trying to build a String larger than the maximum possible size.

The maximum possible size is defined as 2^31 - 1 - 8, that is, Integer.MAX_VALUE - 8 = 2,147,483,639 characters. That's roughly 2 GB of file content if the file uses a single-byte encoding. See the source of AbstractStringBuilder:

/**
 * The maximum size of array to allocate (unless necessary).
 * Some VMs reserve some header words in an array.
 * Attempts to allocate larger arrays may result in
 * OutOfMemoryError: Requested array size exceeds VM limit
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
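Since the limit is counted in chars, one way to fail fast is to total the lengths with long arithmetic (so the sum cannot overflow) before calling String.join. A minimal sketch; JoinLimit is a hypothetical helper, and its constant mirrors the JDK 8 MAX_ARRAY_SIZE shown above, although the exact ceiling can vary by VM:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: predicts the length of String.join(delimiter, pieces)
// without building the String, and checks it against the array-size limit.
final class JoinLimit {
    // Mirrors AbstractStringBuilder.MAX_ARRAY_SIZE in JDK 8; VM-dependent in general.
    static final long MAX_STRING_LENGTH = Integer.MAX_VALUE - 8;

    private JoinLimit() {}

    /** Number of chars the joined result would contain, summed as a long. */
    static long joinedLength(CharSequence delimiter, List<? extends CharSequence> pieces) {
        long total = 0;
        for (CharSequence piece : pieces) {
            total += piece.length();
        }
        if (!pieces.isEmpty()) {
            // One delimiter between each adjacent pair of pieces.
            total += (long) delimiter.length() * (pieces.size() - 1);
        }
        return total;
    }

    static boolean fitsInString(CharSequence delimiter, List<? extends CharSequence> pieces) {
        return joinedLength(delimiter, pieces) <= MAX_STRING_LENGTH;
    }

    public static void main(String[] args) {
        List<String> buffer = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(joinedLength("\n", buffer)); // 16 (5 + 4 + 5 chars + 2 newlines)
        System.out.println(fitsInString("\n", buffer)); // true
    }
}
```

For roughly 10 GB of single-byte text the check returns false, which is the signal to stream instead of joining.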

You simply cannot create strings larger than that.

Why do you want to join the files in memory when you can join them while streaming them to disk?
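A minimal sketch of joining files while streaming them to disk, assuming the inputs are files on the local file system (FileJoiner and concatenate are hypothetical names). Files.copy(Path, OutputStream) copies through a small internal buffer, so memory use stays flat regardless of file size and nothing ever has to fit in a String:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: joins input files into one output file by streaming their bytes.
// Memory use is bounded by the copy buffer, not by the size of the files.
final class FileJoiner {
    private FileJoiner() {}

    static void concatenate(List<Path> inputs, Path output, byte[] separator) throws IOException {
        // With no options, newOutputStream creates the file or truncates an existing one.
        try (OutputStream out = Files.newOutputStream(output)) {
            boolean first = true;
            for (Path input : inputs) {
                if (!first) {
                    out.write(separator); // e.g. a newline between files
                }
                Files.copy(input, out); // streams the file contents; no String is built
                first = false;
            }
        }
    }

    // Usage: java FileJoiner <output> <input>...
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: FileJoiner <output> <input>...");
            return;
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Paths.get(args[i]));
        }
        concatenate(inputs, Paths.get(args[0]), "\n".getBytes(StandardCharsets.UTF_8));
    }
}
```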


Comments

I had a hunch about this. I need to send these files to an HTTP service, so storing them on disk just seemed unnecessary to me, especially since the machine running my Java job has significantly more memory than the files I need to send to the service.
I am writing my data to Httpcon.getOutputStream(), so I am now trying to write one string at a time to the stream and flush it. Let me see if that works.
You can also stream directly to an OutputStream or Writer that is attached to the HTTP service request (how depends on the library you're using; URLConnection.getOutputStream(), for example). But I doubt that many web services accept files that large unless they were very carefully written, because the whole request is often read into memory on the web service side first, and there it will run into the same or a similar problem as the one you just hit.
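A minimal sketch of writing the pieces straight to an OutputStream, assuming the destination is a plain java.io.OutputStream such as the one returned by URLConnection.getOutputStream() (StreamJoiner and writeJoined are hypothetical names). It produces the same text as String.join(delimiter, items), encoded as UTF-8, without ever building the joined String:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: streams items with a delimiter between them, like String.join,
// but only ever holds one item plus the writer's buffer in memory.
final class StreamJoiner {
    private StreamJoiner() {}

    static void writeJoined(Iterable<? extends CharSequence> items, String delimiter, OutputStream out)
            throws IOException {
        // Not closed here: closing the Writer would also close the caller's stream.
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        boolean first = true;
        for (CharSequence item : items) {
            if (!first) {
                writer.write(delimiter);
            }
            writer.append(item);
            first = false;
        }
        writer.flush(); // push the remaining buffered chars through to `out`
    }
}
```

Flushing after every string is not needed: the BufferedWriter passes data on as its buffer fills, and the single flush at the end pushes out the remainder. With a connection object named conn (assumed), usage would be StreamJoiner.writeJoined(buffer, "\n", conn.getOutputStream());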
The service on the other side is written in Python and it can handle large file sizes. We have done something like that before.
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
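The ByteArrayOutputStream.grow frame suggests the HTTP connection itself is buffering the whole body. Assuming Httpcon is a java.net.HttpURLConnection, its documentation says setChunkedStreamingMode and setFixedLengthStreamingMode enable streaming the request body "without internal buffering"; without one of them (called before getOutputStream()), the body is buffered in memory. A minimal sketch of a chunked upload (StreamingUpload and postFile are hypothetical names, and the receiving service is assumed to accept Transfer-Encoding: chunked):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: POSTs a large file without buffering the request body in memory.
final class StreamingUpload {
    private StreamingUpload() {}

    static int postFile(URL url, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Must be called before getOutputStream(); a value <= 0 selects the default chunk length.
        conn.setChunkedStreamingMode(0);
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out); // sent chunk by chunk as the file is read
        } // closing the stream terminates the chunked body
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

If the service cannot handle chunked requests, setFixedLengthStreamingMode(Files.size(file)) is the alternative when the exact byte count is known up front.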

First of all, using that much memory is probably not a good idea; I would break the work into reasonably sized chunks (for example, join 100 strings at a time and then write each batch to a file). If you really must use that much memory, you will need to pass -Xmx10G to the JVM, which raises the maximum heap space Java can use.
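A minimal sketch of the batching suggestion, assuming buffer is a List<String> and the output goes to a local file (BatchedJoin, writeInBatches and the batch size are hypothetical). Each String.join call covers only one batch, so no single String approaches the length limit, and the file ends up with the same text as String.join("\n", buffer):

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: joins `buffer` in batches of `batchSize` items and writes each batch
// to a file, so the largest String ever built is one batch.
final class BatchedJoin {
    private BatchedJoin() {}

    static void writeInBatches(List<String> buffer, int batchSize, Path output) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // newBufferedWriter with no options creates the file or truncates an existing one.
        try (Writer writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            int start = 0;
            while (start < buffer.size()) {
                // long arithmetic so start + batchSize cannot overflow
                int end = (int) Math.min((long) start + batchSize, buffer.size());
                if (start > 0) {
                    writer.write("\n"); // delimiter between the previous batch and this one
                }
                writer.write(String.join("\n", buffer.subList(start, end)));
                start = end;
            }
        }
    }
}
```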
