
I have a big file of about 10 GB. If I read its whole contents with readFully() in Java, I get an OutOfMemoryError, so I decided to read the file in parts using the same readFully(). For that I need to pass offset and length parameters to readFully(). The offset would need to be a long (or some datatype wider than int) so that it can point anywhere in the file, but readFully() only accepts an int offset. How can I read the big data?

try {
    IOUtils.readFully(in, contents, minOffset, maxOffset);
    value.set(contents, 0, contents.length);
} finally {
    IOUtils.closeStream(in);
}

Can I use seek() to get to a specific position and then use readFully() from that position?
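
As far as I can tell from the javadocs, in the readFully(in, buf, off, len) helpers (Hadoop's IOUtils and Commons IO alike), off and len describe where to write inside the byte array, not a position in the file, which is why they are ints. Below is a minimal plain-Java sketch of the seek-then-read idea, assuming a local file rather than HDFS; the class name and parameters are just illustrative. RandomAccessFile.seek() takes a long, so the file position is not limited to int, and only one chunk is ever in memory:

import java.io.IOException;
import java.io.RandomAccessFile;

public class ChunkReader {
    // Reads `length` bytes starting at the (long) file position `position`.
    static byte[] readChunk(String path, long position, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(position);               // file offset is a long, so positions beyond 2 GB are fine
            byte[] chunk = new byte[length];  // only this chunk is held in memory
            raf.readFully(chunk);             // fills the whole buffer or throws EOFException
            return chunk;
        }
    }
}

If the input is a Hadoop FSDataInputStream, it also exposes seek(long) (and, if I recall correctly, positioned readFully overloads that take a long position), so the same chunked pattern should apply there without loading the whole file.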

3 Comments

  • "Can I use seek() to get to a specific position and then use readFully() from that position?" Why don't you try it? And why use readFully instead of proper streaming, given that you don't want to read the whole file at once?
  • I bet you don't get an OutOfMemoryError from the code you're showing: there is no memory allocation anywhere in it. A good start would be to read the javadocs to understand what readFully does.
  • The byte array I am passing to readFully (i.e. contents) is allocated dynamically based on the length of the file.

1 Answer

Use the class java.util.Scanner to run through the contents of the file and retrieve lines one by one:

import java.io.FileInputStream;
import java.util.Scanner;

// ...

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line here, e.g. System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (sc != null) {
        sc.close();          // also closes the underlying stream
    }
    if (inputStream != null) {
        inputStream.close(); // harmless if the Scanner already closed it
    }
}

This solution iterates through all the lines in the file, allowing each line to be processed without keeping references to earlier ones, so the whole file is never held in memory.


2 Comments

  • But it would be a slow process. If I want to read big files, what should I do? Or will it work fast for big files?
  • I guess reading a big file will be slow either way. You might want to adjust the buffer size to trade off reading speed against memory usage.
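
As a rough sketch of that buffer-size suggestion (the 1 MB size is an arbitrary example, and path stands for the file path as in the answer's snippet), a BufferedReader lets you choose the buffer size explicitly while still reading line by line:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Reads the file line by line with an explicit buffer size (Java 7+ try-with-resources).
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8),
        1 << 20)) {                          // 1 MB buffer; tune for speed vs. memory
    String line;
    while ((line = reader.readLine()) != null) {
        // process the line here
    }
}

A larger buffer means fewer reads against the underlying stream, at the cost of a larger (but still fixed-size) in-memory buffer.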
