
I have a comparatively long file of unsigned integers (64 bits each, 0.47GB file) that I need to read and store in an array. After some brain racking I wound up using the type long, since everything in Java is signed (correct me if I'm wrong, please) and I couldn't think of a better alternative. Anyhow, the array only has to be sorted, so the precise values of the original numbers are not of the utmost importance. We're supposed to measure the efficiency of the sorting algorithm, nothing more. However, I came up against a brick wall when I actually came to reading the file (my code below).

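import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;

public class ReadFileTest {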
    public static void main(String[] args) throws Exception {
        String address = "some/directory";
        File input_file = new File (address);
        FileInputStream file_in = new FileInputStream(input_file);
        DataInputStream data_in = new DataInputStream (file_in );

        long [] array_of_ints = new long [1000000];
        int index = 0;

        long start = System.currentTimeMillis();

        while(true) {
            try {
                long a = data_in.readLong();
                index++;
                System.out.println(a);
            }
            catch(EOFException eof) {
                System.out.println ("End of File");
                break;
            }
        }

        System.out.println(index);
        System.out.println(System.currentTimeMillis() - start);
    }
}

It goes on and on forever, and I usually step out to have lunch while the programme's reading. All in all, 20 minutes is the fastest I've achieved so far. A course mate bragged today that his programme read it in 4 sec. He's working in C++ and I know that C++ is faster than Java, but this is ridiculous. Could somebody please tell me what I'm doing wrong here? I can't blame it on the language or the machine, so it must be me. From what I can see, though, the Java tutorials use exactly the same class, i.e. DataInputStream. I also saw FileChannels being recommended a couple of times. Are they the only way out?

  • Does your mate's program also print everything to standard output? I bet most of the time goes there. Comment out the println in the read loop and try again. Commented Apr 8, 2011 at 19:07
  • Also make sure you're using the same setup he is. If you're using a 5400 RPM HDD and he's using an SSD, he's going to smoke you no matter what language you're using. Commented Apr 8, 2011 at 19:18
  • How many times do you have lunch every day? (j/k) Commented Apr 8, 2011 at 19:22
  • Also, for your 0.47 GB file you might want to use a longer array; 1000000 longs only covers about 8 MB. You might try input_file.length() / 8 as the length of the array. Commented Apr 9, 2011 at 0:57

2 Answers


You should use buffered input. Without a buffer, each readLong() call turns into its own small read on the file. Something like:

DataInputStream data_in = new DataInputStream(
    new BufferedInputStream(
        new FileInputStream(input_file)));
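For completeness, here is a minimal sketch of how the buffered stream could be used to fill the array, with the array sized from the file length as suggested in the comments on the question. The class name BufferedLongReader and the method readAll are made up for illustration. It assumes the file holds nothing but 8-byte values; note that readLong() reads big-endian, so the numbers may not match what a C++ program sees on a little-endian machine, which should not matter if you only need something to sort.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class BufferedLongReader {
    // Reads every 8-byte big-endian value in the file into a long[].
    static long[] readAll(File file) throws IOException {
        long count = file.length() / Long.BYTES; // 8 bytes per value
        if (count > Integer.MAX_VALUE) {
            throw new IOException("Too many values for a single array");
        }
        long[] values = new long[(int) count];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readLong(); // no println inside the loop
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java BufferedLongReader <file>");
            return;
        }
        long start = System.currentTimeMillis();
        long[] values = readAll(new File(args[0]));
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(values.length + " values read in " + elapsed + " ms");
    }
}
```

Printing once after the loop, rather than once per value, keeps console output out of the timing.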

2 Comments

Also, try with different sizes of buffers. Don't assume that the default buffer size is the best, especially since you are reading such a large number of bytes.
In general I haven't found that increasing the buffer above the default of 8192 bytes helps much, even in native languages. Very small buffers of a few tens or hundreds of bytes are really slow, but once you hit 8192 you are probably getting 90% of the maximum performance or more.
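If you want to check this on your own machine, a rough sketch like the following could do it. BufferedInputStream has a two-argument constructor that takes the buffer size in bytes; the class name BufferSizeTrial, the method countLongs, and the three sizes tried are just illustrative choices. Treat the timings as a rough guide only, since the OS file cache will make later passes over the same file look faster.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class BufferSizeTrial {
    // Reads the whole file as longs with the given buffer size and
    // returns how many values were read.
    static long countLongs(File file, int bufferSize) throws IOException {
        long count = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), bufferSize))) {
            while (true) {
                in.readLong();
                count++;
            }
        } catch (EOFException eof) {
            // Reaching the end of the file is the normal way out of the loop.
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java BufferSizeTrial <file>");
            return;
        }
        File file = new File(args[0]);
        int[] sizes = {64, 8192, 65536}; // arbitrary sizes to compare
        for (int size : sizes) {
            long start = System.nanoTime();
            long count = countLongs(file, size);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(size + " byte buffer: " + count + " values in " + millis + " ms");
        }
    }
}
```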

If you want to read objects from the file instead (this only works for data that was written with ObjectOutputStream):

new ObjectInputStream(
    new BufferedInputStream(
        new FileInputStream(new File(file_name))))

