I was trying to read data (chars) from a large text file (~250MB) in 1KB chunks and was very surprised that reading that file using either FileReader or BufferedReader takes exactly the same time, even though the BufferedReader has an internal 8KB character buffer, while FileReader doesn't.

FileReader code:

File file = new File("250mbfile.txt");
FileReader fileReader = new FileReader(file);

char[] charBuffer = new char[1024];
while (fileReader.read(charBuffer, 0, 1024) != -1) { /* ... */ }

BufferedReader code:

File file = new File("250mbfile.txt");
FileReader fileReader = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(fileReader);

char[] charBuffer = new char[1024];
while (bufferedReader.read(charBuffer, 0, 1024) != -1) { /* ... */ }

JMH benchmark:

Benchmark                           Mode  Cnt     Score     Error  Units
Benchmark.bufferedReaderCHARBUFFER  avgt    5  3878.794 ± 145.105  ms/op
Benchmark.fileReaderCHARBUFFER      avgt    5  3968.835 ± 160.128  ms/op

Why do they both take the same time to complete the task? Since BufferedReader has an 8K character buffer, as I understand it, it has to invoke the underlying InputStreamReader's decoding operations once per 8K of input (it always fills its buffer fully). FileReader has to do the same once per 1K (as specified in the snippets). Therefore, FileReader should be slower, as more decoding operations have to be invoked. My only guess is that the difference between repeatedly decoding 1K blocks of bytes and decoding 8K blocks of bytes is so extremely tiny that it's basically impossible to notice. To support this claim, I've made two additional JMH measurements:

From the test "CHARBUFFER_1K":

File file = new File("250mbfile.txt");
FileReader fileReader = new FileReader(file);

char[] charBuffer = new char[1024];
while (fileReader.read(charBuffer, 0, 1024) != -1) { /* ... */ }

From the test "CHARBUFFER_8K":

File file = new File("250mbfile.txt");
FileReader fileReader = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(fileReader);

char[] charBuffer = new char[8192];
while (bufferedReader.read(charBuffer, 0, 8192) != -1) { /* ... */ }

JMH:

Benchmark                  Mode  Cnt     Score     Error  Units
Benchmark.CHARBUFFER_8K    avgt    5  3778.331 ± 143.736  ms/op
Benchmark.CHARBUFFER_1K    avgt    5  3778.793 ± 134.118  ms/op
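For reproduction, here is a minimal self-contained (non-JMH) sketch of the same comparison; the temp file and 256KB size are stand-ins for the real 250MB input, and real measurements should of course go through JMH as above:

```java
import java.io.*;
import java.nio.file.*;

class ReadComparison {
    // Read the whole file through the given Reader in 1KB chunks,
    // returning the total number of chars read.
    static long drain(Reader reader) throws IOException {
        char[] charBuffer = new char[1024];
        long total = 0;
        int n;
        while ((n = reader.read(charBuffer, 0, 1024)) != -1) {
            total += n;
        }
        return total;
    }

    // Returns {charsViaFileReader, charsViaBufferedReader} for a small
    // throwaway file -- both must agree, whatever the timings say.
    static long[] compare() throws IOException {
        Path tmp = Files.createTempFile("readcmp", ".txt");
        byte[] data = new byte[256 * 1024];        // 256KB stand-in for the 250MB file
        java.util.Arrays.fill(data, (byte) 'a');   // pure ASCII: 1 byte == 1 char
        Files.write(tmp, data);

        long plain, buffered;
        try (FileReader fr = new FileReader(tmp.toFile())) {
            plain = drain(fr);
        }
        try (BufferedReader br = new BufferedReader(new FileReader(tmp.toFile()))) {
            buffered = drain(br);
        }
        Files.delete(tmp);
        return new long[] { plain, buffered };
    }

    public static void main(String[] args) throws IOException {
        long[] counts = compare();
        System.out.println("FileReader chars:     " + counts[0]);
        System.out.println("BufferedReader chars: " + counts[1]);
    }
}
```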

5 Replies

...as I understand, it has to invoke underlaying InputStreamReader's decoding operations once per 8K bytes

No. A decoding operation is applied to each byte, so it has little to do with the size of the buffer. The size of the buffer only affects the frequency of disk reads (assuming file storage). Your FileReader almost certainly buffers a similar amount of data at the OS I/O level, which is likely why little difference can be perceived from using BufferedReader.

This isn't an opinion-based question. You should delete it and re-post it as a normal question that can be answered properly.

(Stack Overflow is running a badly-designed experiment which misleads people into asking questions like this as opinion-based, not real questions. Opinion-based questions alpha experiment on Stack Overflow. Some of the designers of this thought that debugging questions were the only kind of normal questions previously allowed, or something like that. Many at SO are completely out of touch with the community that actually uses Stack Overflow. Also, they started this experiment without the ability to flip a question from this bad format to normal Q&A.)

It would be really interesting to see the full JMH code.

Because repeated reading should improve things a little bit (8× fewer OS calls).

But we don't know whether you put only the reading part in a loop, or the whole "open file, allocate buffers, read" shebang. The latter would be unable to show any differences, because a) opening the file and allocating buffers is so much slower than reading that it overshadows the actual operation you're looking at, and b) the block is small enough (1KB) to be read by the OS in one sweep, also preventing differences in timing.

But one thing up front: Reading in (sufficiently large) blocks will in fact do the same job as BufferedXXX does.

On modern systems, the optimal cache size is surprisingly small (Win10: 8KB for reads, 200KB for writes), due to all the other caching and buffering going on, by the OS itself and especially on SSDs.

So the strengths of BufferedXXX really start to shine when reading single bytes, as with DataInputStream or ObjectInputStream.
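A sketch of that single-byte-ish case: DataInputStream.readInt() pulls 4 bytes per call, so wrapping a BufferedInputStream underneath it is exactly where buffering pays off. The file, count, and buffer size below are made up for illustration:

```java
import java.io.*;
import java.nio.file.*;

class BufferedPrimitives {
    // Write `count` ints to a temp file, then read them back through a
    // DataInputStream. Each readInt() pulls 4 bytes; without the
    // BufferedInputStream every one of those reads would hit the file directly.
    static long sumInts(int count) throws IOException {
        Path tmp = Files.createTempFile("ints", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            for (int i = 0; i < count; i++) out.writeInt(i);
        }
        long sum = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(tmp), 8192))) {
            for (int i = 0; i < count; i++) sum += in.readInt();
        }
        Files.delete(tmp);
        return sum;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sumInts(10_000)); // sum of 0..9999
    }
}
```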

If you want to get a good estimate of how close your app is to the maximum possible performance, you'd have to check out the system calls made by your app. Use a tool like procmon (Windows), for example, or write a custom InputStream wrapper class that counts individual accesses to the original InputStream (in your case the FileReader -> InputStreamReader -> FileInputStream chain).
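A minimal sketch of such a counting wrapper (class names and the small demo file are made up for illustration). Note that on current JDKs the InputStreamReader inside FileReader decodes through an internal 8K byte buffer of its own, so the wrapper typically reports far fewer underlying reads than the number of 1K read() calls made on top:

```java
import java.io.*;
import java.nio.file.*;

// Wrapper that counts how often the wrapped stream is actually hit.
class CountingInputStream extends FilterInputStream {
    long calls = 0;
    CountingInputStream(InputStream in) { super(in); }
    @Override public int read() throws IOException {
        calls++;
        return super.read();
    }
    @Override public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }
}

class CountDemo {
    // Read a 64KB file through an InputStreamReader in 1KB char chunks
    // and return how many times the raw byte stream was accessed.
    static long underlyingReads() throws IOException {
        Path tmp = Files.createTempFile("count", ".txt");
        byte[] data = new byte[64 * 1024];
        java.util.Arrays.fill(data, (byte) 'x');
        Files.write(tmp, data);

        CountingInputStream counting =
                new CountingInputStream(Files.newInputStream(tmp));
        long calls;
        try (Reader reader = new InputStreamReader(counting)) {
            char[] buf = new char[1024];
            while (reader.read(buf, 0, 1024) != -1) { /* discard */ }
            calls = counting.calls;
        }
        Files.delete(tmp);
        return calls;
    }

    public static void main(String[] args) throws IOException {
        // 64 chunks of 1KB are requested above the decoder, but the
        // decoder refills its own byte buffer far less often than that.
        System.out.println("underlying reads: " + underlyingReads());
    }
}
```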

Using those analytics on an app loading 1 TB of data via DataInputStream (basically doing the job of an ObjectInputStream with custom deserialization), I was able to speed it up by a factor of ~200 just by using BufferedXXX with the proper buffer sizes (as above).

Now try it with a 1-byte 'buffer' for your FileReader. This should be reposted, as @PeterCordes mentioned. Things the OP seems to be unaware of:

  • The primary slowdown in the chain is the disk system; most disks can only return data in chunks (exactly how large a chunk is depends on the hardware), but various implementations of InputStream will refuse to buffer a chunk because that's not their job. If you call read() (or read(arr) with a very small array, much smaller than 1kb), then the system ends up reading a chunk and then discarding all data in that chunk except the 1 byte actually needed right at that moment. Looping through a 4kb file on a system that has 8kb chunks thus means the disk reads 32MB (yes, that's an M) in ~4000 requests, whereas it could have been done by reading 8kb in 1 request. BufferedReader solves that by 'adding' such a buffer. If you're already calling read(arr) with a reasonable array size (and 1kb is on the low end, but close enough), then this factor disappears. And this factor is overwhelmingly the thing BufferedReader is for.

  • I'm not even sure if FileReader is one of those implementations that buffers or not. It might itself also buffer.

  • The 'efficiency' gain from letting the charset decoder operate on an 8kb chunk instead of a 1kb one is irrelevant and immeasurable.
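The first point above can be demonstrated directly on a raw InputStream (no Reader/decoder in the way) by counting accesses to the underlying stream; the wrapper class, file, and sizes here are hypothetical illustrations:

```java
import java.io.*;
import java.nio.file.*;

// Counts accesses to the wrapped stream (hypothetical helper).
class AccessCounter extends FilterInputStream {
    long calls = 0;
    AccessCounter(InputStream in) { super(in); }
    @Override public int read() throws IOException {
        calls++;
        return super.read();
    }
    @Override public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }
}

class SingleByteDemo {
    // Returns {accesses when reading byte-by-byte directly, accesses when
    // reading byte-by-byte through a BufferedInputStream} for a 4KB file.
    static long[] compare() throws IOException {
        Path tmp = Files.createTempFile("chunks", ".bin");
        Files.write(tmp, new byte[4096]);

        AccessCounter raw = new AccessCounter(Files.newInputStream(tmp));
        while (raw.read() != -1) { /* one underlying access per byte */ }
        raw.close();

        AccessCounter wrapped = new AccessCounter(Files.newInputStream(tmp));
        try (InputStream buffered = new BufferedInputStream(wrapped, 8192)) {
            while (buffered.read() != -1) { /* buffer absorbs the calls */ }
        }

        Files.delete(tmp);
        return new long[] { raw.calls, wrapped.calls };
    }

    public static void main(String[] args) throws IOException {
        long[] c = compare();
        System.out.println("unbuffered accesses: " + c[0]);
        System.out.println("buffered accesses:   " + c[1]);
    }
}
```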

There is a simple answer to your question: buffering the reader is good for inefficient reading patterns (like reading line by line). If you just want to load the whole file (or another resource, such as an HTTP response) and you know its length up front (for HTTP it's in the Content-Length header), you can just create a buffer of that size and use inputstream.read(buffer).
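One caveat to that advice: a single read(buffer) call is not guaranteed to fill the buffer, so a known-length read needs a short loop. A sketch (on Java 9+, InputStream.readNBytes does the same job):

```java
import java.io.*;

class SizedRead {
    // Read exactly `length` bytes from the stream, looping because a
    // single read() may return fewer bytes than requested.
    static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(buf, off, length - off);
            if (n == -1) throw new EOFException("stream ended early");
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] src = "hello, sized read".getBytes("UTF-8");
        byte[] copy = readFully(new ByteArrayInputStream(src), src.length);
        System.out.println(new String(copy, "UTF-8"));
    }
}
```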

Readers in Java are for reading text that's been saved in some charset (not necessarily UTF-8) and decoding it into the chars and Strings that Java uses; images and other binary data are read with InputStreams instead.
