I have a file that is split into two parts by "\n\n": the first part is a fairly short String and the second is a byte array, which can be quite long.

I am trying to read the file as follows:

    byte[] result;
    try (final FileInputStream fis = new FileInputStream(file)) {

        final InputStreamReader isr = new InputStreamReader(fis);
        final BufferedReader reader = new BufferedReader(isr);

        String line;
        // reading until \n\n
        while (!(line = reader.readLine()).trim().isEmpty()){
            // processing the line
        }

        // copying the rest of the byte array
        result = IOUtils.toByteArray(reader);
        reader.close();
    }

Even though the resulting array has the expected size, its contents are corrupted. If I call toByteArray directly on fis or isr instead, the result is empty.

How can I read the rest of the file correctly and efficiently?

Thanks!

3 Answers

1

Your contents are corrupted because the binary data passes through a Reader. The InputStreamReader decodes the raw bytes into characters using the default charset, and IOUtils.toByteArray(reader) then encodes those characters back into bytes, again with the default charset. Any byte sequence that is not valid in that charset is replaced during decoding, so many of the binary values come back changed.

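A single-byte charset can make this round trip lossless, but it has to be set where the decoding happens, i.e. on the InputStreamReader, and not only in the final conversion:

    result = IOUtils.toByteArray(reader, "ISO-8859-1");

This only helps if the reader was also created with new InputStreamReader(fis, "ISO-8859-1"). ISO-8859-1 uses a single byte per character, and Java's implementation maps all 256 byte values one-to-one to the characters U+0000 through U+00FF, so decoding and re-encoding with it preserves the bytes. The downside is that the header text is then decoded as ISO-8859-1 too, which is wrong if the header uses another encoding such as UTF-8.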
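
To make the effect of the charset concrete, here is a minimal sketch that pushes a few arbitrary bytes through a decode/encode round trip, which is roughly what a Reader-based copy does. The class name, the helper roundTrip, and the sample payload are made up for illustration; only the standard String and Charset APIs are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Decodes the bytes into a String and encodes them back, which is
    // effectively what happens when binary data passes through a Reader.
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Hypothetical binary payload; 0xFF and 0x80 are not valid UTF-8 on their own.
        byte[] payload = {(byte) 0xFF, 0x00, (byte) 0x80, 0x41};

        byte[] viaUtf8 = roundTrip(payload, StandardCharsets.UTF_8);
        byte[] viaLatin1 = roundTrip(payload, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 preserved:      " + Arrays.equals(payload, viaUtf8));
        System.out.println("ISO-8859-1 preserved: " + Arrays.equals(payload, viaLatin1));
    }
}
```

With UTF-8 the invalid bytes are replaced by the replacement character during decoding, so the payload does not survive; with ISO-8859-1 it does.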

But a much cleaner solution is to read the String at the beginning as binary data first and then convert it to text via new String(bytes, charset), rather than reading the binary data at the end through a Reader and then converting it back to bytes.

This might mean, though, that you need to implement your own byte-based version of a BufferedReader for performance reasons.

You can find the source code of the standard BufferedReader via the obvious Google search, which will (for example) lead you here:

http://www.docjar.com/html/api/java/io/BufferedReader.java.html

It's a bit long, but conceptually not too difficult to understand, so hopefully it will be useful as a reference.
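
As a rough sketch of the binary-first approach, the following reads the header byte by byte from a BufferedInputStream until it sees "\n\n", decodes only those bytes, and then drains the rest of the stream untouched. The class and method names are hypothetical, and UTF-8 for the header is an assumption; substitute whatever encoding the header really uses.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HeaderThenBytes {

    // Reads bytes up to the first "\n\n" and returns them decoded as text.
    // The stream is left positioned at the first byte after the separator.
    static String readHeader(InputStream in) throws IOException {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        boolean lastWasNewline = false;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n' && lastWasNewline) {
                byte[] raw = header.toByteArray();
                // Drop the first '\n' of the separator, which was already buffered.
                // Assumption: the header is UTF-8 encoded.
                return new String(raw, 0, raw.length - 1, StandardCharsets.UTF_8);
            }
            lastWasNewline = (b == '\n');
            header.write(b);
        }
        throw new EOFException("no \\n\\n separator found");
    }

    // Drains the remaining bytes without any charset conversion.
    static byte[] readRest(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The BufferedInputStream keeps the single-byte reads cheap.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            String header = readHeader(in);
            byte[] payload = readRest(in);
            System.out.println(header.length() + " header chars, " + payload.length + " payload bytes");
        }
    }
}
```

Because both steps read from the same buffered stream, nothing is lost to read-ahead, and the payload never goes through a charset.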

1 Comment

This is exactly what I found out myself just a few minutes ago :-)
1

Alternatively, you could read the whole file into a byte array, find the position of \n\n, and split the array into the header line and the remaining bytes:

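    // needs: import java.nio.file.Files; import java.nio.file.Paths;
    byte[] a = Files.readAllBytes(Paths.get("file"));
    String line = "";
    byte[] result = a;
    for (int i = 0; i < a.length - 1; i++) {
        if (a[i] == '\n' && a[i + 1] == '\n') {
            // header text is everything before the separator
            line = new String(a, 0, i);
            // payload starts after both '\n' bytes, hence i + 2
            int len = a.length - i - 2;
            result = new byte[len];
            System.arraycopy(a, i + 2, result, 0, len);
            break;
        }
    }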
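
If the second allocation is a concern, one option is to hand out a ByteBuffer view over the original array instead of copying the payload. The sketch below is only an illustration under that assumption: the class and method names are invented, and it is only useful if the consumer can work with a ByteBuffer rather than a byte[].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SplitWithoutCopy {

    // Returns the index of the first "\n\n" in data, or -1 if there is none.
    static int findSeparator(byte[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            if (data[i] == '\n' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    // Wraps the bytes after the separator in a read-only view; no bytes are copied.
    static ByteBuffer payloadView(byte[] data, int separatorIndex) {
        int start = separatorIndex + 2; // skip both '\n' bytes
        return ByteBuffer.wrap(data, start, data.length - start).slice().asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        // Hypothetical in-memory file contents: header, separator, binary payload.
        byte[] data = {'k', '\t', 'v', '\n', '\n', (byte) 0xFF, 0x00, (byte) 0x80};

        int sep = findSeparator(data);
        if (sep < 0) {
            throw new IllegalStateException("no \\n\\n separator found");
        }
        // Assumption: the header is UTF-8 encoded.
        String header = new String(data, 0, sep, StandardCharsets.UTF_8);
        ByteBuffer payload = payloadView(data, sep);

        System.out.println("header length: " + header.length());
        System.out.println("payload bytes: " + payload.remaining());
    }
}
```

The view shares memory with the original array, so the whole file stays reachable for as long as the view is in use.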

1 Comment

I think the array copy would be quite expensive.
0

Thanks for all the comments - the final implementation was done in this way:

    try (final FileInputStream fis = new FileInputStream(file)) {

        ByteBuffer buffer = ByteBuffer.allocate(64);

        boolean wasLast = false;
        String headerValue = null, headerKey = null;
        byte[] result = null;

        while (true) {
            int next = fis.read();
            if (next == -1) {
                // end of file reached before the \n\n separator
                break;
            }
            byte current = (byte) next;
            if (current == '\n') {
                if (wasLast) {
                    // this is \n\n
                    break;
                } else {
                    // just a new line in header
                    wasLast = true;
                    headerValue = new String(buffer.array(), 0, buffer.position());
                    buffer.clear();
                }
            } else if (current == '\t') {
                // headerKey\theaderValue\n
                headerKey = new String(buffer.array(), 0, buffer.position());
                buffer.clear();
            } else {
                buffer.put(current);
                wasLast = false;
            }
        }
        // reading the rest
        result = IOUtils.toByteArray(fis);
    }

1 Comment

Should you also put a wasLast = false; inside the if (current == '\t') block, just in case you come across an empty key value pair that results in ...\n\t\n...? :)
