
I'm new to Java and am trying to understand byte streams and character streams. Many people say that a byte stream is suitable only for the ASCII character set, while a character stream can support all character sets (ASCII, Unicode, etc.). I think this is a misunderstanding, because I can use a byte stream to read and write a Unicode character.

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

public class DemoApp {

    public static void main(String args[]) {

        FileInputStream fis = null;
        FileOutputStream fos = null;

        try {

            fis = new FileInputStream("abc.txt");
            fos = new FileOutputStream("def.txt");
            int k;

            while ((k = fis.read()) != -1) {

                fos.write(k);
                System.out.print((char) k);
            }
        }

        catch (FileNotFoundException fnfe) {

            System.out.printf("ERROR: %s", fnfe);
        }

        catch (IOException ioe) {

            System.out.printf("ERROR: %s", ioe);
        }

        finally {

            try {

                if (fos != null)
                    fos.close();
            }

            catch (IOException ioe) {

                System.out.printf("ERROR: %s", ioe);
            }

            try {

                if (fis != null) 
                    fis.close();
            }

            catch (IOException ioe) {

                System.out.printf("ERROR: %s", ioe);
            }

        }

    }

}

The abc.txt file contains the Unicode character Ǽ, and I saved the file using UTF-8 encoding. The code works well: it creates a new file def.txt, and this file contains the Unicode character Ǽ.

And I have 2 questions:

  1. What is the truth about byte streams and Unicode characters? Does a byte stream support Unicode characters or not?

  2. When I try to print with System.out.print((char) k), the result is not a Unicode character; it is just ASCII characters: Ç¼. I don't understand why, because I know that Java and the char data type support Unicode characters. I tried saving the source code as UTF-8, but the problem persists.

Sorry for my english grammar and thank you in advance!

  • Java does support unicode, but your console might not. Commented Apr 6, 2018 at 16:30
  • "byte stream is suitable only for ASCII character set." No. Byte streams allow reading bytes. Not characters. To read characters, whatever the encoding of these characters is (ASCII or anything else), you use a Reader, and you specify the encoding. Commented Apr 6, 2018 at 16:34
  • When you write (char) k you are assuming each byte represents a character. But in UTF-8, all non-ASCII characters are represented using multiple bytes. It is not correct to assume one byte is one character. Create an InputStreamReader to handle this. Commented Apr 6, 2018 at 17:02
  • You can use byte streams to read and write anything. You've just implemented a very inefficient file copying routine. Since it reproduces the file exactly, it doesn't matter which encoding it has (if it is a text file at all). The problems start when you cluelessly try to interpret the bytes as characters. That's the place where you should start learning about Reader. Commented Apr 6, 2018 at 17:26

1 Answer

What is the truth about byte stream regarding Unicode character? Does byte stream support Unicode character or not?

In fact, there is no such thing as a "Unicode character". There are three distinct concepts that you should NOT mix up.

  • Unicode code points
  • Characters in some encoding of a sequence of code points.
  • The Java char type, which strictly speaking is neither of the above: it is a 16-bit UTF-16 code unit.
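
To make the three concepts in the list above concrete, here is a minimal sketch that counts chars, code points, and UTF-8 bytes for two sample strings. The class name CodePointDemo and its helper methods are made up for illustration; they are not part of the question or of any library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares Java chars, Unicode code points, and UTF-8 bytes.
public class CodePointDemo {

    // Number of Java chars (UTF-16 code units) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+01FC, the character from the question.
        String aeAcute = "\u01FC";
        // U+1F600, a code point outside the Basic Multilingual Plane.
        String emoji = new String(Character.toChars(0x1F600));

        // prints "1 1 2": one char, one code point, two UTF-8 bytes
        System.out.println(charCount(aeAcute) + " " + codePointCount(aeAcute) + " " + utf8ByteCount(aeAcute));
        // prints "2 1 4": two chars (a surrogate pair), one code point, four UTF-8 bytes
        System.out.println(charCount(emoji) + " " + codePointCount(emoji) + " " + utf8ByteCount(emoji));
    }
}
```

The second string is the reason a char is not the same thing as a code point: one code point can need two chars.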

You need to do some serious background reading on this.

Having cleared that up, we can say that while a byte stream can be used to read an encoding of a sequence of Unicode code points, the stream API is NOT designed for reading and writing character-based text of any form. It is designed for reading and writing sequences of bytes (8-bit binary values), which may represent anything. The stream API is agnostic about what the bytes represent: it doesn't know, and doesn't care!

When I try to print with System.out.print((char) k), the result is not a Unicode character; it is just ASCII characters: Ç¼. I don't understand why, because I know that Java and the char data type support Unicode characters. I tried saving the source code as UTF-8, but the problem persists.

(Correction. Those are NOT ASCII characters, they are LATIN-1 characters!)

The problem is not in Java. The problem is that a console is configured to expect text to be sent to it with a specific character encoding, but you are sending it characters with a different encoding.

When you read and write characters using a stream, the stream doesn't know and doesn't care about the encoding. So, if you read a file that is valid UTF-8 encoded text and use a stream to write it to a console that expects (for example) LATIN-1, then the result is typically garbage.

Another way to get garbage (which is what is happening here) is to read an encoded file as a sequence of bytes, and then cast the bytes to characters and print the characters. That is the wrong thing to do. If you want the characters to come out correctly, you need to decode the bytes into a sequence of characters and then print the characters. Casting is not decoding.
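
The difference between casting and decoding can be sketched as follows. The example assumes the file holds the two bytes 0xC7 0xBC, which is the UTF-8 encoding of U+01FC (Ǽ); the class name CastVsDecode and its methods are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: casting bytes to chars versus decoding them.
public class CastVsDecode {

    // Mimics the loop in the question: every byte becomes a char on its own.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // b & 0xFF gives the same 0..255 value that InputStream.read() returns.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the whole byte sequence as UTF-8 text.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed file content: the UTF-8 encoding of U+01FC.
        byte[] utf8 = {(byte) 0xC7, (byte) 0xBC};

        // Casting yields two chars, U+00C7 and U+00BC.
        System.out.println(castEachByte(utf8).equals("\u00C7\u00BC")); // prints "true"
        // Decoding yields the single char U+01FC.
        System.out.println(decodeUtf8(utf8).equals("\u01FC"));         // prints "true"
    }
}
```

U+00C7 and U+00BC are Ç and ¼ in LATIN-1, which accounts for the two characters seen on the console.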

If you were reading the bytes via a Reader, the decoding would happen automatically, and you would not get that kind of mangling. (You might possibly get another kind ... if the console was not capable of displaying the characters, or if you configured the Reader stack to decode with the wrong charset.)
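
As a rough sketch of the Reader approach, the code below wraps an InputStream in an InputStreamReader with an explicit charset. A ByteArrayInputStream stands in for new FileInputStream("abc.txt") so the example runs without the file; ReaderDemo, readAll, and readAllFromBytes are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decoding bytes through a Reader.
public class ReaderDemo {

    // Reads the stream to the end, decoding its bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            // Reader.read() returns one decoded char (0..65535), or -1 at end of input.
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Convenience wrapper so the sketch can run on in-memory bytes.
    static String readAllFromBytes(byte[] data, Charset charset) {
        try {
            return readAll(new ByteArrayInputStream(data), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed file content: the UTF-8 encoding of U+01FC.
        byte[] utf8 = {(byte) 0xC7, (byte) 0xBC};

        // Correct charset: one char, U+01FC.
        System.out.println(readAllFromBytes(utf8, StandardCharsets.UTF_8).equals("\u01FC"));
        // Wrong charset: two chars, U+00C7 and U+00BC.
        System.out.println(readAllFromBytes(utf8, StandardCharsets.ISO_8859_1).equals("\u00C7\u00BC"));
    }
}
```

Both lines print "true". The second call shows the other kind of mangling mentioned above: a Reader configured with the wrong charset still produces garbage.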


In summary: If you are trying to make a literal copy of a file (for example), use a byte stream. If you are trying to process the file as text, use a character stream.

The problem with your example code is that you appear to be trying to do both things at the same time with one pass through the file; i.e. make a literal copy of the file AND display it as text on the console. That is technically possible ... but difficult. My advice: don't try to do both things at the same time.
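
One way to keep the two tasks separate is sketched below: first a literal byte copy, then an independent pass that decodes the copy as text. It uses temporary files so that it does not depend on abc.txt existing; CopyThenDisplay and its methods are hypothetical names, and the UTF-8 charset is an assumption taken from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical demo class: copy as bytes first, then read as text in a second pass.
public class CopyThenDisplay {

    // Step 1: literal copy. The bytes are never interpreted as characters.
    static void copyBytes(Path source, Path target) throws IOException {
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Step 2: a separate pass that decodes the file with an explicit charset.
    static String readText(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    // Runs both steps on temporary files and returns the decoded text of the copy.
    static String demo() {
        try {
            Path source = Files.createTempFile("abc", ".txt");
            Path target = Files.createTempFile("def", ".txt");
            try {
                // Write U+01FC as UTF-8, as in the question.
                Files.write(source, "\u01FC".getBytes(StandardCharsets.UTF_8));
                copyBytes(source, target);
                return readText(target, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The copy decodes to the same single character that was written.
        System.out.println(demo().equals("\u01FC")); // prints "true"
    }
}
```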


4 Comments

Thank you for your reply. Is this an efficient way to copy the file or should I do something else?

    FileReader fr = new FileReader("abc.txt");
    BufferedReader br = new BufferedReader(fr);
    FileWriter fw = new FileWriter("def.txt");
    BufferedWriter bw = new BufferedWriter(fw);
    String temp;
    while ((temp = br.readLine()) != null) {
        bw.write(temp);
        bw.newLine();
        System.out.println(temp);
    }

That is an efficient way. But if you are just trying to copy a file without doing any transcoding, you should use byte streams rather than readers.
Can I ask why I should use byte streams rather than readers? I know that when working with text files it is recommended to use character streams rather than byte streams.
It depends on what you are trying to do. If you are trying to make a literal copy of a file, use a byte stream. If you are trying to process the file as text, use a character stream.
