
I would like to read a text file and print it to the console, so I wrote the code below:

File file = new File("G:\\text.txt");
FileReader fileReader = new FileReader(file);
String result = "";
int ascii = fileReader.read();

while (ascii != -1)
{
    result = result + (char) ascii;
    ascii = fileReader.read();
}
fileReader.close();
System.out.println(result);

Although I get the correct result most of the time, in some cases I get strange output. Suppose my text file contains this text:

Hello to every one

To create the text file I used Notepad, and when I change the encoding mode I get strange output from my code.

Ansi : Hello to every one

Unicode : ÿþh e l l o t o e v e r y o n e

Unicode big endian: þÿ h e l l o t o e v e r y o n e

UTF-8 : hello to every one

Why do I get this strange output? Is there a problem with my code, or is there some other reason?
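For reference, the same effect can be reproduced without Notepad by writing a UTF-16 file (BOM included) and then decoding each byte as if it were a whole character — a self-contained sketch using a temp file in place of G:\text.txt (the class and method names here are illustrative, not from the original code):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MojibakeDemo {
    // Decode each byte as its own character, mimicking a one-byte-per-char read.
    static String bytesAsChars(Path path) throws Exception {
        byte[] bytes = Files.readAllBytes(path);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("demo", ".txt");
        // Java's "UTF-16" charset writes a big-endian BOM (FE FF) before the text.
        Files.write(path, "hello".getBytes(StandardCharsets.UTF_16));
        // Prints þÿ followed by NUL-interleaved letters: þ ÿ \0 h \0 e ...
        System.out.println(bytesAsChars(path));
        Files.delete(path);
    }
}
```

The þÿ (or ÿþ for little-endian) prefix and the "gaps" between letters are exactly the pattern in the question's output.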

  • Because of the encoding mode? You already mentioned that it happens when you change the encoding mode. Commented Jun 23, 2015 at 6:08
  • @Gosu: yes, as you can see, when I changed the encoding mode I get different results Commented Jun 23, 2015 at 6:09
  • Use an InputStreamReader together with the correct encoding instead? Commented Jun 23, 2015 at 6:10
  • @ElyasHadizadeh What do you think different encodings are used for? If they all gave the same result, we'd only need a single encoding. You're also using the correct term (encoding) for the last one of your examples (UTF-8). Ansi is not an encoding, and the ones you term Unicode are actually UTF-16LE and UTF-16BE. Unicode is the charset; encodings are different ways of storing the characters as bytes. Commented Jun 23, 2015 at 6:14
  • @ElyasHadizadeh This is a pretty good read: joelonsoftware.com/articles/Unicode.html Commented Jun 23, 2015 at 6:45

1 Answer


Your file starts with a byte-order mark (U+FEFF). It should only occur in the first character of the file - it's not terribly widely used, but various Windows tools do include it, including Notepad. You can just strip it from the start of the first line.
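To see the mark for yourself, you can dump the first two bytes of the file — a minimal sketch, again using a temp file rather than a real path (class and method names are illustrative):

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomPeek {
    // Returns the first two bytes of the file as an "XX YY" hex string.
    static String firstTwoBytes(Path path) throws Exception {
        try (InputStream in = Files.newInputStream(path)) {
            int b0 = in.read();
            int b1 = in.read();
            return String.format("%02X %02X", b0, b1);
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("bom", ".txt");
        Files.write(path, "Hello".getBytes(StandardCharsets.UTF_16));
        // Prints the UTF-16 big-endian byte-order mark: FE FF
        System.out.println(firstTwoBytes(path));
        Files.delete(path);
    }
}
```

A little-endian file (Notepad's "Unicode") would start with FF FE instead, and a UTF-8 file with a BOM starts with EF BB BF.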

As an aside, I'd strongly recommend not using FileReader - it doesn't allow you to specify the encoding. I'd use Files.newBufferedReader, and either specify the encoding or let it default to UTF-8 (rather than the system default encoding which FileReader uses). When you're using BufferedReader, you can then just read a line at a time with readLine() too:

try (BufferedReader reader = Files.newBufferedReader(Paths.get("G:\\text.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line.replace("\uFEFF", ""));
    }
}

If you really want to read a character at a time, it's worth getting in the habit of using a StringBuilder instead of repeated string concatenation in a loop. Also note that your variable name of ascii is misleading: it's actually the UTF-16 code unit, which may or may not be an ASCII character.
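A sketch of that character-at-a-time loop rewritten with StringBuilder — the path, charset, and class name here are placeholders, so substitute your own:

```java
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharLoop {
    static String readAll(Path path) throws Exception {
        StringBuilder result = new StringBuilder();
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            int codeUnit; // a UTF-16 code unit, not necessarily an ASCII character
            while ((codeUnit = reader.read()) != -1) {
                result.append((char) codeUnit);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("read", ".txt");
        Files.write(path, "Hello to every one".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(path)); // Hello to every one
        Files.delete(path);
    }
}
```

Appending to a StringBuilder is linear overall, whereas `result = result + (char) ascii` copies the whole string on every iteration, making the loop quadratic.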

The encoding you specify should match the encoding used to write the file - at that point you should see the correct output instead of an extra character between each "real" character when using Unicode and Unicode big endian.
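As a sketch of what "matching the encoding" looks like in practice: Java's UTF-16 charset uses the byte-order mark to pick the byte order and consumes it during decoding, so the round trip is clean (a temp file stands in for the real one, and the class name is illustrative):

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MatchingCharset {
    static String readFirstLine(Path path) throws Exception {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_16)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("enc", ".txt");
        // Written as UTF-16 with a BOM, like Notepad's Unicode modes (big-endian here).
        Files.write(path, "Hello to every one".getBytes(StandardCharsets.UTF_16));
        // Read back with the matching charset: no stray characters, BOM handled.
        System.out.println(readFirstLine(path)); // Hello to every one
        Files.delete(path);
    }
}
```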


4 Comments

It seems your answer is right; could you please show the correct way of using Files.newBufferedReader?
@ElyasHadizadeh: Well, have you looked at the documentation and tried using it yourself? It's very important to be able to do your own research.
Yes, you are completely right, thank you for your advice and answer ;-)
Jon Skeet: Thank you again very much, I found the correct way, and this line of code: line.replace("\uFEFF", "") was very helpful
