
I have a CSV file that contains English words followed by their Hindi translations. I am trying to read the CSV file and do some further processing with it. The file looks like this:

English,,Hindi,,,  
,,,,,  
Cat,,बिल्ली,,,  
Rat,,चूहा,,,  
abandon,,छोड़ देना,त्याग देना,लापरवाही की स्वतन्त्रता,जाने देना  

I am trying to read the CSV file line by line and print each line. The Java code is as follows:

    // Step 2. Read the CSV file and get the string.
    FileInputStream fis = null;
    BufferedReader br = null;
    try {
        fis = new FileInputStream(new File(csvFile));
    } catch (FileNotFoundException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }

    boolean startSeen = true;
    if (fis != null) {
        try {
            br = new BufferedReader(new InputStreamReader(fis, "UTF-8"));
        } catch (UnsupportedEncodingException e2) {
            // TODO Auto-generated catch block
            e2.printStackTrace();
            System.out.print("Unsupported encoding");
        }
        String line = null;
        if (br != null) {
            try {
                while ((line = br.readLine()) != null) {
                    if (line.contains("English") == true) {
                        startSeen = true;
                    }

                    if ((startSeen == true) && (line != null)) {
                        StringBuffer sbuf = new StringBuffer();
                        // Step 3. Parse the line.
                        sbuf.append(line);
                        System.out.println(sbuf.toString());
                    }
                }
            } catch (IOException e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }
        }
    }

However, the following output is what I get:

English,,Hindi,,,
,,,,,
Cat,,??????,,,
Rat,,????,,,
abandon,,???? ????,????? ????,???????? ?? ???????????,???? ????  

My Java is not great, and although I have gone through a number of posts on SO, I need help figuring out the exact cause of this problem.

  • Just a side comment: you don't have to compare boolean values with true the way you do in if(line.contains("English") == true) and (startSeen == true). You can use if(line.contains("English")) and (startSeen) directly, since they already evaluate to either true or false. Commented Jan 16, 2013 at 6:33
  • @smit: point taken. Thanks! Commented Jan 16, 2013 at 7:16
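Applying that suggestion (plus try-with-resources, available since Java 7), the question's reading loop might shrink to something like the sketch below. `HeaderFilter` and `linesFromHeader` are hypothetical names. It also assumes the intent of `startSeen` was to skip anything before the `English` header line, so it starts as `false`; the question initialises it to `true`, which makes every line print.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class HeaderFilter {

    // Returns every line from the first line containing "English" onward.
    // Assumption: lines before the header should be skipped, so startSeen
    // starts as false (the question's snippet initialises it to true).
    static List<String> linesFromHeader(BufferedReader br) throws IOException {
        List<String> result = new ArrayList<>();
        boolean startSeen = false;
        String line;
        // line is never null inside the loop, so no extra null check is needed
        while ((line = br.readLine()) != null) {
            if (line.contains("English")) { // no need for "== true"
                startSeen = true;
            }
            if (startSeen) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String csv = "preamble\nEnglish,,Hindi,,,\nCat,,x,,,\n";
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
            for (String line : linesFromHeader(br)) {
                System.out.println(line);
            }
        }
    }
}
```

For a real file, the `StringReader` would be replaced by `new InputStreamReader(new FileInputStream(csvFile), "UTF-8")` as in the question.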

3 Answers


For reading a text file it is better to use a character stream, e.g. java.util.Scanner directly, instead of a FileInputStream. As for the encoding, first make sure that the text file you want to read is saved as UTF-8 and not in some other encoding. I also noticed that on my system I have to save my Java source file as UTF-8 as well for the Hindi characters to display properly.

However, I'd suggest a simpler way to read the CSV file:

Scanner scan = new Scanner(new File(csvFile), "UTF-8"); // decode the file explicitly as UTF-8
while (scan.hasNextLine()) {
   System.out.println(scan.nextLine());
}
scan.close();
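Going one step further toward the "further processing" mentioned in the question, here is a sketch that reads the file with an explicit UTF-8 charset via `Files.readAllLines` and maps each English word to its non-empty Hindi translations. The class name `HindiCsv`, the column layout (word in column 0, translations in later columns) and the absence of quoted fields are assumptions; quoted commas would need a proper CSV parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HindiCsv {

    // Maps each English word (column 0) to its non-empty translations
    // (all later columns). Assumes fields never contain quoted commas.
    static Map<String, List<String>> load(Path csv) throws IOException {
        Map<String, List<String>> dict = new LinkedHashMap<>();
        // Decode explicitly as UTF-8; a file that is not valid UTF-8 makes
        // readAllLines throw (MalformedInputException) instead of printing '?'.
        for (String line : Files.readAllLines(csv, StandardCharsets.UTF_8)) {
            // Some editors prepend a byte-order mark to UTF-8 files; drop it.
            String row = line.startsWith("\uFEFF") ? line.substring(1) : line;
            String[] fields = row.split(",");
            if (fields.length == 0 || fields[0].trim().isEmpty()
                    || fields[0].trim().equals("English")) {
                continue; // skip the header row and empty separator rows
            }
            List<String> translations = new ArrayList<>();
            for (int i = 1; i < fields.length; i++) {
                String t = fields[i].trim();
                if (!t.isEmpty()) {
                    translations.add(t);
                }
            }
            dict.put(fields[0].trim(), translations);
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> dict = load(Paths.get(args[0]));
        for (Map.Entry<String, List<String>> e : dict.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With the sample file from the question, `abandon` would map to its four translations, and the header and `,,,,,` rows are skipped.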



4 Comments

The problem was that my file had not been saved as UTF-8. When I incorporated Evgeniy's solution of pasting a println command in the editor, Eclipse gave me the option of saving content as UTF-8. In some sense, both you guys got it right. Thanks!
Same problem: my Java file had not been saved as UTF-8. +1 for a helpful answer.
I created a new text file and wrote a few Devanagari (Hindi/Marathi) words in it. While saving, Eclipse asked whether I wanted to save it as UTF-8, and I said yes, so I assume the file is in the required format. But the code above does not work: it does not print anything. It prints only when the file contains nothing but English characters. Is this specific to a particular Java version?
@jon Kartago: even though you marked it in bold, I forgot to save my Java file as UTF-8.
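On the version question above: a Scanner created without a charset argument decodes with the platform default charset, which has been UTF-8 by default only since Java 18 (JEP 400); before that it depends on the OS and locale. Scanner also does not throw when it hits bytes it cannot decode. Per its Javadoc, it treats an IOException from the underlying source as end of input and keeps it available via `ioException()`, which would be consistent with the "prints nothing" symptom. A diagnostic sketch (`ScannerCheck` and `readLines` are hypothetical names):

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerCheck {

    // Reads all lines with an explicit charset. Scanner never throws on a
    // decoding failure: it treats it as end of input, so rethrow the
    // exception it recorded (available via ioException()).
    static List<String> readLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scan = new Scanner(file, charsetName)) {
            while (scan.hasNextLine()) {
                lines.add(scan.nextLine());
            }
            IOException failure = scan.ioException();
            if (failure != null) {
                throw failure;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The charset that new Scanner(file) would use when none is given.
        System.out.println("Default charset: " + Charset.defaultCharset());
        for (String line : readLines(new File(args[0]), "UTF-8")) {
            System.out.println(line);
        }
    }
}
```

If the default charset printed is not UTF-8, that alone can explain why a Scanner without a charset argument behaves differently across machines and Java versions.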

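I think your console cannot display Hindi characters. To test this, try

System.out.println("Cat,,बिल्ली,,,");

If this line also prints question marks, the problem is on the output side, not in how the CSV file is read.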
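One plausible mechanism behind the `??????` output: when text is encoded into a charset that has no mapping for Devanagari (Latin-1, Cp1252 and similar), Java substitutes `?` for each unmappable character. The sketch below reproduces that with `String.getBytes(Charset)`. The Hindi word for "cat" is six `char`s, which gives exactly the six question marks in the question's `Cat` line. It then writes through a `PrintStream` explicitly set to UTF-8, which only helps if the console itself decodes UTF-8. Writing the literal with Unicode escapes also sidesteps the source-file encoding issue raised in the comments. `ConsoleCheck`, `CAT_HI` and `roundTrip` are hypothetical names.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleCheck {

    // The Hindi word for "cat" from the question, written with Unicode
    // escapes so it is correct whatever encoding the source is compiled in.
    static final String CAT_HI = "\u092C\u093F\u0932\u094D\u0932\u0940";

    // Encodes text into the given charset and decodes it back. Characters
    // the charset cannot represent are replaced by '?' during encoding.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Latin-1 cannot represent Devanagari, so this prints ??????
        System.out.println(roundTrip(CAT_HI, StandardCharsets.ISO_8859_1));

        // Writes UTF-8 bytes to stdout; renders correctly only if the
        // console decodes its input as UTF-8.
        PrintStream utf8Out =
                new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
        utf8Out.println("Cat,," + CAT_HI);
    }
}
```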

1 Comment

I tried your command in the editor, and that turned out to be the problem: on saving the file, Eclipse offered to save the code as UTF-8. Now it works. Thanks!

As discussed in the answers above, the solution takes two steps:

1) Save your text file as UTF-8.
2) Set the encoding of your Java source file to UTF-8. In Eclipse, right-click the Java file and choose Properties -> Resource -> Text file encoding -> Other -> UTF-8.

See the screenshot at http://howtodoinjava.com/2012/11/27/how-to-compile-and-run-java-program-written-in-another-language/
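For step 1, rather than trusting the editor's save dialog, you can check whether a file's bytes are actually valid UTF-8 with a strict decoder. This is a hedged sketch (`Utf8Check` and `isValidUtf8` are hypothetical names). A pure-ASCII file also passes, so a `true` result only means the bytes are consistent with UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true if the file's bytes form valid UTF-8. The decoder is set
    // to REPORT (instead of REPLACE), so it fails on the first bad sequence.
    static boolean isValidUtf8(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Paths.get(args[0]);
        System.out.println(csv + (isValidUtf8(csv) ? " is" : " is NOT") + " valid UTF-8");
    }
}
```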

