5

In my Java program, I have the following code:

String[] states = readFile("States.txt");

System.out.println(String.join(" ", states));
System.out.println(states.length);

Arrays.sort(states);

System.out.println(String.join(" ", states));
System.out.println(states.length);

Strangely enough, calling Arrays.sort() from java.util.Arrays causes many items to be removed from the list. When I run the code above, this is the output:

FL GA SC NC VA MD NY NJ DE PA CT RI MA VT NH ME AL TN KY WV OH MI MS AR MO KS NE IN IL WI MN LA TX OK IA SD ND NM CO WY ID AZ UT NV MT CA OR WA AL HI
50
AL AL AR AZ CA CO CT DE FL GA HI
50

I am very, very confused as to what's going on here. Why are only 11 items printed out? Is Arrays.sort() removing items? Why would Arrays.sort() do this? Why is the size of the array still 50? Are the items being blanked out or something?

I assume that my readFile() method works fine as the unsorted array prints out fine...

public static String[] readFile(String FileName) {
    char[] cbuf = new char[200];
    String[] array;
    FileReader fr;
    try {
        fr = new FileReader(FileName);
        try {
            fr.read(cbuf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    String all = new String(cbuf);
    array = all.split("\n");
    return array;
}

The file I am reading from: https://nofile.io/f/8TO3pdnmS3W/States.txt MD5 starts with 8b961b5

18
  • 1
    Something is terribly wrong if your array prints all the states before Arrays.sort() but not after it. That can't be :) Why don't you debug your code before and after the sort() call? Commented Mar 26, 2018 at 4:44
  • 1
    With String[] states = "FL GA SC NC VA MD ...".split(" "); there is no problem. Check whether split("\n") applies to all the data in the file (that there aren't any \r\n delimiters, for example). Commented Mar 26, 2018 at 4:51
  • 1
    @AaronFranke, then your log is truncating the output! Arrays.toString() will work. Go check for truncation. Commented Mar 26, 2018 at 4:56
  • 2
    I suspect a ^Z in the data file. Commented Mar 26, 2018 at 4:59
  • 1
    Learning to use the debugger is essential, try to put this high on your todo list. It will save you hours of pain and confusion. The reason trim() works is because your string has a bunch of null character values at the end as a result of your buffer being longer than your input and trim() removes them. It would be better to use a smaller buffer, or at least check the return value of fr.read() to see how many characters you read, and use that knowledge when you convert it to a string. Commented Mar 26, 2018 at 5:49

4 Answers

3

The newline character at the end of the file, specifically after the last entry in the file "HI", seems to be causing the problem. It can be solved in the readFile function by using:

array = all.trim().split("\n");
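
As the comments on this thread point out, trim() helps because it strips every leading and trailing character whose code is at or below U+0020. That covers newlines, but also the \u0000 padding left behind when the buffer is larger than the file. A minimal sketch of the effect, assuming a three-entry stand-in for the file contents (the class name TrimDemo, the helper paddedContents and the sample data are made up for illustration):

```java
import java.util.Arrays;

public class TrimDemo {
    // Simulates reading a short file into an oversized buffer:
    // unused slots keep the default char value \u0000.
    static String paddedContents(String fileContents, int bufferSize) {
        char[] cbuf = new char[bufferSize];
        fileContents.getChars(0, fileContents.length(), cbuf, 0);
        return new String(cbuf);
    }

    public static void main(String[] args) {
        // 8 real characters in a 20-char buffer leaves 12 null chars.
        String all = paddedContents("FL\nGA\nHI", 20);

        String[] untrimmed = all.split("\n");
        String[] trimmed = all.trim().split("\n");

        // Last element still carries the padding: "HI" plus 12 null chars.
        System.out.println(untrimmed[2].length()); // 14
        // trim() removed the padding before the split.
        System.out.println(trimmed[2].length());   // 2

        Arrays.sort(trimmed);
        System.out.println(String.join(" ", trimmed)); // FL GA HI
    }
}
```

Note that trim() only cleans the two ends of the whole string, which is enough here because the padding sits after the last entry.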


4 Comments

This works perfectly! But, there were no newline characters at the end of the file. The file appears to simply end after HI , but I suppose there were some kind of ghost characters hiding at the end of the file. Maybe it's my text editor's fault. I can upload the file if y'all want to investigate further.
Please do upload the file. It may help some of us pinpoint the problem. I also emulated your input by inserting a newline (newline = System.getProperty("line.separator");) at the end of the space-separated input you provided, and observed similar behaviour. I should mention, though, that although the output breaks after HI, the rest of the output is printed on a new line.
I uploaded the file, there's a link at the bottom of my question. I also posted the first part of the MD5 sum to check if you have the exact same file. Furthermore, my IDE that I am using to show output is Eclipse 4.6.3.
It may have been an issue with the fact that my buffer was larger than the input, and that is the reason why it needs trimming. See @Matt 's answer
1

Confirmed the 'artifact' behavior via the Online Java Compiler:

import java.util.Arrays;

public class MyClass {
    public static void main(String args[]) {
        // instead of using readFile() the array is defined here.
        // note the \n on the last element
        String[] states = {"FL", "GA", "SC", "NC", "VA", "MD", "NY", "NJ", "DE", "PA", "CT", "RI", "MA", "VT", "NH", "ME", "AL", "TN", "KY", "WV", "OH", "MI", "MS", "AR", "MO", "KS", "NE", "IN", "IL", "WI", "MN", "LA", "TX", "OK", "IA", "SD", "ND", "NM", "CO", "WY", "ID", "AZ", "UT", "NV", "MT", "CA", "OR",
           "WA", "AL", "HI\n"};

        System.out.println(String.join(" ", states));
        System.out.println(states.length);

        Arrays.sort(states);

        System.out.println(String.join(" ", states));
        System.out.println(states.length);
    }
}

And the Output:

FL GA SC NC VA MD NY NJ DE PA CT RI MA VT NH ME AL TN KY WV OH MI MS AR MO KS NE IN IL WI MN LA TX OK IA SD ND NM CO WY ID AZ UT NV MT CA OR WA AL HI

50
AL AL AR AZ CA CO CT DE FL GA HI
 IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY
50

Apparently the log used by @Arjun Kay truncated the elements printed after the sorted element containing the line-break character.

1 Comment

Actually, the issue was that after HI there were many null characters in the char[] buffer, which caused the text to stop printing.
1

Your readFile method is sloppy. You declare a buffer array char[] cbuf = new char[200]; with 200 elements.

It sounds like your file is formatted with a state on each line:

FL
GA
SC
NC

You read the entire file into your buffer, but the file doesn't fill it, so the trailing 50 or so elements keep the default null character value \u0000 (see this question).

cbuf = [F][L][\n][G][A][\n][S][C][\n][N][C][\n] ... [\u0000][\u0000]

Then you convert cbuf to a string:

all = "FL\nGA\nSC\nNC\n ... \u0000\u0000\u0000"

Then you split the string to convert it to an array:

array = [FL][GA][SC][NC]...[HI\u0000\u0000\u0000\u0000\u0000]

So you can see you have a bunch of useless characters in your final array, because your buffer was bigger than the file that you read.
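
One way to avoid the padding altogether, as suggested in the comments on the question, is to keep only the characters that read() reports as filled. This is a rough sketch rather than a drop-in replacement: the class name ReadCountDemo and the method readLines are invented for illustration, and a StringReader stands in for new FileReader("States.txt") so the example runs without a file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public class ReadCountDemo {
    // Reads everything from the reader, keeping only the characters
    // that read() reports as filled on each call.
    static String[] readLines(Reader reader) throws IOException {
        char[] cbuf = new char[200];
        StringBuilder sb = new StringBuilder();
        int n;
        while ((n = reader.read(cbuf)) != -1) {
            sb.append(cbuf, 0, n); // ignore the unused tail of the buffer
        }
        return sb.toString().split("\n");
    }

    public static void main(String[] args) throws IOException {
        // StringReader is an assumption standing in for a FileReader.
        try (Reader reader = new StringReader("FL\nGA\nHI\nAL")) {
            String[] states = readLines(reader);
            Arrays.sort(states);
            System.out.println(String.join(" ", states)); // AL FL GA HI
            System.out.println(states[3].length());        // 2
        }
    }
}
```

Because the loop runs until read() returns -1, the buffer size no longer has to match the file size. A file with \r\n line endings would still leave a \r on each entry with this split.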

I can't replicate your missing states on my machine, but you could clean up your file reader and I think it will work for you. Use a BufferedReader, then you can read your file a line at a time and it will save you all the manual splitting. I would also recommend using a List<String> instead of a String[] array, so you don't have to handle the size of the array.

Consider this:

public static void main(String[] args) throws IOException {
    String[] states = readFile("States.txt");

    System.out.println(String.join(" ", states));
    System.out.println(states.length);

    Arrays.sort(states);

    System.out.println(String.join(" ", states));
    System.out.println(states.length);

    // now do the same thing but using a list
    List<String> statesList = readFileToList("States.txt");

    System.out.println(String.join(" ", statesList));
    System.out.println(statesList.size());

    Collections.sort(statesList);

    System.out.println(String.join(" ", statesList));
    System.out.println(statesList.size());
}

// read the file to an array
public static String[] readFile(String FileName) throws IOException {
    String[] states = new String[50];
    int index = 0; // keep track of the array index

    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader br = new BufferedReader(new FileReader(FileName))) {
        String state;
        // when readLine() returns null there are no more lines to read
        while (index < 50 && (state = br.readLine()) != null) {
            states[index] = state;
            index++;
        }
    }

    // drop unused slots so Arrays.sort() never sees a null element
    return Arrays.copyOf(states, index);
}

// read the file to a list
public static List<String> readFileToList(String FileName) throws IOException {
    List<String> states = new ArrayList<>(); // no array size to worry about

    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader br = new BufferedReader(new FileReader(FileName))) {
        String state;
        while ((state = br.readLine()) != null) {
            states.add(state); // no indexes to worry about
        }
    }

    return states;
}

4 Comments

Normally I'd agree about Lists, but for this specific program I was required to use a String array. And if you would like to test with the original text file I've added it to my question.
If you have to use an array that's fair. I tested with the file you uploaded and my code works fine. I'm pretty sure your problem was related to your buffer being bigger than your input.
Just tested it. With a buffer of 150, the bug occurs, but with 149, the bug no longer occurs. I had no idea having a larger buffer than necessary could cause issues. Still, .trim() is an easier solution in my case.
@AaronFranke Even if you need an array, the convenience of using Files#readAllLines and doing a toArray on the result is a strong argument here (on top of the argument that the cases where you really need to read raw char[] arrays are rare...)
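
The Files#readAllLines approach mentioned in the last comment could look roughly like this. It is only a sketch: the class name ReadAllLinesDemo is invented, UTF-8 is an assumed encoding, and a temporary file stands in for States.txt so the example is self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class ReadAllLinesDemo {
    // Reads every line of the file and converts the list to an array.
    static String[] readFile(Path path) throws IOException {
        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file is an assumption standing in for States.txt.
        Path tmp = Files.createTempFile("states", ".txt");
        try {
            Files.write(tmp, Arrays.asList("FL", "GA", "HI", "AL"), StandardCharsets.UTF_8);
            String[] states = readFile(tmp);
            Arrays.sort(states);
            System.out.println(String.join(" ", states)); // AL FL GA HI
            System.out.println(states.length);            // 4
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

No buffer is managed by hand here, so there is no padding to trim, and the result is still a String[] for programs that require an array.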
0

You can wrap the FileReader in a BufferedReader and read the file one line at a time, like this:

public static String[] readFile(String FileName) {
    ArrayList<String> stringArrayList = new ArrayList<>();

    // try-with-resources closes the reader when the block exits
    try (BufferedReader bufferedReader = new BufferedReader(new FileReader(FileName))) {
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            stringArrayList.add(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return stringArrayList.toArray(new String[0]);
}

1 Comment

Covered with Files#readAllLines, basically...
