I have got a text file containing reference, name, address, amount, dateTo, dateFrom and mandatory columns, in the following format:

"120030125 J Blog  23, SOME HOUSE,                 259.44  21-OCT-2013  17-NOV-2013"
"                  SQUARE, STREET, LEICESTER,"
                   LE1 2BB

"120030318 R Mxx   37, WOOD CLOSE, BIRMINGHAM,     121.96  16-OCT-2013  17-NOV-2013  Y"                      
"                  STREET, NN18 8DF"

"120012174 JE xx   25, SOME HOUSE, QUEENS          259.44  21-OCT-2013  17-NOV-2013"
"                  SQUARE, STREET, LEICESTER,"
                   LE1 2BB

"100154992 DL x    23, SOME HOUSE, QUEENS          270.44  21-OCT-2013  17-NOV-2013  Y"             
"                  SQUARE, STREET, LEICESTER,"
                   LE1 2BC

I am only interested in the first line of each record. I want to extract the reference, name, amount, dateTo and dateFrom columns and write them to a CSV file. So far I have only managed to extract the first line of each record and strip its leading and trailing double quotes, using the code below. The input file contains runs of white space, and so does the output file.

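import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReadTxt {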
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("C:/Users/me/Desktop/input.txt"));
        String pattern = "\"\\d\\d\\d\\d";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        int i;
        ArrayList<String> list = new ArrayList<String>();

        boolean a = true;
        PrintWriter out = new PrintWriter(new PrintWriter("C:/Users/me/Desktop/Output.txt"), a);

        try {
            String line = br.readLine();

            while (line != null) {
                Matcher m = r.matcher(line);

                if (m.find()) {
                    String temp;
                    temp = line.substring(1, line.length() - 1);
                    list.add(temp);
                }
                else {
                // do nothing
                }

                line = br.readLine();
            }
        }
        finally {
            br.close();
        }

        for (i = 0; i < list.size(); i++) {
            out.println(list.get(i));
        }

        out.flush();
        out.close();
    }
}

The above code will create a text file with the following output:

120030125  J Blog   23, SOME HOUSE, QUEENS       259.44  21-OCT-2013  17-NOV-2013
120030318  R Mxx    37, WOOD CLOSE, BIRMINGHAM,  121.96  16-OCT-2013  17-NOV-2013  Y                      
120012174  JE xx    25, SOME HOUSE, QUEENS       259.44  21-OCT-2013  17-NOV-2013
100154992  DL x     23, SOME HOUSE, QUEENS       259.44  21-OCT-2013  17-NOV-2013  Y

My expected output is as follows, but written to a CSV file:

120030125  J Blog  259.44  21-OCT-2013  17-NOV-2013
120030318  R Mxx   121.96  16-OCT-2013  17-NOV-2013                        
120012174  JE xx   259.44  21-OCT-2013  17-NOV-2013
100154992  DL x    259.44  21-OCT-2013  17-NOV-2013  

Any suggestions, links to tutorials or help would be greatly appreciated, as I am not an expert in Java. I did look for tutorials on the internet, but could not find any that were useful in my case.

  • You showed us your actual output, can you show us your expected output too? Commented Dec 24, 2013 at 1:02
  • Do you want a tutorial that will help you understand how to write this data into a CSV file? Commented Dec 24, 2013 at 1:08
  • @tieTYT - I have edited my post to show the expected output. Commented Dec 24, 2013 at 9:10
  • Are the lengths of the fields fixed? Commented Dec 24, 2013 at 9:23
  • @Adarsh - Other than the reference number in the first column and the date columns, the length of the data is not fixed. I want to extract the required fields, separate them with commas and write them to the CSV file. As the field lengths are not fixed and the input file contains white space, I am not sure how to extract the required data. Commented Dec 24, 2013 at 9:36

2 Answers

Here, test this out. I just used an array, but you can adapt the code to yours. I changed some addresses (see the 2nd and 3rd entries in the array) so that spaces appear in some places and not in others, to test.

public class SplitData {

    public static void main(String[] args) {
        String[] array = {"120030125  J Blog   23, SOME HOUSE, QUEENS       259.44  21-OCT-2013  17-NOV-2013",
            "120030318  R Mxx    37,WOODCLOSE,BIRMINGHAM,  121.96  16-OCT-2013  17-NOV-2013  Y 0",
            "120012174  JE xx    25, SOME HOUSE,QUEENS       259.44  21-OCT-2013  17-NOV-2013",
            "100154992  DL x     23, SOME HOUSE, QUEENS       259.44  21-OCT-2013  17-NOV-2013  Y"  
        };

        String s1 = null;
        String s2 = null;
        String s3 = null;
        String s4 = null;
        String s5 = null;
        for (String s : array) {
            String[] split = s.split("\\s+");
            s1 = split[0];
            s2 = split[1] + " " + split[2];
            for (String string: split) {
                if (string.matches("\\d+\\.\\d{2}")) {
                    s3 = string;
                    break;
                }
            }
            String[] newArray = s.substring(s.indexOf(s3)).split("\\s+");
            s4 = newArray[1];
            s5 = newArray[2];

            System.out.printf("%s\t%s\t%s\t%s\t%s\n", s1, s2, s3, s4, s5);
        }
    }  
}

Output

120030125   J Blog  259.44  21-OCT-2013 17-NOV-2013
120030318   R Mxx   121.96  16-OCT-2013 17-NOV-2013
120012174   JE xx   259.44  21-OCT-2013 17-NOV-2013
100154992   DL x    259.44  21-OCT-2013 17-NOV-2013

7 Comments

The address may or may not have spaces. Wouldn't that cause problems?
The address format doesn't matter. There could even be no address. This code ignores everything in the address position: it takes a substring starting at the index of the amount and works from that.
Test it out with different Strings in the address part. This code is runnable, so you can play around with it.
@peeskillet - Thanks for the code. It works like a charm. I have comma separated the output like so: 120030318,R Mxx,121.96,16-OCT-2013,17-NOV-2013. Would you be able to point me in the direction for outputting the data on a CSV file. Thanks again.
Where the System.out.printf() call is, just replace it with your println-to-file statement.
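For readers who want the whole path from input lines to a CSV file, here is a minimal sketch of the split-based approach from this answer combined with a file writer. The class name `CsvLineExtractor`, the method names `toCsv` and `writeCsv`, and the `output.csv` path are hypothetical, and the sketch keeps the answer's assumption that the name is exactly two tokens. Lines without an amount token (such as address continuation lines) are skipped.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class CsvLineExtractor {

    // Converts one record line to "reference,name,amount,date,date".
    // Returns null when the line has no amount token (digits, a dot, two digits)
    // or does not have the expected tokens around it.
    static String toCsv(String line) {
        String[] tokens = line.trim().split("\\s+");
        int amountIdx = -1;
        // Start at 1 so the reference number itself is never taken as the amount.
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].matches("\\d+\\.\\d{2}")) {
                amountIdx = i;
                break;
            }
        }
        // Need reference + two name tokens before the amount, and two dates after it.
        if (amountIdx < 3 || amountIdx + 2 >= tokens.length) {
            return null;
        }
        String name = tokens[1] + " " + tokens[2]; // assumption: two-token name
        return String.join(",", tokens[0], name,
                tokens[amountIdx], tokens[amountIdx + 1], tokens[amountIdx + 2]);
    }

    // Writes one CSV row per convertible line; other lines are ignored.
    static void writeCsv(List<String> lines, Path target) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                String row = toCsv(line);
                if (row != null) {
                    out.println(row);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList(
                "120030125  J Blog   23, SOME HOUSE, QUEENS       259.44  21-OCT-2013  17-NOV-2013",
                "                  SQUARE, STREET, LEICESTER,",
                "120030318  R Mxx    37,WOODCLOSE,BIRMINGHAM,  121.96  16-OCT-2013  17-NOV-2013  Y");
        writeCsv(lines, Paths.get("output.csv")); // hypothetical output path
    }
}
```

The try-with-resources block closes the writer even if a line fails to convert, so no explicit flush or close call is needed. A name or address containing a comma-free token that looks like an amount would confuse this sketch, so it is worth checking against real data.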
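import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReadTxt {
  public static void main (String[] args) throws IOException {
    BufferedReader br = new BufferedReader (new FileReader ("D:/input.txt"));

    // A record line starts with a double quote followed by digits
    Pattern r = Pattern.compile ("\"\\d\\d\\d\\d");
    List<String> list = new ArrayList<String> ();

    try {
      String line = br.readLine ();
      while (line != null) {
        // Trim inside the loop so every line is trimmed and an empty file
        // (first readLine () returning null) does not throw
        line = line.trim ();
        Matcher m = r.matcher (line);
        if (m.find ()) {
          // Keep reference + name (index 0 to 18), skip the address
          // (index 19 to 50), keep everything from the amount (index 51) on
          String temp = line.substring (0, 19) + " "
                + line.substring (51, line.length () - 1);
          // Collapse runs of spaces and drop any remaining quotes
          temp = temp.replaceAll ("[ ]+", " ").replace ("\"", "");
          String[] array = temp.split ("[ ]");
          // Assumes a two-token name; a trailing Y flag (array[6]) is ignored
          temp = array[0] + "," + array[1] + " " + array[2] + ","
                + array[3] + "," + array[4] + "," + array[5];
          list.add (temp);
        }
        line = br.readLine ();
      }
    } finally {
      br.close ();
    }

    PrintWriter out = new PrintWriter ("D:/Output.csv");
    try {
      for (String row : list) {
        out.println (row);
      }
    } finally {
      out.close ();
    }
  }
}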

OUTPUT

120030125,J Blog,259.44,21-OCT-2013,17-NOV-2013
120030318,R Mxx,121.96,16-OCT-2013,17-NOV-2013
120012174,JE xx,259.44,21-OCT-2013,17-NOV-2013
100154992,DL x,270.44,21-OCT-2013,17-NOV-2013

6 Comments

As stated by Luis "Other than the reference number in the first column and the dates columns, the length of data is not fixed". So I don't think using a fixed index (51, line.length () - 1); would work. What I did was use the index of the String that matches the number (259.44)
I had cleared that out with him in the comments section. He says that each element would always start at a fixed index. So the address part would always start at index 19 and the amount would always be at index 51.
Oh ok, I didn't catch that. +1
You may still want to format the output to match the OP's desired output. You may also want to leave out the Y :)
I was going for the final CSV output. Thanks for pointing out the Y. I did not notice that it was not required :)
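The fixed-column idea discussed in these comments can also be sketched without splitting the name into tokens, which avoids the two-token-name assumption. This is only an illustration: the class name `FixedWidthRecord` and method `toCsv` are hypothetical, and the offsets 19 and 51 are taken from the comment above (they count the leading double quote as index 0).

```java
class FixedWidthRecord {
    static final int ADDRESS_START = 19; // per the comment: address starts at index 19
    static final int AMOUNT_START = 51;  // per the comment: amount starts at index 51

    // Returns "reference,name,amount,date,date", or null for lines that are not
    // record lines (continuation lines, blank lines, lines that are too short).
    static String toCsv(String rawLine) {
        String line = rawLine.trim();
        if (line.length() <= AMOUNT_START
                || line.charAt(0) != '"'
                || !Character.isDigit(line.charAt(1))) {
            return null;
        }
        // Reference and name live between the opening quote and the address.
        String head = line.substring(1, ADDRESS_START).trim();
        int firstSpace = head.indexOf(' ');
        if (firstSpace < 0) {
            return null;
        }
        String reference = head.substring(0, firstSpace);
        String name = head.substring(firstSpace).trim().replaceAll("\\s+", " ");

        // Everything from the amount onward: amount, two dates, optional flag.
        String tail = line.substring(AMOUNT_START).replace("\"", "").trim();
        String[] rest = tail.split("\\s+");
        if (rest.length < 3) {
            return null;
        }
        return String.join(",", reference, name, rest[0], rest[1], rest[2]);
    }

    public static void main(String[] args) {
        // Build a sample line with the stated layout: 18 characters of
        // reference + name after the quote, then 32 characters of address.
        String sample = String.format("\"%-18s%-32s%s\"",
                "100154992 DL x",
                "23, SOME HOUSE, QUEENS",
                "270.44  21-OCT-2013  17-NOV-2013  Y");
        System.out.println(toCsv(sample));
        // prints 100154992,DL x,270.44,21-OCT-2013,17-NOV-2013
    }
}
```

This only works if every record line really does follow the fixed layout; a single shifted column would silently produce wrong fields, so the token-based approach in the other answer is more forgiving when the layout is uncertain.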