0

I want to parse something that is almost a program. It consists of two lines, shown below:

java.io.*;
java.lang.*;

I am using a library that reads the whole program and splits it using the following statement:

String[] words = sourceCode.split("[\\s+|\\W+]");

This produces the following:

words[0] = "java"
words[1] = "io"
words[2] = ""
words[3] = ""
words[4] = ""
words[5] = ""
words[6] = "java"
words[7] = "lang"
words[8] = ""
words[9] = ""
words[10] = ""
words[11] = ""

However, what I want is to split the program into lines first, and then split each line into its components. That is, I am using:

String[] allLines = file1String.split("[\n]");
String[][] wordsOfALine = new String[allLines.length][];
for (int i = 0; i < allLines.length; i++) {
    wordsOfALine[i] = allLines[i].split("[\\s+|\\W+]");
}

However, what I am getting here is

wordsOfALine[0][0] = "java"
wordsOfALine[0][1] = "io"
wordsOfALine[1][0] = "java"
wordsOfALine[1][1] = "lang"

So all the empty words have now disappeared. Do you know how I can bring them back? I need to stay consistent with the library...

Thanks

2
  • I think I'd use a lexer rather than a regex for this, but that may only reflect my own biases... Commented Mar 17, 2014 at 2:15
  • You might want to read about what character classes are. Commented Mar 17, 2014 at 3:20

2 Answers

2

Firstly, your split regex is a giant bug. This expression:

"[\\s+|\\W+]"

means any single character that is one of:

  • whitespace
  • the plus sign +
  • the pipe char |
  • a non-word char (which includes whitespace btw)
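To see the points in the list above for yourself, here is a minimal sketch. The class name CharClassDemo, its constants and the helper matchesWhole are made up for illustration; only String.matches is a real API. It checks that + and | inside the brackets are treated as literal characters and that the class only ever matches one character at a time.

```java
public class CharClassDemo {
    // The regex from the question and a plain non-word-character class
    static final String ORIGINAL = "[\\s+|\\W+]";
    static final String NON_WORD = "\\W";

    // Hypothetical helper: true if the whole input matches the regex
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Inside [...] the '+' and '|' are literal characters, not operators
        System.out.println(matchesWhole(ORIGINAL, "+"));   // true
        System.out.println(matchesWhole(ORIGINAL, "|"));   // true
        // The class matches exactly one character, so two delimiters
        // in a row are not consumed as a single match
        System.out.println(matchesWhole(ORIGINAL, ".*"));  // false
        // Word characters are not matched by either pattern
        System.out.println(matchesWhole(ORIGINAL, "a"));   // false
        System.out.println(matchesWhole(NON_WORD, "a"));   // false
        // '+' is already a non-word character on its own
        System.out.println(matchesWhole(NON_WORD, "+"));   // true
    }
}
```

Because every delimiter is matched one character at a time, consecutive delimiters such as .*; leave empty strings between them when splitting.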

It should be just:

"\\W"

Also, if you pass a negative limit as the second argument to split (see the Javadoc for String.split(String, int) for why), trailing empty strings are kept instead of being discarded.

This produces the output you want:

allLines[i].split("\\W", -1)
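For completeness, a rough sketch of the per-line loop from the question with that call dropped in. The class name SplitLimitDemo, the method splitLines and the sample input are assumptions for illustration, and the sample uses plain \n line endings; with \r\n endings each line would get one more empty string from the trailing \r.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Splits the source into lines, then each line on single non-word
    // characters; the limit of -1 keeps trailing empty strings
    static String[][] splitLines(String source) {
        String[] allLines = source.split("\n");
        String[][] wordsOfALine = new String[allLines.length][];
        for (int i = 0; i < allLines.length; i++) {
            wordsOfALine[i] = allLines[i].split("\\W", -1);
        }
        return wordsOfALine;
    }

    public static void main(String[] args) {
        // Assumed sample input with \n line endings
        String source = "java.io.*;\njava.lang.*;";
        for (String[] words : splitLines(source)) {
            System.out.println(words.length + " " + Arrays.toString(words));
        }
    }
}
```

With this input each line yields five elements: the two words followed by three empty strings. Calling split("\\W") without the limit on the same line returns only the two words.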

0

Try the following and modify as needed.

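// Requires: import java.util.ArrayList; import java.util.List;
String[] allLines = file1String.split("\n");
List<String> wordsOfALine = new ArrayList<>();
for (int i = 0; i < allLines.length; i++) {
   // This pattern only strips stray line terminators; swap in your own
   // tokenizing regex (and a limit of -1 to keep trailing empty strings).
   String[] words = allLines[i].split("[\\r\\n]+");
   for (int j = 0; j < words.length; j++) {
      wordsOfALine.add(words[j]);
   }
}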
