
I am trying to split a string according to a certain set of delimiters.
My delimiters are: ,"():;.!? and single or multiple spaces. This is the code I'm currently using:

String[] arrayOfWords= inputString.split("[\\s{2,}\\,\"\\(\\)\\:\\;\\.\\!\\?-]+");

This works fine in most cases, but I have a problem when the first word is surrounded by quotation marks. For example,

String inputString = "\"Word\" some more text.";

gives me this output:

arrayOfWords[0] = ""
arrayOfWords[1] = "Word"
arrayOfWords[2] = "some"
arrayOfWords[3] = "more"
arrayOfWords[4] = "text"

I want the output to give me an array with

arrayOfWords[0] = "Word"
arrayOfWords[1] = "some"
arrayOfWords[2] = "more"
arrayOfWords[3] = "text"

This code has been working fine when quotation marks are used in the middle of the sentence. I'm not sure what the trouble is when they are at the beginning.

EDIT: I just realized I have the same problem when any of the delimiters is the first character of the string.

    Why don't you just delete all quotes in a previous step? Commented Sep 15, 2013 at 22:35

2 Answers


Unfortunately, you won't be able to remove this empty first element using only split. You should probably strip any leading delimiters from your string first and split afterwards. Your regex also seems to be incorrect, because

  • by adding {2,} inside [...] you are making the characters {, 2, the comma, and } delimiters (a quantifier has no special meaning inside a character class),
  • you don't need to escape the rest of your delimiters (note that - needs no escape only because it is at the end of the character class [], where it can't act as a range operator).
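
To make the first point concrete, here is a small sketch comparing the pattern from the question with the simplified one. The class name CharacterClassDemo and the sample string are made up for illustration; only the two patterns come from this thread.

```java
import java.util.Arrays;

// Hypothetical demo class; only the two patterns come from the thread.
class CharacterClassDemo {
    // Pattern from the question: inside [...], "{2,}" is not a quantifier,
    // so '{', '2', ',' and '}' each become delimiter characters.
    static final String ORIGINAL = "[\\s{2,}\\,\"\\(\\)\\:\\;\\.\\!\\?-]+";

    // Simplified pattern: whitespace plus the intended punctuation only.
    static final String FIXED = "[\\s,\"():;.!?\\-]+";

    public static void main(String[] args) {
        String sample = "room2b {draft}"; // made-up input containing '2', '{', '}'

        // The digit and the braces are swallowed as delimiters.
        System.out.println(Arrays.toString(sample.split(ORIGINAL))); // [room, b, draft]

        // Only the space separates the tokens.
        System.out.println(Arrays.toString(sample.split(FIXED)));    // [room2b, {draft}]
    }
}
```

With the original pattern, any word containing the digit 2 or a brace would be silently cut apart, which is easy to miss on ordinary sentences.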

Try it this way:

String regexDelimiters = "[\\s,\"():;.!?\\-]+";
String inputString = "\"Word\"  some more text.";
String[] arrayOfWords = inputString.replaceAll(
        "^" + regexDelimiters,"").split(regexDelimiters);

for (String s : arrayOfWords)
    System.out.println("'" + s + "'");

output:

'Word'
'some'
'more'
'text'
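
One edge case worth guarding against, if your input can consist of delimiters only: after stripping, the string is empty, and "".split(...) returns an array holding a single empty string rather than an empty array. A minimal sketch of a wrapper, assuming you want an empty array in that case (WordSplitter and splitWords are hypothetical names, not part of any library):

```java
import java.util.Arrays;

// Hypothetical helper class wrapping the strip-then-split approach.
class WordSplitter {
    // Delimiter class taken from the code above.
    private static final String DELIMITERS = "[\\s,\"():;.!?\\-]+";

    // Strips a leading run of delimiters, then splits on the rest.
    static String[] splitWords(String input) {
        String trimmed = input.replaceFirst("^" + DELIMITERS, "");
        // "".split(...) yields a one-element array containing "",
        // so return an empty array when nothing but delimiters was given.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split(DELIMITERS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("\"Word\"  some more text."))); // [Word, some, more, text]
        System.out.println(splitWords("...!?").length);                               // 0
    }
}
```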

1 Comment

Thank you for explaining the problems inside my character class. I'm new to using regex. This is just what I was looking for.

A delimiter is interpreted as separating the strings on either side of it, so the empty string on its left is added to the result, as well as the string to its right ("Word"). To prevent this, you should first strip any leading delimiters, as described here:

How to prevent java.lang.String.split() from creating a leading empty string?

So in short form you would have:

String delim = "[\\s,\"():;.!?\\-]+";
String[] arrayOfWords = inputString.replaceFirst("^" + delim, "").split(delim);

Edit: Looking at Pshemo's answer, I realize he is correct regarding your regex. Inside the brackets it's unnecessary to specify the number of space characters, as repeated spaces will be caught by the + operator.
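
To see the behavior described at the top of this answer, the following sketch prints the result with and without the stripping step. The class name LeadingEmptyDemo is made up; the input is the one from the question. Note the asymmetry: the trailing "." does not produce an empty last element, because split with the default limit drops trailing empty strings but keeps a leading one.

```java
import java.util.Arrays;

// Hypothetical demo class; the input string is the one from the question.
class LeadingEmptyDemo {
    static final String DELIM = "[\\s,\"():;.!?\\-]+";

    public static void main(String[] args) {
        String input = "\"Word\" some more text.";

        // The leading quote is a delimiter, so the empty string to its
        // left is kept as element 0. The trailing "." adds nothing.
        System.out.println(Arrays.toString(input.split(DELIM)));
        // [, Word, some, more, text]

        // Strip the leading delimiter run first, then split.
        System.out.println(Arrays.toString(
                input.replaceFirst("^" + DELIM, "").split(DELIM)));
        // [Word, some, more, text]
    }
}
```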
