I am trying to split a string that contains whitespace and special characters and that starts with a special character. When I run the code, the first array element is an empty string.

String s = ",hm  ..To?day,.. is not T,uesday.";
String[] sArr = s.split("[^a-zA-Z]+\\s*");

Expected result is ["hm", "To", "day", "is", "not", "T", "uesday"]

Actual result is ["", "hm", "To", "day", "is", "not", "T", "uesday"]

Can someone explain why this is happening?

  • There is a leading , in your input. The string is split on it the first time into "" and the rest. Commented Jul 9, 2019 at 19:52
  • How can I update my regex to prevent the empty element from being added? My goal is to add all words to the array, excluding any special characters and whitespace. Commented Jul 9, 2019 at 20:01

2 Answers

split is behaving as expected: the leading comma matches the delimiter regex, so the zero-length string before it becomes the first element of the result.

To fix, first remove all splitting chars from the start:

String[] sArr = s.replaceAll("^([^a-zA-Z]*\\s*)*", "").split("[^a-zA-Z]+\\s*");

Note that I’ve altered the removal regex to trim any sequence of spaces and non-letters from the front.

You don’t need to remove from the tail because split discards empty trailing elements from the result.
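As a minimal sketch of the fix above, the one-liner can be wrapped in a small helper and run against the string from the question. The class name `SplitWords` and the method `splitWords` are hypothetical names chosen for illustration; the two regexes are the ones from this answer.

```java
import java.util.Arrays;

class SplitWords {
    // Hypothetical helper wrapping the one-liner from the answer.
    static String[] splitWords(String s) {
        // Step 1: strip any run of non-letters (spaces included) from the front.
        String trimmed = s.replaceAll("^([^a-zA-Z]*\\s*)*", "");
        // Step 2: split on runs of non-letters; split() drops trailing empty strings.
        return trimmed.split("[^a-zA-Z]+\\s*");
    }

    public static void main(String[] args) {
        String s = ",hm  ..To?day,.. is not T,uesday.";
        System.out.println(Arrays.toString(splitWords(s)));
        // prints [hm, To, day, is, not, T, uesday]
    }
}
```

One edge case to be aware of: if the input contains no letters at all, the first step leaves an empty string, and splitting an empty string yields a one-element array containing "", so callers may want to check for that.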


1 Comment

Thanks. I accepted this answer because it shows me how to trim the beginning spaces/non-letters occurring 0 or more times.

I would simplify it by making it a two-step process rather than trying to achieve a pure regex split() operation:

s.replaceAll("[^a-zA-Z]+", " ").trim().split(" ")
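A runnable sketch of this two-step approach, using the string from the question, might look like the following. The class name `TwoStepSplit` and the method `words` are hypothetical; note that Java string literals need double quotes.

```java
import java.util.Arrays;

class TwoStepSplit {
    // Hypothetical helper: normalize every run of non-letters to one space,
    // trim the ends, then split on single spaces.
    static String[] words(String s) {
        return s.replaceAll("[^a-zA-Z]+", " ").trim().split(" ");
    }

    public static void main(String[] args) {
        String s = ",hm  ..To?day,.. is not T,uesday.";
        System.out.println(Arrays.toString(words(s)));
        // prints [hm, To, day, is, not, T, uesday]
    }
}
```

Because every delimiter run collapses to exactly one space before the split, the split pattern itself can stay trivial.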

3 Comments

Thanks for your response. I am still wondering why my regex didn't work and ended up adding the empty string. Can you explain?
Your regex is working fine. Because the first character of your string matches your regex, split returns the empty string as the first element. You can think of it like this: there is a 0-length string before the first , in your subject string.
@coffeefirst The simplest explanation I can provide is that the first comma was matched by the regex passed to split().
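The behaviour described in these comments can be reproduced with a short sketch. The class name `LeadingEmptyDemo` and the method `original` are hypothetical; the regex is the one from the question.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    // The split from the question, unchanged.
    static String[] original(String s) {
        return s.split("[^a-zA-Z]+\\s*");
    }

    public static void main(String[] args) {
        // A delimiter at index 0 leaves a zero-length string in front of it.
        System.out.println(Arrays.toString(",hm".split(",")));   // prints [, hm]
        // A delimiter at the end leaves an empty tail, which split() discards.
        System.out.println(Arrays.toString("hm,".split(",")));   // prints [hm]

        String[] parts = original(",hm  ..To?day,.. is not T,uesday.");
        System.out.println(parts.length);        // prints 8
        System.out.println(parts[0].isEmpty());  // prints true
    }
}
```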
