
I know, I know, now I have two problems 'n all that, but regex here means I don't have to write two complicated loops. Instead, I have a regex that only I understand, and I'll be employed for yonks.

I have a string, say stack.overflow.questions[0].answer[1].postDate, and I need to get the [0] and the [1], preferably in an array. "Easy!" my neurons exclaimed, just use regex and the split method on your input string; so I came up with this:

String[] tokens = input.split("[^\\[\\d\\]]");

which produced the following:

[, , , , , , , , , , , , , , , , [0], , , , , , , [1]]

Oh dear. So, I thought, "what would replaceAll do in this instance?":

String onlyArrayIndexes = input.replaceAll("[^\\[\\d\\]]", "");

which produced:

[0][1]

Hmm. Why so? I'm looking for a two-element string array that contains "[0]" as the first element and "[1]" as the second. Why does split not work here, when the Javadoc says both methods use the Pattern class?

To summarise, I have two questions: why does the split() call produce that large array with seemingly random space characters, and am I right in thinking replaceAll works because the regex replaces every character that is not "[", a digit or "]"? And what am I missing that makes me expect them to produce similar output? (OK, that's three, and please don't answer "a clue?" to this one!)

4 Answers

From what I can see, split does work as documented: every character that is not "[", a digit or "]" matches your regex and is treated as a delimiter, and you get one array element for each stretch of text between delimiters. Most of those stretches are empty, which is where all the empty strings come from.

As for replaceAll, I think your assumption is right: it removes (replaces with "") every character that is not one you want to keep.

From the API documentation:

Splits this string around matches of the given regular expression.

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

The string "boo:and:foo", for example, yields the following results with these expressions:

Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }
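To see the trailing-empty-string rule from the quoted Javadoc in action, here is a minimal sketch that runs the "boo:and:foo" example with the default limit and with a negative limit. The class and method names are made up for illustration; only String.split itself comes from the JDK.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original question or answers.
class SplitLimitDemo {
    // One-argument split behaves like limit 0: trailing empty strings are dropped.
    static String[] splitDefault(String input, String regex) {
        return input.split(regex);
    }

    // A negative limit keeps every element, including trailing empty strings.
    static String[] splitKeepTrailing(String input, String regex) {
        return input.split(regex, -1);
    }

    public static void main(String[] args) {
        // prints [b, , :and:f]
        System.out.println(Arrays.toString(splitDefault("boo:and:foo", "o")));
        // prints [b, , :and:f, , ]
        System.out.println(Arrays.toString(splitKeepTrailing("boo:and:foo", "o")));
    }
}
```

Note that only trailing empty strings are dropped; the empty string between the two adjacent "o" delimiters in "boo" stays in both results.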

1 Comment

Thank you, it was the fact that split gives me an element in the array for each match of my regex; this is what I was failing to understand!

This is not a direct answer to your question, however I want to show you a great API that will suit your need.

Check out Splitter from Google Guava.

So for your example, you would use it like this:

Iterable<String> tokens = Splitter.onPattern("[^\\[\\d\\]]").omitEmptyStrings().trimResults().split(input);

//Now you get back an Iterable which you can iterate over. Much better than an Array.
for(String s : tokens) {
   System.out.println(s);
}

This prints:
[0]
[1]

1 Comment

A great suggestion, thanks. Right now I only have use for regex in this particular instance, but I'll go to Guava should I need it further.

split splits on boundaries defined by the regex you provide, so it's no great surprise you're getting lots of entries. Nearly every character in the string matches your regex and so, by definition, is a boundary on which a split should occur. Two boundaries in a row leave an empty string between them.

replaceAll replaces matches for your regex with the replacement you give it, which in your case is an empty string.
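To make the boundary idea concrete, here is a small sketch using a shorter, made-up input ("a.b[0].c[1].d") so the empty strings are easy to count. The class and method names are hypothetical. It also shows that adding + to the regex, so a whole run of unwanted characters counts as one boundary, still leaves a leading empty string when the input starts with a boundary.

```java
import java.util.Arrays;

// Hypothetical demo class; the short input is an assumption chosen for readability.
class SplitBoundariesDemo {
    // Every single non-bracket, non-digit character is its own boundary.
    static String[] splitPerChar(String input) {
        return input.split("[^\\[\\d\\]]");
    }

    // A run of such characters is treated as one boundary.
    static String[] splitPerRun(String input) {
        return input.split("[^\\[\\d\\]]+");
    }

    // replaceAll deletes the same characters instead of splitting on them.
    static String stripOthers(String input) {
        return input.replaceAll("[^\\[\\d\\]]", "");
    }

    public static void main(String[] args) {
        String input = "a.b[0].c[1].d";
        // prints [, , , [0], , [1]]
        System.out.println(Arrays.toString(splitPerChar(input)));
        // prints [, [0], [1]]
        System.out.println(Arrays.toString(splitPerRun(input)));
        // prints [0][1]
        System.out.println(stripOthers(input));
    }
}
```

So split can be coaxed most of the way there, but you would still have to discard the leading empty element, which is why matching what you want tends to be simpler than splitting on what you don't.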

If you're trying to grab the 0 and the 1, it's a trivial loop:

String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile("\\[(\\d+)\\]");
Matcher m = pat.matcher(text);
List<String> results = new ArrayList<String>();
while (m.find()) {
    results.add(m.group(1)); // Or just .group() if you want the [] as well
}
String[] tokens = results.toArray(new String[0]);

Or if it's always exactly two of them:

String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile(".*\\[(\\d+)\\].*\\[(\\d+)\\].*");
Matcher m = pat.matcher(text);
String[] tokens = new String[2];
if (m.matches()) { // group() throws IllegalStateException unless a match succeeded
    tokens[0] = m.group(1);
    tokens[1] = m.group(2);
}

The problem is that split is the wrong operation here.

In Ruby, I'd tell you to string.scan(/\[\d+\]/), which would give you the array ["[0]","[1]"]

Java doesn't have a single-method equivalent, but we can write a scan method as follows:

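import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects every substring of 'string' that matches 'regex', in order.
public List<String> scan(String string, String regex){
   List<String> list = new ArrayList<String>();
   Pattern pattern = Pattern.compile(regex);
   Matcher matcher = pattern.matcher(string);
   while(matcher.find()) {
      list.add(matcher.group());
   }
   return list;
}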

and we can call it as scan(string,"\\[\\d+\\]")

The equivalent Scala code is:

"""\[\d+\]""".r findAllIn string
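Back in Java, if you happen to be on Java 9 or later (an assumption, since the question doesn't say which version is in use), Matcher.results() gets fairly close to a one-liner. This is only a sketch; the class name ScanWithStreams is made up for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; requires Java 9+ for Matcher.results().
class ScanWithStreams {
    // Returns every substring of 'input' that matches 'regex', in order of appearance.
    static List<String> scan(String input, String regex) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()                 // stream of MatchResult, one per match
                .map(MatchResult::group)   // the full text of each match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "stack.overflow.questions[0].answer[1].postDate";
        // prints [[0], [1]]
        System.out.println(scan(input, "\\[\\d+\\]"));
    }
}
```

If you need a String[] rather than a List, calling toArray(new String[0]) on the result works the same way as in the loop-based version.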
