Given a set of Java regular expression patterns separated by OR (i.e. |), is there a specific precedence that the alternatives follow?
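For reference, here is a minimal sketch of how java.util.regex resolves alternation in general, independent of the code in this question. find() returns the match that starts earliest in the input. Only at that start position are the alternatives tried from left to right, and the first one that succeeds wins, even if a later alternative would have matched more text. The class name AlternationOrder and the helper firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows java.util.regex alternation order.
// 1. The earliest start position in the input wins.
// 2. At that position, alternatives are tried left to right and the first success wins.
class AlternationOrder {

    // Returns the first match of regex in input, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a|ab", "ab"));  // a   (first alternative wins, not the longest)
        System.out.println(firstMatch("ab|a", "ab"));  // ab
        System.out.println(firstMatch("b|ab", "xab")); // ab  (starts at index 1, before "b" at index 2)
    }
}
```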
Example code:
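import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class added so the snippet compiles on its own.
public class ColumnSplit {
    public static void main(String[] args) {
        String[] columnPatterns = new String[] { "(\\S\\s?)+", "(\\S\\s?)+",
                "(\\d+,?)+\\.\\d+ | \\d+:\\d+", "(\\S\\s?)+",
                "-?\\$?(\\d+,?)+\\.\\d+" };
        // Columns are separated by two or more spaces; (\S\s?)+ only allows
        // single spaces inside a column.
        String searchString = "Text1   This is Text 2   129.80";
        int findFrom = 0;
        int columnIndex = 0;
        List<String> columnValues = new ArrayList<String>();
        for (String pattern : columnPatterns) {
            Pattern p = Pattern.compile(pattern);
            Matcher m = p.matcher(searchString);
            // Look for this column starting where the previous column ended.
            if (m.find(findFrom)) {
                columnValues.add(columnIndex++,
                        searchString.substring(m.start(), m.end()).trim());
                findFrom = m.end();
            }
        }
        for (String value : columnValues) {
            System.out.println("<" + value + ">");
        }
    }
}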
The above code prints:
<Text1>
<This is Text 2>
<129.80>
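One way to check where each printed value actually comes from is to tag it with the index of the pattern that produced it. This sketch assumes the columns in searchString are separated by two or more spaces, since (\S\s?)+ tolerates only single spaces inside a column and the output above implies wider gaps. The spaces around | are part of the alternatives, so pattern 2 reads as "(\d+,?)+\.\d+ " (a number followed by a space) or " \d+:\d+" (a space followed by a time). Neither can match "129.80" at the very end of the string, so pattern 2 finds nothing, findFrom is left unchanged, and pattern 3 picks up the number. WhichPattern and tagged are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic: the same column loop as in the question, but each
// result is prefixed with the index of the pattern that produced it.
class WhichPattern {

    static List<String> tagged(String[] patterns, String input) {
        List<String> out = new ArrayList<>();
        int findFrom = 0;
        for (int i = 0; i < patterns.length; i++) {
            Matcher m = Pattern.compile(patterns[i]).matcher(input);
            if (m.find(findFrom)) {
                out.add(i + ":<" + m.group().trim() + ">");
                findFrom = m.end(); // next pattern starts after this match
            } else {
                out.add(i + ": no match"); // findFrom stays where it was
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] patterns = { "(\\S\\s?)+", "(\\S\\s?)+",
                "(\\d+,?)+\\.\\d+ | \\d+:\\d+", "(\\S\\s?)+",
                "-?\\$?(\\d+,?)+\\.\\d+" };
        // Assumed spacing: three spaces between columns.
        tagged(patterns, "Text1   This is Text 2   129.80").forEach(System.out::println);
    }
}
```

With that input it prints 0:<Text1>, 1:<This is Text 2>, 2: no match, 3:<129.80>, 4: no match.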
But if I change the pattern at index 2 of the columnPatterns array from "(\d+,?)+\.\d+ | \d+:\d+" to "(\d+,?)+\.\d+ | \d+:\d+ | \d+", as shown below:
columnPatterns = new String[] { "(\\S\\s?)+", "(\\S\\s?)+",
"(\\d+,?)+\\.\\d+ | \\d+:\\d+ | \\d+", "(\\S\\s?)+",
"-?\\$?(\\d+,?)+\\.\\d+" };
I get the following output:
<Text1>
<This is Text 2>
<129>
<.80>
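A sketch of what happens in the second case, assuming the text still left to search when pattern 2 runs is the run of spaces plus "129.80" (the string tail below is a hypothetical stand-in for that). The new third alternative " \d+" begins with a space, so it can match starting at the last space before the number. The engine tries start positions from left to right, and even a first alternative without the trailing space could only start at the "1", one position later, so " 129" wins and ".80" is left over for pattern 3. SplitExplained and matchAndRest are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: shows why the modified pattern splits "129.80" in two.
class SplitExplained {

    // Returns {matched text, text remaining after the match}, or null if no match.
    static String[] matchAndRest(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), input.substring(m.end()) };
    }

    public static void main(String[] args) {
        // Assumed leftover input: the spaces between columns, then the number.
        String tail = "   129.80";
        String[] r = matchAndRest("(\\d+,?)+\\.\\d+ | \\d+:\\d+ | \\d+", tail);
        // " \\d+" (third alternative, leading space) matches at the last space.
        System.out.println("[" + r[0] + "] rest=[" + r[1] + "]"); // [ 129] rest=[.80]
    }
}
```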
Does this mean some kind of implicit precedence is applied to the alternatives, or is there another reason for this? What would be a solution or workaround for this behaviour?
Edit: Also, why does the code behave the way it does?
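One possible workaround, sketched here as an option rather than the only fix: remove the spaces around |, so that each alternative is exactly the intended sub-pattern, and keep the most specific alternative (the decimal number) first so it is tried before the bare \d+. The loop mirrors the one in the question; ColumnSplitFixed and split are hypothetical names, and the input again assumes two or more spaces between columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: same column loop, with the spaces around '|' removed
// from the pattern at index 2. The other patterns are unchanged.
class ColumnSplitFixed {

    static List<String> split(String[] patterns, String input) {
        List<String> values = new ArrayList<>();
        int findFrom = 0;
        for (String pattern : patterns) {
            Matcher m = Pattern.compile(pattern).matcher(input);
            if (m.find(findFrom)) {
                values.add(m.group().trim());
                findFrom = m.end();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] patterns = { "(\\S\\s?)+", "(\\S\\s?)+",
                // Decimal first, then time, then plain integer; no stray spaces.
                "(\\d+,?)+\\.\\d+|\\d+:\\d+|\\d+", "(\\S\\s?)+",
                "-?\\$?(\\d+,?)+\\.\\d+" };
        for (String value : split(patterns, "Text1   This is Text 2   129.80")) {
            System.out.println("<" + value + ">");
        }
    }
}
```

This prints <Text1>, <This is Text 2>, <129.80>, but now "129.80" is produced by pattern 2 itself. If the spaces are wanted for readability, compiling the pattern with the Pattern.COMMENTS flag makes the engine ignore unescaped whitespace in the pattern, so the spaced-out version would behave like the compact one.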