11

Part of the code I'm working on uses a bunch of regular expressions to search for some simple string patterns (e.g., patterns like "foo[0-9]{3,4} bar"). Currently, we use statically-compiled Java Patterns and then call Pattern#matcher to check whether a string contains a match to the pattern (I don't need the match, just a boolean indicating whether there is a match). This is causing a noticeable amount of memory allocation that is affecting performance.

Is there a better option for Java regex matching that is faster or at least doesn't allocate memory every time it searches a string for a pattern?

4
  • what about download.oracle.com/javase/1.4.2/docs/api/java/lang/… this will return boolean Commented Sep 21, 2011 at 19:17
  • 3
    @c0mrade <String>.matches(<pattern>) does the same as Pattern.matches(<pattern>,<string>) which does the same thing as Pattern.compile(<pattern>).matcher(<string>).matches() Commented Sep 21, 2011 at 19:22
  • @Jared correct, but he said he was using pattern/matcher not string matches Commented Sep 21, 2011 at 19:24
  • 1
    @c0mrade the big difference is that he said he is using statically compiled Patterns, using <String>.matches will compile the same pattern every time it is called, taking more time and memory Commented Sep 21, 2011 at 19:50

4 Answers

15

Try the matcher.reset("newinputtext") method to avoid creating a new Matcher each time you call Pattern.matcher.
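A minimal sketch of that idea, using the pattern from the question. The class and method names (ReusableMatcherDemo, countMatches) are made up for illustration, and find() is used because the question asks whether the string contains a match rather than whether the whole string matches:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusableMatcherDemo {
    // Pattern from the question, compiled once.
    private static final Pattern PATTERN = Pattern.compile("foo[0-9]{3,4} bar");

    // Counts how many inputs contain a match, reusing a single Matcher.
    // (Hypothetical helper, not part of the original answer.)
    static int countMatches(String[] inputs) {
        Matcher matcher = PATTERN.matcher(""); // allocated once, outside the loop
        int count = 0;
        for (String input : inputs) {
            // reset(CharSequence) swaps in the new input and clears the match state
            if (matcher.reset(input).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"xx foo123 bar", "foo12 bar", "foo12345 bar", "nothing"};
        System.out.println(countMatches(inputs));
    }
}
```

The Matcher keeps a reference to the last input it was reset with, so a long-lived Matcher can keep a large string reachable until the next reset.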


2 Comments

That should improve speed to some degree; see my [admittedly weak] test here: pastie.org/2570213
This is good, but note that the Matcher class is not thread-safe. In a threaded environment, initialize a Matcher for each thread or just use a pre-compiled static Pattern (Pattern class is thread-safe but that gives you the same memory allocation issue you started with).
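One way to get a Matcher per thread, as the comment above suggests, is a ThreadLocal. This is only a sketch: the class name PerThreadMatcher and the method containsMatch are hypothetical, and ThreadLocal.withInitial requires Java 8 or later:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerThreadMatcher {
    // The Pattern is thread-safe and can be shared.
    private static final Pattern PATTERN = Pattern.compile("foo[0-9]{3,4} bar");

    // One Matcher per thread, created lazily the first time that thread asks for it.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> PATTERN.matcher(""));

    // Returns true if the input contains a match; reuses the calling thread's Matcher.
    static boolean containsMatch(CharSequence input) {
        return MATCHER.get().reset(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("a foo1234 bar b"));
        System.out.println(containsMatch("foo bar"));
    }
}
```

Each thread pays for one Matcher allocation instead of one per call. Whether that is a net win over a plain PATTERN.matcher(input) call depends on the workload, so it is worth measuring.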
4

If you expect fewer than 50% of lines to match your regex, you can first test for a required literal subsequence with String.indexOf(), which is about 3 to 20 times faster than a regex matcher for simple sequences:

if (line.indexOf("foo") > -1 && pattern.matcher(line).matches()) {
    ...
}

If you add such heuristics to your code, remember to document them well, and verify with a profiler that the code is indeed faster than the simple version.

2 Comments

Also add a test ensuring that the optimized version does the same as the plain regex. This comes in handy if someone changes the regex and forgets about the rest.
Good hint - this also works for contains, and the match doesn't have to be perfect - it just needs to be enough to reduce the number of things going to the pattern matcher by at least 50%
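The pre-filter and the equivalence check suggested in the comments could be sketched roughly as follows. The names PrefilteredSearch, plain, prefiltered and agree are hypothetical, and find() is assumed here (instead of matches() as in the answer) because the question asks for a "contains" check:

```java
import java.util.regex.Pattern;

public class PrefilteredSearch {
    private static final Pattern PATTERN = Pattern.compile("foo[0-9]{3,4} bar");

    // A literal that every match of PATTERN must contain.
    // Must be updated by hand whenever the regex changes.
    private static final String REQUIRED_LITERAL = "foo";

    // Reference implementation: regex only.
    static boolean plain(String line) {
        return PATTERN.matcher(line).find();
    }

    // Optimized version: the cheap literal check runs first,
    // so no Matcher is allocated for lines that lack the literal.
    static boolean prefiltered(String line) {
        return line.indexOf(REQUIRED_LITERAL) >= 0 && PATTERN.matcher(line).find();
    }

    // Returns true if both versions give the same answer for every sample.
    static boolean agree(String[] samples) {
        for (String s : samples) {
            if (plain(s) != prefiltered(s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"foo123 bar", "xx foo1234 bar yy", "foo12 bar", "bar", ""};
        System.out.println(agree(samples));
    }
}
```

A check like agree() only guards against drift for the samples it is given, so the sample set should include lines that contain the literal but do not match the full pattern.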
3

If you want to avoid creating a new Matcher for each Pattern, use the usePattern() method, like so:

Pattern[] pats = {
  Pattern.compile("123"),
  Pattern.compile("abc"),
  Pattern.compile("foo")
};
String s = "123 abc";
Matcher m = Pattern.compile("dummy").matcher(s);
for (Pattern p : pats)
{
  System.out.printf("%s : %b%n", p.pattern(), m.reset().usePattern(p).find());
}

see the demo on Ideone

You have to use matcher's reset() method too, or find() will only search from the point where the previous match ended (assuming the match was successful).

0

You could try using the Pattern.matches() static method which would just return the boolean. That wouldn't return a Matcher object so it could help with the memory allocation issues.

That being said, the regex pattern would not be precompiled, so it would be a performance-versus-resources trade-off at that point.

2 Comments

Pattern#matches creates a Matcher object inside that method.
@jonderry: Very good point +1. It actually creates both: a Pattern by compiling the regex, and a Matcher for the given input.
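To make the point in these comments concrete, here is a small sketch comparing the static call with a precompiled Pattern. The class and method names are invented for illustration. Note also that Pattern.matches() requires the entire input to match, so it does not answer the "contains a match" question on its own:

```java
import java.util.regex.Pattern;

public class StaticMatchesDemo {
    static final String REGEX = "foo[0-9]{3,4} bar";
    static final Pattern COMPILED = Pattern.compile(REGEX);

    // Compiles REGEX and creates a Matcher on every call; matches the whole input only.
    static boolean viaStatic(String input) {
        return Pattern.matches(REGEX, input);
    }

    // Reuses the compiled Pattern (still one Matcher per call); finds a match anywhere.
    static boolean viaCompiledFind(String input) {
        return COMPILED.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "xx foo123 bar";
        System.out.println(viaStatic(input));       // whole-input match
        System.out.println(viaCompiledFind(input)); // substring search
    }
}
```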
