I have a set of regex replacements that need to be applied to a set of Strings.

For example:

  1. all multiple spaces with single space ("\s{2,}" --> " ")
  2. all . followed by a char with . followed by space followed by the char ("\.([a-zA-Z])" --> ". $1")

So I will have something like this:

String s="hello     .how are you?";
s=s.replaceAll("\\s{2,}"," ");
s=s.replaceAll("\\.([a-zA-Z])",". $1");
....

It works; however, imagine applying 100+ such expressions to a long String. Needless to say, this can be very slow.

So my question is whether there is a more efficient way to combine these replacements into a single replaceAll (or something similar, e.g. Pattern/Matcher).

I have followed Java Replacing multiple different...,

but the problem is that my regexes are not simple Strings.

  • You can use a single big regex and Matcher.appendReplacement. You'll have to be very careful with your regex, however, as it may get somewhat messy and possibly suffer from catastrophic backtracking. Commented Dec 9, 2015 at 14:08
  • @BoristheSpider if I use this then I have the problem of knowing which regex is being used. Commented Dec 9, 2015 at 14:10
  • Nope, simply use capturing groups and check which one has data in it. Commented Dec 9, 2015 at 14:10
  • @BoristheSpider let's say I matched .A, how would I know whether it was matched by \\.([a-zA-Z])? Commented Dec 9, 2015 at 14:13
  • If you have a pattern, for example (A)|(B) then you know, when you get a match, either group 1 or group 2 will be filled - the other will be empty (barring this bug). You can use that to determine the replacement. Commented Dec 9, 2015 at 14:18
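
The group check described in these comments can be sketched as follows. This is only an illustration, not code from the thread: the class name WhichAlternative and the method describe are made up for the example. Each alternative gets its own capturing group, and after every find() the group that is non-null tells you which rule matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports which alternative of (A)|(B) produced each match.
class WhichAlternative {
    // Group 1 = runs of whitespace, group 2 = a dot followed by a letter.
    private static final Pattern RULES = Pattern.compile("(\\s{2,})|(\\.[a-zA-Z])");

    static String describe(String input) {
        Matcher m = RULES.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // A group that did not take part in the match returns null.
            int rule = (m.group(1) != null) ? 1 : 2;
            out.append("rule ").append(rule).append(" at ").append(m.start()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "rule 1 at 1" and then "rule 2 at 4".
        System.out.print(describe("a  b.A"));
    }
}
```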

2 Answers

You have these 2 replaceAll calls:

s = s.replaceAll("\\s{2,}"," ");
s = s.replaceAll("\\.([a-zA-Z])",". $1");

You can combine them into a single replaceAll like this:

s = s.replaceAll("\\s{2,}|(\\.)(?=[a-zA-Z])", "$1 ");

RegEx Demo
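
A quick way to convince yourself that the combined call behaves like the two separate calls is to run both on the sample input from the question. The sketch below does that; the class and method names are invented for the example. It relies on the fact that, in Java, a $1 reference to a group that did not take part in the match expands to nothing, so the whitespace branch produces just the single space.

```java
// Hypothetical demo class comparing the two-step and the combined replacement.
class CombinedReplaceDemo {
    // The two original calls from the question, applied one after another.
    static String sequential(String s) {
        s = s.replaceAll("\\s{2,}", " ");
        return s.replaceAll("\\.([a-zA-Z])", ". $1");
    }

    // The single combined call: group 1 is either the dot or unset.
    static String combined(String s) {
        return s.replaceAll("\\s{2,}|(\\.)(?=[a-zA-Z])", "$1 ");
    }

    public static void main(String[] args) {
        String input = "hello     .how are you?";
        System.out.println(sequential(input)); // hello . how are you?
        System.out.println(combined(input));   // hello . how are you?
    }
}
```

This only checks the one sample string; with other rules or inputs the two forms should be compared again before swapping one for the other.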


5 Comments

This is a great observation mate, but unfortunately there are many other rules that can't fit into such a technique
I posted answer based on the code you have in question. If you show more code then I can better judge what can be done to optimize it.
thx mate, there are over 100 rules, obviously pointless to add them all, there is this one as well ([a-zA-Z])\\-,([a-zA-Z]) --> $1-$2
@nafas This is a basis of a good solution, even if your expressions are "complex". If you can group all your regexes based on common replacement expressions, then chain calls to replaceAll() using regex alternation (as in this example), it will be as efficient as you can get it. eg s = s.replaceAll("\\s{2,}|(\\.)(?=[a-zA-Z])", "$1 ").replaceAll("foo|bar|baz", "qux").replaceAll...;
@Bohemian I agree, to be honest it took me by surprise when anubhava managed to combine those two. This way I can reduce the number of regexes I'm using, but I still have to have many replaceAll calls etc. I guess there is no hope for a one-liner or anything more sophisticated.
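
The grouping idea from the comments (one alternation per shared replacement, applied in sequence) might look roughly like this. It is a sketch under assumptions: the class name GroupedReplacements and the foo|bar|baz rule are placeholders taken from the comment, not real rules. Since String.replaceAll compiles its regex on every call, the patterns are compiled once up front here.

```java
import java.util.regex.Pattern;

// Hypothetical holder for rules grouped by their replacement string.
class GroupedReplacements {
    // Each pattern is an alternation of all regexes that share one replacement.
    private static final Pattern[] PATTERNS = {
        Pattern.compile("\\s{2,}|(\\.)(?=[a-zA-Z])"),
        Pattern.compile("foo|bar|baz")
    };
    // REPLACEMENTS[i] belongs to PATTERNS[i].
    private static final String[] REPLACEMENTS = { "$1 ", "qux" };

    static String applyAll(String s) {
        // One pass over the string per group of rules, not per individual rule.
        for (int i = 0; i < PATTERNS.length; i++) {
            s = PATTERNS[i].matcher(s).replaceAll(REPLACEMENTS[i]);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(applyAll("hello     .how foo bar")); // hello . how qux qux
    }
}
```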

Look at Replace multiple substrings at Once and modify it.

Use a Map<Integer, Function<Matcher, String>>.

  • group numbers as Integer keys
  • Lambdas as values

Modify the loop to check which group was matched. Then use that group number for getting the replacement lambda.

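Example code

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiReplace {
    public static void main(String[] args) {
        // Key: number of the capturing group that identifies a rule.
        // Value: lambda that builds the replacement text from the current match.
        Map<Integer, Function<Matcher, String>> replacements = new HashMap<>();
        replacements.put(1, matcher -> " ");
        replacements.put(2, matcher -> ". " + matcher.group(2));

        String input = "hello     .how are you?";

        // One alternative per rule, joined with '|'; each alternative has its own group.
        String regexp = "(\\s{2,})|\\.([a-zA-Z])";

        StringBuffer sb = new StringBuffer();
        Matcher m = Pattern.compile(regexp).matcher(input);

        while (m.find()) {
            // Only the group of the alternative that matched is non-null.
            for (Map.Entry<Integer, Function<Matcher, String>> rule : replacements.entrySet()) {
                if (m.group(rule.getKey()) != null) {
                    // quoteReplacement keeps '$' and '\' in the lambda's result literal.
                    m.appendReplacement(sb, Matcher.quoteReplacement(rule.getValue().apply(m)));
                    break;
                }
            }
        }
        m.appendTail(sb);

        System.out.println(sb);   // hello . how are you?
    }
}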
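
With 100+ rules, numbering the groups by hand gets error-prone. Below is a sketch of one way to build the combined pattern from a list of rules and keep track of each rule's group number automatically. Everything named here (CombinedRules, add, apply) is hypothetical. Each rule is wrapped in its own group, and the lambda receives that wrapper group's number g, so the rule's own first group is g + 1, the second g + 2, and so on. It assumes the rules contain no numbered backreferences such as \1, since wrapping shifts the numbering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical builder that joins many regex rules into one alternation.
class CombinedRules {
    private final List<BiFunction<Matcher, Integer, String>> replacers = new ArrayList<>();
    private final List<Integer> wrapperGroups = new ArrayList<>();
    private final StringBuilder regex = new StringBuilder();
    private int nextGroup = 1;
    private Pattern compiled;

    CombinedRules add(String ruleRegex, BiFunction<Matcher, Integer, String> replacer) {
        if (regex.length() > 0) {
            regex.append('|');
        }
        regex.append('(').append(ruleRegex).append(')');
        wrapperGroups.add(nextGroup);
        replacers.add(replacer);
        // Skip the wrapper group plus however many groups the rule itself defines.
        nextGroup += 1 + Pattern.compile(ruleRegex).matcher("").groupCount();
        compiled = null; // force a recompile on the next apply()
        return this;
    }

    String apply(String input) {
        if (compiled == null) {
            compiled = Pattern.compile(regex.toString());
        }
        Matcher m = compiled.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Find the rule whose wrapper group took part in this match.
            for (int i = 0; i < wrapperGroups.size(); i++) {
                int g = wrapperGroups.get(i);
                if (m.group(g) != null) {
                    String text = replacers.get(i).apply(m, g);
                    m.appendReplacement(sb, Matcher.quoteReplacement(text));
                    break;
                }
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        CombinedRules rules = new CombinedRules()
            .add("\\s{2,}", (m, g) -> " ")
            .add("\\.([a-zA-Z])", (m, g) -> ". " + m.group(g + 1))
            .add("([a-zA-Z])-,([a-zA-Z])", (m, g) -> m.group(g + 1) + "-" + m.group(g + 2));
        System.out.println(rules.apply("hello     .how are you? a-,b")); // hello . how are you? a-b
    }
}
```

One difference from chained replaceAll calls: the input is scanned once, so text produced by one rule is not re-examined by the rules after it. If some rules depend on the output of earlier ones, they still need a separate pass.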
