I am currently working with a large code base in which the signature of one of the APIs recently changed. I need to modify thousands of files to use the new feature, so I developed a Java program that takes all *.java files, looks for the old API pattern, and replaces it with the new pattern when found.

Old API

API(3,Utils.FIFTY,key1,key4)

New API

API(key1,key4)

To match the old API, I created this regex pattern:

API\([\d,\s\.\w]*(key[\.\w\s,]*)\)

If it matches, the call is replaced with

replaceString = matcher.group(1) + "(" + matcher.group(2) + ")";

With the current code, instead of the expected API(key1,key4), I am getting API(key4). I've analyzed the issue, and my inference is that the first character class, which contains \w, also consumed key1, so the capturing group only receives key4. I suspect a negative lookahead is needed to prevent that.

Can anyone share the most reliable way to resolve this regex issue?

  • Why not @deprecate the API(3,Utils.FIFTY,key1,key4) form, call the new form internally, and recompile? Commented Feb 20, 2013 at 18:54
  • The API is developed by another team and we don't have access to it. The old API still exists for backward compatibility. Since we need the new feature, we need to call the new method. Commented Feb 20, 2013 at 18:58

3 Answers

F.J's answer doesn't match every call in this test case:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class APIUpdater {
   public static void main( String[] args ) {
      String source = "\n" +
        "API( key.getValue( 18 ),call( key1 ).mth(),key1,key4);\n" +
        "API(\n" +
        "\t3,\n" +
        "\tUtils.FIFTY,\n" +
        "\tkey1,\n" +
        "\tkey4 );\n" +
        "API(3,Utils.FIFTY,key1,key4);\n";
      Pattern p =
         Pattern.compile( "API\\([.\\w\\s,]*?,\\s*(key[\\.\\w\\s,]*)\\)" );
      Matcher m = p.matcher( source );
      // replaceAll() resets the matcher and rewrites every match itself,
      // so no surrounding find() loop is needed.
      System.err.println( m.replaceAll( "API(key1,key4)" ));
   }
}

Output is:

API( key.getValue( 18 ),call( key1 ).mth(),key1,key4);
API(key1,key4);
API(key1,key4);

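The first call, whose arguments contain nested method calls and parentheses, is not matched. The call spread over several lines and the extra spaces are handled correctly.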

A true parser with a grammar is required to parse Java. Regular expressions can't do this complex job because they work at the lexical level (the words, not the sentences).
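To illustrate why nesting defeats the pattern, here is a minimal sketch of a scanner that tracks parenthesis depth while splitting arguments. It is not a Java parser: it ignores string literals, comments and generics. The class name ArgSplitter, both method names, and the rule "drop the first two arguments" are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

public class ArgSplitter {
    // Splits the text between a call's outer parentheses into top-level
    // arguments; commas nested inside (...) do not split.
    static List<String> splitTopLevelArgs(String argText) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : argText.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (c == ',' && depth == 0) {
                args.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        args.add(current.toString().trim());
        return args;
    }

    // Assumption: the old form carries two leading arguments to drop.
    static String toNewCall(String argText) {
        List<String> args = splitTopLevelArgs(argText);
        if (args.size() <= 2) {
            return "API(" + argText + ")"; // nothing to drop, leave unchanged
        }
        return "API(" + String.join(",", args.subList(2, args.size())) + ")";
    }

    public static void main(String[] args) {
        // Arguments of the first and second calls from the test case above.
        System.out.println(toNewCall(" key.getValue( 18 ),call( key1 ).mth(),key1,key4"));
        System.out.println(toNewCall("\n\t3,\n\tUtils.FIFTY,\n\tkey1,\n\tkey4 "));
    }
}
```

Both calls print API(key1,key4), including the one with nested parentheses that the regex leaves untouched. Locating the call and its closing parenthesis in real source still needs the same depth tracking, which is where a proper parser becomes the safer choice.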

1 Comment

I don't have complex scenarios like the first example you mentioned. But the second and third samples are valid for me, meaning the API call can be broken into multiple lines due to code formatting. In such cases the reluctant quantifier approach (*?) works fine. We have already incorporated \s for the whitespace characters.

Something like the following should work:

API\([\.\w \t,]*?,\s*(key[\.\w \t,]*)\)

The main change here is the repetition on the first character class, from * to *?. It now matches as few characters as possible instead of as many as possible, so all of your key arguments are included in the matching group.
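As a small sketch of the difference, the following compares the greedy pattern from the question with the reluctant one on the sample call. The class name GreedyVsReluctant and the helper rewrite are made up for this illustration; the replacement string uses $1 to refer to the capturing group.

```java
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Greedy first class: '*' takes as much as it can, including "key1,".
    static final Pattern GREEDY =
        Pattern.compile("API\\([\\d,\\s.\\w]*(key[.\\w\\s,]*)\\)");

    // Reluctant first class: '*?' stops at the first comma followed by "key".
    static final Pattern RELUCTANT =
        Pattern.compile("API\\([.\\w \\t,]*?,\\s*(key[.\\w \\t,]*)\\)");

    // Rewrites every match to API(<captured key arguments>).
    static String rewrite(Pattern p, String source) {
        return p.matcher(source).replaceAll("API($1)");
    }

    public static void main(String[] args) {
        String call = "API(3,Utils.FIFTY,key1,key4)";
        System.out.println(rewrite(GREEDY, call));    // API(key4)
        System.out.println(rewrite(RELUCTANT, call)); // API(key1,key4)
    }
}
```

The greedy class first swallows everything up to the closing parenthesis and then backtracks only as far as the last place where "key" can match, which is why key1 is lost.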

2 Comments

What about line feed between ( and )?
@Aubin I replaced the \s so the character classes will match spaces and tabs but not line feeds, good suggestion.

You may want to try Recoder, which allows you to apply source code transformations.

2 Comments

+1: this solution tackles the problem at the correct abstraction level.
I like the Recoder approach and will give it a try.
