2

In my program I will be reading a Java file line by line, and if there is any string literal in that line, I will replace it with (say) "ABC".

Is there any regex to do so?

Ex. If the Java file passed to my program is:

public class TestClass {

    private static final boolean isNotThis = false;

    public static void main(String[] args) {
        String x = "This is a test String";
        dummyMethodCall();
        if(isNotThis){
            makeItThat();
            System.out.println("work is done");
        }
    }
}

Then the output java file should be:

public class TestClass {

    private static final boolean isNotThis = false;

    public static void main(String[] args) {
        String x = "ABC";
        dummyMethodCall();
        if(isNotThis){
            makeItThat();
            System.out.println("ABC");
        }
    }
}

I would like to know the regex that will help me detect all string literals and replace them with a particular string of my choice.

EDIT:

The real challenge for me is handling quote characters inside a string (if somebody puts a quote character with an escape character inside the string).

5
  • "\".*\"" solves most of the cases. You may need to worry about word boundaries and lookbehind. Commented Aug 31, 2013 at 11:13
  • A general solution is difficult; consider lines like /* " */ s = "ab"; Commented Aug 31, 2013 at 11:13
  • I'm wondering why you want to do this. You could reflect on the type instead. Commented Aug 31, 2013 at 11:15
  • @BimanTripathy: I've edited my answer. Now it can handle multiple literals in one line. Consider deleting your follow-up question and let me know if it works for you. Commented Aug 31, 2013 at 19:28
  • @jlordo thanks... will check and take proper action in a min. :) Commented Aug 31, 2013 at 19:30

4 Answers 4

5

Consider the following regular expression:

String regex = "\"(?:\\\\\"|[^\"])*?\"";

It starts with a quote, followed by zero or more non-quote characters or escaped quote characters. The last character has to be a quote.

If you apply this regex to Java code, remember that it also matches text inside quotes in comments. If a comment contains an unbalanced quote, the pairing of quotes shifts and the regex matches the text between string literals instead of the literals themselves.
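To make the comment caveat concrete, here is a minimal sketch that applies the same pattern to two lines, one of them taken from the comment under the question. The class name CommentQuotePitfall and the helper mask are made up for this illustration.

```java
import java.util.regex.Pattern;

class CommentQuotePitfall {
    // Same pattern as above: a quote, then escaped quotes or non-quote characters, then a quote.
    static final Pattern LITERAL = Pattern.compile("\"(?:\\\\\"|[^\"])*?\"");

    // Hypothetical helper: replaces every match with "ABC" (quotes included).
    static String mask(String line) {
        return LITERAL.matcher(line).replaceAll("\"ABC\"");
    }

    public static void main(String[] args) {
        // No quote inside the comment: the literal is replaced as intended.
        System.out.println(mask("s = \"ab\"; // plain comment"));
        // prints: s = "ABC"; // plain comment

        // A lone quote inside the comment pairs up with the opening quote of the literal.
        System.out.println(mask("/* \" */ s = \"ab\";"));
        // prints: /* "ABC"ab";
    }
}
```

In the second line the regex has no way to know that the first quote sits inside a comment, so a purely regex-based approach needs the comments stripped or tokenized separately first.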

If you had the example you posted in a String variable named example, the following would work:

String wanted = example.replaceAll(regex, "\"ABC\"");

Here's a full example:

String literal = "String foo = \"bar\" + \"with\\\"escape\" + \"baz\";";
String regex = "\"(?:\\\\\"|[^\"])*?\"";
String replacement = "\"\"";
String wanted = literal.replaceAll(regex, replacement);
System.out.println(literal);
System.out.println(wanted);

prints

String foo = "bar" + "with\"escape" + "baz";
String foo = "" + "" + "";

6 Comments

The problem is solved, thanks. But I couldn't understand why there are 5 slashes. For \" we could have \\\", i.e. just 2 escape \ characters, although it works only with 5 slashes. Why?
@BimanTripathy: the fifth one escapes the quote in the string literal. The first four in the literal are actually only 2 in the string. In a Java regex you also have to escape the slash, so those 2 are one in the regex. The string literal "\\\\\"" is the string \\", which is the pattern \". Understand?
(1)to represent " we needed \" (2)thats why to represent \" we needed \\" (later \ is escape char) (3)and, since each of the characters \ ,\ ," needed an escape character each, we have 5 \ in total. correct so far? But what i didn't understand is why we need 3 levels of putting-escape-characters? any good tutorial link you can suggest?
@BimanTripathy: use backticks ` in comments to mark code. The way it's now I don't understand your previous comment, due to the escaping that's done by stackoverflow.
@BimanTripathy: You need 3 levels because in your pattern you want \" (an escaped quote), so in the regex you need \\", which is why the Java string literal needs to be "\\\\\"". I only know the Oracle Java Regex Tutorial.
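The three levels discussed in these comments can be checked directly. The following is a small sketch (the class name EscapeLevels and the constant PATTERN_TEXT are invented for illustration) that prints what the five-backslash literal actually contains and what it matches as a regex.

```java
import java.util.regex.Pattern;

class EscapeLevels {
    // Level 1: the Java source literal, five backslashes and a quote between the delimiters.
    static final String PATTERN_TEXT = "\\\\\"";

    public static void main(String[] args) {
        // Level 2: after the compiler processes the literal, the string holds 3 characters: \ \ "
        System.out.println(PATTERN_TEXT);          // prints: \\"
        System.out.println(PATTERN_TEXT.length()); // prints: 3

        // Level 3: the regex engine reads \\ as one literal backslash,
        // so the pattern matches the 2-character text \"
        String escapedQuote = "\\\""; // the text \" (2 characters)
        System.out.println(Pattern.matches(PATTERN_TEXT, escapedQuote)); // prints: true
    }
}
```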
2

Based on Uri's answer of using the parser grammar in this question:

"(?:\\[\\'"tnbfru01234567]|[^\\"])*?"

as Java string:

"\"(?:\\\\[\\\\'\"tnbfru01234567]|[^\\\\\"])*?\""

Explanation (see also Java String escape sequences):

"                          // start with a double quote
  (?:                      // a non-capture group
    \\[\\'"tnbfru01234567] // either an escape sequence
  |                        // or
    [^\\"]                 // not an escape sequence start or ending double quote
  )*?                      // zero or more times, not greedy
"                          // ending double quote

Example (jlordo's solution fails on this):

    String literal = "String foo = \"\\\\\" + \"bar\" + \"with\\\"escape\" + \"baz\" + \"\\117\\143\\164\\141\\154\";";
    String regex = "\"(?:\\\\[\\\\'\"tnbfru01234567]|[^\\\\\"])*?\"";
    String replacement = "\"\"";
    String wanted = literal.replaceAll(regex, replacement);
    System.out.println(literal);
    System.out.println(wanted);
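To see where the two patterns diverge, the sketch below runs both on a shortened version of the input above, one that still contains a literal ending in an escaped backslash. The class name EscapedBackslashComparison, the constant names, and the helper blank are placeholders chosen for this example.

```java
class EscapedBackslashComparison {
    // The simpler pattern from the earlier answer.
    static final String SIMPLE = "\"(?:\\\\\"|[^\"])*?\"";
    // The grammar-based pattern from this answer.
    static final String GRAMMAR_BASED = "\"(?:\\\\[\\\\'\"tnbfru01234567]|[^\\\\\"])*?\"";

    // Hypothetical helper: replaces every match with an empty string literal.
    static String blank(String regex, String code) {
        return code.replaceAll(regex, "\"\"");
    }

    public static void main(String[] args) {
        // The text is: String foo = "\\" + "bar" + "with\"escape" + "baz";
        String code = "String foo = \"\\\\\" + \"bar\" + \"with\\\"escape\" + \"baz\";";

        System.out.println(blank(GRAMMAR_BASED, code));
        // prints: String foo = "" + "" + "" + "";

        // The simpler pattern treats the \" at the end of "\\" as an escaped quote,
        // so its matches drift out of step with the real literals.
        System.out.println(blank(SIMPLE, code));
    }
}
```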

1 Comment

Missing octal escape sequence. The raw character \r and \n should also be excluded from matching (in the [^\\"] part).
0
s = s.replaceAll("\"([^\n\"\\\\]+|\\\\.)*\"", "\"ABC\"");

This searches for a quote, then any run of characters that are not quotes, backslashes, or linefeeds, or a backslash followed by any character, up to the closing quote.

\"
  (
    [^\n\"\\\\]+
  |
    \\\\.
  )*
\"
  • [^ ... ] none of the enclosed chars; ranges such as A-Z are possible too.
  • | or.
  • . any character, by default not line endings.
  • ... + one or more of ... .
  • ... * zero or more of ... .
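Since the question is about processing a file line by line, here is a rough sketch of how this pattern could be wrapped for that purpose. The class LiteralMasker and its methods are hypothetical names, and the backslash inside the character class is written as \\\\ in the Java literal so that the class is closed properly. The comment caveats raised elsewhere on this page still apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralMasker {
    // As a regex: "([^\n"\\]+|\\.)*"
    static final Pattern LITERAL = Pattern.compile("\"([^\n\"\\\\]+|\\\\.)*\"");

    // Replaces every string literal in one line with the given text, wrapped in quotes.
    static String maskLine(String line, String replacementText) {
        // quoteReplacement keeps $ and \ in the replacement from being interpreted.
        String quoted = Matcher.quoteReplacement("\"" + replacementText + "\"");
        return LITERAL.matcher(line).replaceAll(quoted);
    }

    // Applies maskLine to each line in turn.
    static List<String> maskLines(List<String> lines, String replacementText) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(maskLine(line, replacementText));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "String x = \"This is a test String\";",
                "String y = \"HelloDear\\\"Friend\";",
                "dummyMethodCall();");
        for (String line : maskLines(lines, "ABC")) {
            System.out.println(line);
        }
        // prints:
        // String x = "ABC";
        // String y = "ABC";
        // dummyMethodCall();
    }
}
```

The lines could come from something like Files.readAllLines; that part is left out here to keep the sketch self-contained.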

Comments

-1

You can also use \b(?:(?<=")[^"]*(?=")|\w+)\b. This will find all strings surrounded by double quotes ("example").

Sample Code:

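import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "\"Hello\" World";
// Backslashes must be doubled inside a Java string literal (\\b, \\w).
Pattern pattern = Pattern.compile("\\b(?:(?<=\")[^\"]*(?=\")|\\w+)\\b");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    // replace the string with your string
}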

The output will be:

Actual line: "Hello" World
Answer: ABC World

2 Comments

Will this take care of quote symbols inside a string too? E.g. String x = "HelloDear\"Friend"; as in the example above, where I've put a quote in using an escape character.
You just need to change/replace \\w to a character class like [\\w\\?\\*\\_], which will match any word character and any of the escaped special characters.
