2

I want to replace every Java-style block comment (/* */) with as many newlines as that comment contains. So far, I have only come up with something that replaces comments with an empty string:

String.replaceAll("/\\*[\\s\\S]*?\\*/", "")

Is it possible to replace each match with the newlines it contains instead? If this is not possible with regex matching alone, what's the best way to do it?

For example,

/* This comment
has 2 new lines
contained within */

will be replaced with a string of just 2 new lines.

3
  • 2
    What's going to happen when you run your program against code that contains, for example, String comment = "/* this is a comment*/";? Regex is the wrong tool for the job, you will need a real Java parser. Commented Sep 5, 2019 at 21:20
  • 1
    @JimGarrison - Regex is not the wrong tool for the job, it's the only tool for the job apart from a language parser. A regex can parse strings as well as comments, even at the same time. It's easy, but wasn't requested. Commented Sep 6, 2019 at 18:15
  • 1
    I included an expanded regex that parses quoted strings at the same time. This is another advanced technique smothered by the "put on hold as too broad by Jim Garrison, Wiktor Stribiżew, mpromonet yesterday" notice. It's a shame that the SO community searching for answers will not be able to see this. Putting it on hold is quite narrow-minded, but that appears to be more and more typical on SO now. Commented Sep 7, 2019 at 22:13
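
The concern raised in the first comment can be reproduced in a few lines. This is only a sketch using the regex from the question; the class and method names (NaiveStripDemo, stripNaive) are hypothetical and exist just for illustration.

```java
class NaiveStripDemo {
    // Hypothetical helper: applies the question's regex as-is.
    static String stripNaive(String source) {
        return source.replaceAll("/\\*[\\s\\S]*?\\*/", "");
    }

    public static void main(String[] args) {
        // The "comment" here is really the content of a string literal.
        String code = "String comment = \"/* this is a comment*/\";";
        // The literal's content is removed, which changes the program's meaning.
        System.out.println(stripNaive(code)); // prints: String comment = "";
    }
}
```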

3 Answers

1

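Since Java supports the \G construct, you can do it all in one go with a global regex replace (replaceAll in Java).
Each match consumes at most one line of a comment and captures that line's trailing line break in group 1; \G lets the next match continue where the previous one stopped, until the closing */ is reached.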

Find

"(?:\\/\\*(?=[\\S\\s]*?\\*\\/)|(?<!\\*\\/)(?!^)\\G)(?:(?!\\r?\\n|\\*\\/).)*((?:\\r?\\n)?)(?:\\*\\/)?"

Replace

"$1"

https://regex101.com/r/l1VraO/1

Expanded

 (?:
      / \* 
      (?= [\S\s]*? \* / )
   |  
      (?<! \* / )
      (?! ^ )
      \G 
 )
 (?:
      (?! \r? \n | \* / )
      . 
 )*
 (                             # (1 start)
      (?: \r? \n )?
 )                             # (1 end)
 (?: \* / )?
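
To make the find/replace pair concrete, here is a minimal sketch of how it might be applied in Java. It assumes the pattern is passed as a plain Java string (no surrounding "/" regex delimiters); the class name CommentToNewlines and the method strip are hypothetical.

```java
import java.util.regex.Pattern;

class CommentToNewlines {
    // The expanded pattern above, written as a Java string literal.
    private static final Pattern COMMENT_LINE = Pattern.compile(
        "(?:/\\*(?=[\\S\\s]*?\\*/)|(?<!\\*/)(?!^)\\G)"   // start of a comment, or continue after the last match
        + "(?:(?!\\r?\\n|\\*/).)*"                        // rest of the current comment line
        + "((?:\\r?\\n)?)"                                // group 1: the line break, if any
        + "(?:\\*/)?");                                   // closing delimiter, if reached

    // Replaces each comment line with just its captured line break.
    static String strip(String source) {
        return COMMENT_LINE.matcher(source).replaceAll("$1");
    }

    public static void main(String[] args) {
        String source = "int a;\n"
                      + "/* This comment\n"
                      + "has 2 new lines\n"
                      + "contained within */\n"
                      + "int b;\n";
        // Show line breaks visibly so the preserved line count is easy to check.
        System.out.println(strip(source).replace("\n", "\\n"));
    }
}
```

Because every line break inside a comment is written back, code after the comment should stay on its original line number.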

==================================================
==================================================

If you ever need to handle comment delimiters that appear inside
quoted strings, like this

String comment = "/* this is a comment*/"

here is an extended regex that parses quoted strings as well as comments.
Group 1 matches a quoted string, which the replacement writes back unchanged; group 2 captures the line break of a comment line.
It is still a single regex, applied in one global find/replace.

Find

"(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\")|(?:\\/\\*(?=[\\S\\s]*?\\*\\/)|(?<!\")(?<!\\*\\/)(?!^)\\G)(?:(?!\\r?\\n|\\*\\/).)*((?:\\r?\\n)?)(?:\\*\\/)?"

Replace

"$1$2"

https://regex101.com/r/tUwuAI/1

Expanded

    (                             # (1 start)
         "
         [^"\\]* 
         (?:
              \\ [\S\s] 
              [^"\\]* 

         )*
         "
    )                             # (1 end)
 |  
    (?:
         / \* 
         (?= [\S\s]*? \* / )
      |  
         (?<! " )
         (?<! \* / )
         (?! ^ )
         \G 
    )
    (?:
         (?! \r? \n | \* / )
         . 
    )*
    (                             # (2 start)
         (?: \r? \n )?
    )                             # (2 end)
    (?: \* / )?
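
A possible way to try the string-aware variant in Java is sketched below. The class name StringAwareStripper and the method strip are hypothetical, and the pattern is assumed to be passed without "/" delimiters. Note that this is not a full Java lexer: char literals, // line comments, and text blocks are not handled.

```java
import java.util.regex.Pattern;

class StringAwareStripper {
    private static final Pattern STRING_OR_COMMENT_LINE = Pattern.compile(
        "(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\")"                   // group 1: a quoted string, kept as-is
        + "|(?:/\\*(?=[\\S\\s]*?\\*/)|(?<!\")(?<!\\*/)(?!^)\\G)"        // comment start, or continuation
        + "(?:(?!\\r?\\n|\\*/).)*"                                      // rest of the current comment line
        + "((?:\\r?\\n)?)"                                              // group 2: the line break, if any
        + "(?:\\*/)?");                                                 // closing delimiter, if reached

    // A group that did not take part in the match contributes nothing to the replacement.
    static String strip(String source) {
        return STRING_OR_COMMENT_LINE.matcher(source).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String source = "String s = \"/* not a comment */\"; /* real\ncomment */ int x;";
        // The string literal survives; the real comment collapses to its single line break.
        System.out.println(strip(source));
    }
}
```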

1

You can do it with a regex "replacement loop".

Most easily done in Java 9+:

String result = Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input)
                       .replaceAll(r -> r.group().replaceAll(".*", ""));

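The main regex has been optimized for performance: the possessive quantifiers (++ and *+) prevent backtracking inside the comment body. The lambda, which blanks every line of the match with replaceAll(".*", "") so that only the line terminators remain, has not been optimized.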
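
One way the lambda could be optimized is a plain character scan instead of a second regex. The following is a sketch, not part of the original answer: it assumes only \r and \n need to be kept (the ".*" approach also keeps \u0085, \u2028 and \u2029), and the names NewlineCounter, lineBreaksOf and strip are hypothetical. It needs Java 9+ for Matcher.replaceAll with a function argument.

```java
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

class NewlineCounter {
    private static final Pattern BLOCK_COMMENT =
        Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/");

    // Copies only the line-break characters of the matched comment.
    static String lineBreaksOf(MatchResult comment) {
        String text = comment.group();
        StringBuilder breaks = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                breaks.append(c);
            }
        }
        return breaks.toString();
    }

    static String strip(String input) {
        return BLOCK_COMMENT.matcher(input).replaceAll(NewlineCounter::lineBreaksOf);
    }

    public static void main(String[] args) {
        String input = "Line 1\n/* This\n   comment */\nLine 4";
        System.out.println(strip(input));
    }
}
```

The replacement string contains only line breaks, so it never includes the $ or \ characters that replaceAll would otherwise interpret.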

For all Java versions:

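// Requires java.util.regex.Matcher and java.util.regex.Pattern
Matcher m = Pattern.compile("/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input);
StringBuffer buf = new StringBuffer(); // the StringBuilder overloads only exist since Java 9
while (m.find())
    // Blank out every line of the comment; only the line terminators remain
    m.appendReplacement(buf, m.group().replaceAll(".*", ""));
String result = m.appendTail(buf).toString();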

Test

final String input = "Line 1\n"
                   + "/* Inline comment */\n"
                   + "Line 3\n"
                   + "/* One-line\n"
                   + "   comment */\n"
                   + "Line 6\n"
                   + "/* This\n"
                   + "   comment\n"
                   + "   has\n"
                   + "   4\n"
                   + "   lines */\n"
                   + "Line 12";

Matcher m = Pattern.compile("(?s)/\\*(?:[^*]++|\\*(?!/))*+\\*/").matcher(input);
String result = m.replaceAll(r -> r.group().replaceAll(".*", ""));

// Show input/result side-by-side
String[] inLines = input.split("\n", -1);
String[] resLines = result.split("\n", -1);
int lineCount = Math.max(inLines.length, resLines.length);
System.out.println("input                    |result");
System.out.println("-------------------------+-------------------------");
for (int i = 0; i < lineCount; i++) {
    System.out.printf("%-25s|%s%n", (i < inLines.length ? inLines[i] : ""),
                                    (i < resLines.length ? resLines[i] : ""));
}

Output

input                    |result
-------------------------+-------------------------
Line 1                   |Line 1
/* Inline comment */     |
Line 3                   |Line 3
/* One-line              |
   comment */            |
Line 6                   |Line 6
/* This                  |
   comment               |
   has                   |
   4                     |
   lines */              |
Line 12                  |Line 12

0

Maybe this expression,

\/\*.*?\*\/

in s (DOTALL) mode, where . also matches line terminators, is close to what you have in mind.

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class re{

    public static void main(String[] args){

        final String regex = "\\/\\*.*?\\*\\/";
        final String string = "/* This comment\n"
             + "has 2 new lines\n"
             + "contained within */\n\n"
             + "Some codes here 1\n\n"
             + "/* This comment\n"
             + "has 2 new lines\n"
             + "contained within \n"
             + "*/\n\n\n"
             + "Some codes here 2";
        final String subst = "\n\n";

        final Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        final Matcher matcher = pattern.matcher(string);

        final String result = matcher.replaceAll(subst);

        System.out.println(result);

    }
}

Output

Some codes here 1






Some codes here 2

If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it matches against some sample inputs.


1 Comment

That surely doesn't work. How do you know to replace with 2 \n? You don't, which is why you get it wrong, given that second comment has 3 of them, and is incorrectly replaced by 2. Downvoted for incorrect result (aka not useful).
