I have a text with two types of comments: those delimited by % and those that start with /* and end with */. For instance:

INPUT1: Sarah was going out. % Remember she usually doesn't go out % It was very cold.

DESIRED_OUTPUT1: Sarah was going out. It was very cold.

INPUT2: Sarah was going out. /* Remember she usually doesn't go out */ It was very cold.

DESIRED_OUTPUT2: Sarah was going out. It was very cold.

INPUT3: Charles knocked on the door and a woman opened it. % Hmm, is this good... /* Not sure */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.

DESIRED_OUTPUT3: Charles knocked on the door and a woman opened it. She looked at him. - Yes?, she said.

INPUT4: Charles knocked on the door and a woman opened it. % Hmm, is this good... /* Not sure to 100% */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.

DESIRED_OUTPUT4: Charles knocked on the door and a woman opened it. */ Perhaps this should happen in chapter 10 instead?

Basically, when an opening comment mark is encountered, I would like everything up to its matching closing mark to be removed (even if this means removing comment marks of the other type).

If a comment is opened, either with % or /*, but never closed, the comment is assumed to continue until the end of the text. However, if only a closing mark */ is present (because its opener was inside another comment and was therefore removed), it should be left in the text.

  • I think Input 4 should result in Charles knocked on the door and a woman opened it. */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.. You seem to want .replaceAll("%[^%]*%|/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/",""). There is a chance you need to make the last % optional though, "%[^%]*%?|/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/". Commented Jun 7, 2020 at 9:39
  • Well, thank you for pointing that out. I forgot to mention that when there is no closing mark (or only a closing one, because its opener was inside another comment and hence removed, as in this case), the mark should stay there. So Input/output 4 is correct. Commented Jun 7, 2020 at 9:45
  • So, acc. to what you say ("the mark should stay there"), the % She looked at him. - Yes?, she said. should not be removed in Input 4. Commented Jun 7, 2020 at 9:47
  • Well, you are right again. If a comment is opened, either with % or /*, but never closed, the comment is assumed to continue until the end of the text. Commented Jun 7, 2020 at 9:53
  • Aha, so you probably want .replaceAll("%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?","") Commented Jun 7, 2020 at 9:57

1 Answer

You may use

.replaceAll("%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?","")
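To show the call in context, here is a minimal sketch that applies this expression to INPUT4 from the question. The class name `CommentStripper` and the methods `stripComments` and `tidy` are hypothetical names chosen for illustration. `tidy` is an extra step that is not part of the expression above: removing a comment leaves the spaces on both sides of it in place, so the raw result contains a doubled space where the comment stood.

```java
import java.util.regex.Pattern;

class CommentStripper {
    // The expression from this answer, compiled once so it can be reused.
    private static final Pattern COMMENT = Pattern.compile(
            "%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?");

    // Removes both comment styles; an unclosed comment runs to the end of the text.
    static String stripComments(String text) {
        return COMMENT.matcher(text).replaceAll("");
    }

    // Hypothetical extra step (an assumption, not part of the answer):
    // collapse the doubled spaces left behind by a removed comment
    // and trim leading and trailing whitespace.
    static String tidy(String text) {
        return text.replaceAll(" {2,}", " ").trim();
    }

    public static void main(String[] args) {
        String input4 = "Charles knocked on the door and a woman opened it. "
                + "% Hmm, is this good... /* Not sure to 100% */ "
                + "Perhaps this should happen in chapter 10 instead? "
                + "% She looked at him. - Yes?, she said.";
        // The stray closing mark stays; the final unclosed % comment is removed.
        System.out.println(tidy(stripComments(input4)));
    }
}
```

Without `tidy`, the result for INPUT1 would read "Sarah was going out.  It was very cold." with two spaces between the sentences; whether that matters depends on how the text is used afterwards.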

See the regex demo

Details

  • %[^%]*%? - %...% like comments with an optional trailing delimiter:
    • % - a % char
    • [^%]* - 0 or more chars other than %
    • %? - an optional % char
  • | - or
  • /\*[^*]*(?:\*(?!/)[^*]*)*(?:\*/)? - /*...*/ like comments with an optional trailing delimiter:
    • /\* - /* string
    • [^*]* - 0 or more chars other than *
    • (?:\*(?!/)[^*]*)* - 0 or more occurrences of
      • \*(?!/) - a * not followed by /
      • [^*]* - 0 or more chars other than *
    • (?:\*/)? - an optional */ substring.
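
The breakdown above can also be kept next to the expression itself. The following is a sketch, assuming you prefer a self-documenting pattern: it writes the same expression with `Pattern.COMMENTS`, which makes the regex engine ignore whitespace and `#` comments inside the pattern. The class name `VerboseCommentPattern` and the method `strip` are hypothetical.

```java
import java.util.regex.Pattern;

class VerboseCommentPattern {
    // Same expression as above; with Pattern.COMMENTS, whitespace and
    // everything from # to the end of a line inside the pattern are ignored.
    static final Pattern COMMENT = Pattern.compile(
            "%[^%]*%?              # percent comment, closing % optional\n"
          + "|                     # or\n"
          + "/\\*                  # the opening mark of a block comment\n"
          + "[^*]*                 # 0 or more chars other than *\n"
          + "(?:\\*(?!/)[^*]*)*    # a * not followed by /, then non-* chars, repeated\n"
          + "(?:\\*/)?             # optional closing mark\n",
            Pattern.COMMENTS);

    static String strip(String text) {
        return COMMENT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // A lone closing mark is kept; the unclosed block comment runs to the end.
        System.out.println(strip("a */ b /* c"));
        // A * inside a block comment does not end it; only */ does.
        System.out.println(strip("x /* a * b */ y"));
    }
}
```

This form only works because the expression contains no literal spaces or `#` characters; either would have to be escaped in comments mode.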