I have a text with two types of comments: those delimited by % and those that start with /* and end with */. For instance:
INPUT1:
Sarah was going out. % Remember she usually doesn't go out % It was very cold.
DESIRED_OUTPUT1:
Sarah was going out. It was very cold.
INPUT2:
Sarah was going out. /* Remember she usually doesn't go out */ It was very cold.
DESIRED_OUTPUT2:
Sarah was going out. It was very cold.
INPUT3:
Charles knocked on the door and a woman opened it. % Hmm, is this good... /* Not sure */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.
DESIRED_OUTPUT3:
Charles knocked on the door and a woman opened it. She looked at him. - Yes?, she said.
INPUT4:
Charles knocked on the door and a woman opened it. % Hmm, is this good... /* Not sure to 100% */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.
DESIRED_OUTPUT4:
Charles knocked on the door and a woman opened it. */ Perhaps this should happen in chapter 10 instead?
Basically, when an opening comment mark is encountered, I would like everything up to its matching closing mark to be removed (even if this means removing comment marks of the other type).
If a comment is opened, either with % or /*, but never closed, the comment is assumed to continue until the end of the text. However, if only a closing mark */ is present (because its opener was inside another comment and was therefore removed), it should be left in the text.
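One way to read the rules above is as a single left-to-right scan: the first opening mark found decides which closing mark to look for, and everything in between is dropped. The following is only a minimal sketch under that reading. The class name CommentScanner and the method strip are hypothetical, and the sketch does not collapse the double spaces left behind where a comment was removed, which the desired outputs above appear to assume.

```java
// Hypothetical helper, not part of the original post: a plain scan that
// applies the rules described in the question.
class CommentScanner {

    static String strip(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String closer = null;   // closing mark that matches the opener found at i
            int openerLength = 0;
            if (text.charAt(i) == '%') {
                closer = "%";
                openerLength = 1;
            } else if (text.startsWith("/*", i)) {
                closer = "*/";
                openerLength = 2;
            }
            if (closer == null) {
                // Ordinary character (this includes a stray "*/"): keep it.
                out.append(text.charAt(i));
                i++;
                continue;
            }
            // Look for the matching closer only; marks of the other type
            // inside the comment are skipped along with the comment text.
            int end = text.indexOf(closer, i + openerLength);
            if (end < 0) {
                break; // never closed: the comment runs to the end of the text
            }
            i = end + closer.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip(
            "Sarah was going out. % Remember she usually doesn't go out % It was very cold."));
        System.out.println(strip(
            "Sarah was going out. /* Remember she usually doesn't go out */ It was very cold."));
    }
}
```

Because the scan never looks for an opener while it is inside a comment, a closing */ whose opener was swallowed by a % comment is treated as ordinary text and stays in the output.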
Charles knocked on the door and a woman opened it. */ Perhaps this should happen in chapter 10 instead? % She looked at him. - Yes?, she said.
You seem to want .replaceAll("%[^%]*%|/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/",""). There is a chance you need to make the last % optional though: "%[^%]*%?|/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/".
"% She looked at him. - Yes?, she said." should not be removed in Input 4.
.replaceAll("%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?","")
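To see how the last pattern in the comments behaves, it can be run against the four inputs from the question. This is a sketch rather than a verified solution: the class name RegexCommentDemo and the method strip are hypothetical, and only String.replaceAll from the standard library is used. In this pattern both closing marks are optional, so an unclosed % or /* comment is removed through the end of the text.

```java
// Hypothetical wrapper around the last pattern quoted in the comments.
class RegexCommentDemo {

    // First alternative: "%", any run of non-"%" characters, optional closing "%".
    // Second alternative: "/*", then text up to (and including) an optional "*/";
    // a "*" is consumed inside the comment only when it is not followed by "/".
    static final String COMMENT =
        "%[^%]*%?|/\\*[^*]*(?:\\*(?!/)[^*]*)*(?:\\*/)?";

    static String strip(String text) {
        return text.replaceAll(COMMENT, "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Sarah was going out. % Remember she usually doesn't go out % It was very cold.",
            "Sarah was going out. /* Remember she usually doesn't go out */ It was very cold.",
            "Charles knocked on the door and a woman opened it. % Hmm, is this good... "
                + "/* Not sure */ Perhaps this should happen in chapter 10 instead? % "
                + "She looked at him. - Yes?, she said.",
            "Charles knocked on the door and a woman opened it. % Hmm, is this good... "
                + "/* Not sure to 100% */ Perhaps this should happen in chapter 10 instead? % "
                + "She looked at him. - Yes?, she said."
        };
        for (String input : inputs) {
            System.out.println(strip(input));
        }
    }
}
```

The regex engine always takes the leftmost match, so whichever opening mark comes first in the text decides which alternative consumes the comment. The spaces on both sides of a removed comment are kept, so matching the desired outputs exactly would need an extra whitespace clean-up step, which is an assumption not covered in the thread.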