1

I'm using the following regular expression:

$output = preg_replace( "/\/\/(.*)\\n/", "", $output );

The code works well, but when the input contains a URL such as (http://this_is_not_a_comment.com/kickme), everything after (http: is treated as a comment and removed.

What can I do so that such URLs are not replaced?

Thanks,

2
  • 2
    You need some kind of parser that can distinguish between code and comments. Commented Nov 25, 2010 at 15:46
  • 2
    You should look at this answer: stackoverflow.com/questions/1732348/… Commented Nov 25, 2010 at 15:53

2 Answers

8

You need a regular expression that can distinguish between the code and the comments. In particular, since the sequence of // can either be in a string or a comment, you just need to distinguish between strings and comments.

Here’s an example that might do this:

/(?:([^\/"']+|\/\*(?:[^*]|\*+[^*\/])*\*+\/|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|\/\/.*)/

Using this in a replace function, with the replacement set to the match of the first subpattern ($1 in preg_replace), should then remove the // style comments.

Some explanation:

  • [^/"']+ matches any run of characters that cannot begin a comment (either //… or /*…*/) or a string
  • /\*(?:[^*]|\*+[^*/])*\*+/ matches the /* … */ style comments
  • "(?:[^"\\]|\\.)*" matches a string in double quotes
  • '(?:[^'\\]|\\.)*' matches a string in single quotes
  • \/\/.* finally matches the //… style comments.

Because the first four constructs are grouped in one capturing group, their matched text is available as the first subpattern, so replacing the match with that subpattern changes nothing. Only when a //… style comment is matched is the first subpattern empty, and the comment is therefore replaced by an empty string.
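
Put together, the replacement described above might look like the following PHP sketch. The function name stripLineComments and the sample input are made up for illustration, and the ~ delimiter is chosen only so the slashes need no escaping; the pattern itself is the one from this answer. Note that it only protects a URL that sits inside a quoted string; an unquoted http:// would still be cut at the //.

```php
<?php
// Hypothetical helper (not part of the original answer): applies the
// pattern above and writes back the first subpattern.
function stripLineComments(string $source): string
{
    // A nowdoc keeps the pattern literal, so backslashes reach PCRE unchanged.
    $pattern = <<<'REGEX'
~([^/"']+|/\*(?:[^*]|\*+[^*/])*\*+/|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|//.*~
REGEX;

    // Code, strings and /* */ comments land in group 1 and are written back
    // unchanged; a // comment leaves group 1 empty, so it is replaced by "".
    $result = preg_replace($pattern, '$1', $source);

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $result ?? $source;
}

// Sample input (an assumption for the demo): a URL inside a string,
// a real // comment, and a /* */ comment that contains //.
$source = 'var url = "http://this_is_not_a_comment.com/kickme"; // a real comment' . "\n"
        . 'var x = 1; /* keep // this */' . "\n";

echo stripLineComments($source);
```

With this input the quoted URL and the /* keep // this */ block survive, and only // a real comment is dropped. The space in front of the removed comment and the line break after it stay in place.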

But note that this may fail. I’m not quite sure if it works for any input.


6 Comments

Protip to OP: if a regular expression ever looks this hideous, it's probably not a job for regular expressions. Regardless, +1 for being able to even begin to construct something like this.
@Matchu: I had to look up the regular expression for the /* … */ style comments too.
Nice. Took me some time to get everything. In the two strings, I think the \. should be escaped quotes - \\", or escape anything: \\.. Am I missing something?
If this is for js you'd also need to think of regex quoting, eg /foo\//i
This solution fails to consider literal regexes (which need to be considered when parsing JavaScript), e.g. it will mangle: var re = /\/*notacomment!*/; and m = /\//.test("notacomment!") and var re = /\/*/; // */ thiscommentishandledasascode! and var re = /"/; // " thiscommentishandledasascode!
5
$output = preg_replace( "/(?<!\:)\/\/(.*)\\n/", "", $output );
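
As a rough illustration of what the lookbehind in this one-liner does, here is a small PHP sketch. The function name stripCommentsKeepUrls and the sample input are invented for the example; the pattern is the same one, written with ~ delimiters and without the unused capturing group.

```php
<?php
// Hypothetical wrapper around the one-liner above.
function stripCommentsKeepUrls(string $source): string
{
    // (?<!:) is a negative lookbehind: "//" only counts as a comment start
    // when the character right before it is not a colon, so "http://" is skipped.
    // The trailing \n means the line break is removed together with the comment.
    $result = preg_replace('~(?<!:)//.*\n~', '', $source);

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $result ?? $source;
}

// Sample input (an assumption for the demo): an unquoted URL plus a comment.
$source = "load(http://this_is_not_a_comment.com/kickme); // fetch the page\nrun();\n";

echo stripCommentsKeepUrls($source);
```

Here the URL is kept and the comment is removed along with its line break, so run(); ends up on the same line as load(...). Because the pattern does not look at quotes, a // inside a string literal that does not follow a colon (for example "a//b") is still removed, and a comment on the last line is left alone if the input does not end with a newline.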
