I'm writing my own minifying tool for practice (regular expression practice), but after a few tutorials I'm still not getting it.

For example, I'm trying to find and remove all comments from my CSS file, and that includes:

  1. Single line comments as in

    /** single line comment ****/ or

    /****single line comment */ and

  2. Multi line comments as in

    /**** start of comment
    .myCssClass
    {
    font:13pt Arial;
    }
    ********* end of comment **/

So far I'm using an expression that can only deal with single-line comments:

(\/\*.*\*\/)

What I'm trying to understand about regular expressions is how to tell the regex engine to span lines as well. I did try this:

(\/\*[.\n]*\*\/)

which doesn't work at all.

Anyone know where I'm going wrong?

Thanks, Jacques

  • Typically when you read in a file line-by-line you don't use a regex to span multiple lines. For this you would regex the start of a comment and keep reading in lines to omit until you reach the end comment regex. Commented May 11, 2012 at 15:45
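A minimal sketch of the line-by-line approach this comment describes, written in Python (an assumption, since the question does not name a language). The function name strip_comments_by_line is hypothetical, and the sketch uses plain str.find for the /* and */ delimiters rather than a regex. It does not handle /* appearing inside a quoted CSS string.

```python
def strip_comments_by_line(lines):
    """Return the lines with /* ... */ comments removed, carrying state across lines."""
    result = []
    in_comment = False  # True while we are between "/*" and the matching "*/"
    for line in lines:
        kept = []
        i = 0
        while i < len(line):
            if in_comment:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)        # the rest of this line is still comment
                else:
                    in_comment = False
                    i = end + 2          # resume just after "*/"
            else:
                start = line.find("/*", i)
                if start == -1:
                    kept.append(line[i:])
                    i = len(line)
                else:
                    kept.append(line[i:start])
                    in_comment = True
                    i = start + 2        # skip "/*" so "/*/" is not seen as closed
        result.append("".join(kept))
    return result


cleaned = strip_comments_by_line(["a /* x", "y */ b", "c /* z */ d"])
print(cleaned)
```

The in_comment flag is the only state that survives from one line to the next, which is what lets this work without ever holding the whole file in memory.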

2 Answers

If you're running the match in C#, have you tried RegexOptions?

Match m = Regex.Match(word, pattern, RegexOptions.Multiline);

"Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string."
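To illustrate what that definition implies for the patterns in the question, here is a small sketch in Python (an assumption; the question does not name an engine). Python's re.MULTILINE and re.DOTALL play the roles of .NET's RegexOptions.Multiline and RegexOptions.Singleline here. The sample CSS string is made up for the example.

```python
import re

css = "a { color: red; } /* one\n   two */ b { color: blue; }"

# Inside a character class '.' is a literal dot, so [.\n] only
# matches a dot or a newline, never an arbitrary character.
literal_dot = re.compile(r"/\*[.\n]*\*/")

# MULTILINE changes only ^ and $; '.' still refuses to cross a newline.
multiline = re.compile(r"/\*.*\*/", re.MULTILINE)

# DOTALL lets '.' match newlines; the lazy '*?' stops at the first "*/".
dotall = re.compile(r"/\*.*?\*/", re.DOTALL)

literal_dot_match = literal_dot.search(css)
multiline_match = multiline.search(css)
stripped = dotall.sub("", css)

print(literal_dot_match, multiline_match)
print(stripped)
```

In this sketch only the third pattern finds the two-line comment, which suggests that the multiline option on its own is not what makes a comment pattern span lines.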

Also see "Strip out C Style Multi-line Comments".

EDIT:

OK, it looks like the issue is with the regex itself. Here is a working example using the regex pattern from http://ostermiller.org/findcomment.html. The author does a good job of deriving the regex and of demonstrating the pitfalls and deficiencies of various approaches. Note: RegexOptions.Multiline and RegexOptions.Singleline do not appear to affect the result, which is expected because the pattern contains no ., ^, or $.

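using System.Text.RegularExpressions;

// Sample text with one multi-line comment and one single-line comment.
string input = @"this is some stuff right here
    /* blah blah blah 
    blah blah blah 
    blah blah blah */ and this is more stuff /* blah */
    right here.";

// Matches /* ... */ using character classes only, so it spans lines in any mode.
string pattern = @"(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)";
string output = Regex.Replace(input, pattern, string.Empty, RegexOptions.Singleline);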
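As a cross-check of the note about the options, here is the same pattern ported to Python's re module (an assumption; the answer itself uses C#). The sketch runs the replacement with no flags, with re.MULTILINE, and with re.DOTALL, then compares the results. The input is a re-typed approximation of the sample above, not a byte-for-byte copy.

```python
import re

text = (
    "this is some stuff right here\n"
    "    /* blah blah blah \n"
    "    blah blah blah \n"
    "    blah blah blah */ and this is more stuff /* blah */\n"
    "    right here."
)

# Same pattern as in the C# snippet; it contains no '.', '^' or '$'.
pattern = r"(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)"

# Run the replacement under three different flag settings.
outputs = {
    name: re.sub(pattern, "", text, flags=flag)
    for name, flag in [("none", 0), ("multiline", re.MULTILINE), ("dotall", re.DOTALL)]
}

all_equal = len(set(outputs.values())) == 1
print(all_equal)
print(outputs["none"])
```

Because [^*] already matches newlines, the flags have nothing to change in this pattern. One caveat: [^*] and [\r\n] overlap, so on a long unterminated comment a backtracking engine may do a lot of redundant work.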

1 Comment

Thanks for the input. Yes, I'm using the multiline option, which has certainly helped for one or two other items.
A regular expression which matches C-style comments (which begin with /*, end with */ and do not nest) is:

[/][*]([^*]|[*]*[^*/])*[*]+[/]

(I have a little write-up about the derivation of this at www.nongnu.org/txr/txr-manpage.html. Look for "Appendix A" in the table of contents, where there is a link to "Example: Matching C Language Comments".)

C-style comments can include the sequence /* in the interior, such that /*/**/ is a valid comment. The closest */ terminates the comment so that /* */aaa/* */ is two comments with aaa in between, not one comment. This "non-greedy" behavior complicates the matching in a regex language which has no non-greedy operator.
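The two examples above can be checked with a short Python sketch (Python's re module is an assumption; it does have a non-greedy operator, which makes a comparison possible). The helper name find_comments and the third sample string are made up for illustration. The sketch compares the hand-built pattern from this answer with a lazy .*? pattern under re.DOTALL.

```python
import re

# Pattern from this answer: needs no non-greedy operator and no flags.
portable = re.compile(r"[/][*]([^*]|[*]*[^*/])*[*]+[/]")

# Equivalent intent in an engine that has a lazy quantifier.
lazy = re.compile(r"/\*.*?\*/", re.DOTALL)


def find_comments(regex, text):
    """Return every comment the given compiled regex finds in text."""
    return [m.group(0) for m in regex.finditer(text)]


samples = ["/* */aaa/* */", "/*/**/", "x /* a\n * b\n */ y"]
portable_results = [find_comments(portable, s) for s in samples]
lazy_results = [find_comments(lazy, s) for s in samples]

for sample, found in zip(samples, portable_results):
    print(repr(sample), "->", found)
```

On these samples both patterns agree: the first string yields two separate comments, and /*/**/ is matched as a single comment.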
