
Let's say I have a HUGE file containing a lot of code. Every function in it is documented in XML format, and each documentation block is enclosed in 'documentation comments' (/** and **/). I want a regular expression that removes all code that is not between documentation comments (the result may also keep the documentation comment markers themselves; I can remove them afterwards if needed).

Example of part of the script:

/**--------------------------------------------------------------------------**\
<summary>FunctionName</summary>
<returns>
    Returns 1 on success.
    Returns 0 on failure.
</returns>
<remarks>
    This function is a function.
</remarks>
\**--------------------------------------------------------------------------**/

int FunctionName()
{
    int X = 1;
    if(X == 1)
        return 1;
    return 0;
}

Expected output:

<summary>FunctionName</summary>
<returns>
    Returns 1 on success.
    Returns 0 on failure.
</returns>
<remarks>
    This function is a function.
</remarks>
Comment: So you want to create a document consisting of the documentation comments alone? Don't think of it as removing or excluding the parts you don't want; just match the parts you do want and write them to a new file. (Commented Nov 7, 2015 at 20:36)
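Along the lines of that comment, here is a minimal Python sketch that matches only the documentation blocks and writes their bodies to a new file. Python is my choice, since the question names no language or tool. It assumes the marker lines look like the example (opening /**---**\ and closing \**---**/, each on a line of its own, any number of hyphens) and a UTF-8 file; `DOC_BLOCK`, `extract_docs` and `extract_file` are hypothetical names.

```python
import re

# Assumed layout (taken from the example in the question): the opening line is
# /**---**\ and the closing line is \**---**/, each on a line of its own, with
# any number of hyphens and "\n" line endings.
DOC_BLOCK = re.compile(
    r"^/\*\*-+\*\*\\\n"    # opening marker line, including its newline
    r"(.*?)"               # group 1: the XML documentation to keep (lazy)
    r"\n\\\*\*-+\*\*/$",   # newline + closing marker line
    re.DOTALL | re.MULTILINE,
)


def extract_docs(source: str) -> str:
    """Return only the documentation bodies, each followed by a newline."""
    return "".join(body + "\n" for body in DOC_BLOCK.findall(source))


def extract_file(in_path: str, out_path: str) -> None:
    """Hypothetical helper: write the documentation found in in_path to out_path."""
    # Text mode translates "\r\n" to "\n", matching the assumption above.
    # UTF-8 is an assumed encoding; the whole file is read into memory at once.
    with open(in_path, encoding="utf-8") as src:
        text = src.read()
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(extract_docs(text))
```

For example, `extract_file("script.txt", "docs.txt")` would write just the `<summary>`, `<returns>` and `<remarks>` lines shown in the expected output. Because nothing outside a match is ever touched, there is no "everything else" alternative to get wrong.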

Answer:


You can use this pattern:

/^\/(\*\*-+\*\*)\\$(.*?)^\\\1\/$|./gsm

and replace with $2.

Working example: https://regex101.com/r/fA8bP0/1
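The same pattern and replacement can be tried outside regex101. Here is a small Python sketch (Python is an assumption; the answer itself is tool-agnostic): the PCRE flags s and m become `re.DOTALL` and `re.MULTILINE`, and `re.sub` already replaces every match, so no g flag is needed. `keep_doc_comments` is a hypothetical name.

```python
import re

# The answer's pattern (flags g, s, m) transcribed for Python's re module.
# "/" needs no escaping here because Python patterns are not /-delimited.
PATTERN = re.compile(r"^/(\*\*-+\*\*)\\$(.*?)^\\\1/$|.", re.DOTALL | re.MULTILINE)


def keep_doc_comments(source: str) -> str:
    # Every match is replaced with group 2. When the "." alternative matched,
    # group 2 did not participate, and Python 3.5+ substitutes "" for it.
    return PATTERN.sub(r"\2", source)
```

The result keeps the newline after the opening marker line and the one before the closing line, because both lie inside group 2. On Python versions before 3.5, referring to an unmatched group in the replacement raises an error instead of producing an empty string.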

The trick is the same as in the question "Regex Pattern to Match, Excluding when… / Except between": use alternation to match what we need, and skip over everything we don't want.

Some notes about the pattern:

  • ^ and $ are not strictly needed; they only make sense if each comment marker occupies a whole line. If it doesn't, remove them along with the /m (multiline) flag.
  • \/(\*\*-+\*\*)\\ matches the opening line of a comment block, /**-------**\. The closing line, \**-------**/, is matched by \\\1\/.
  • We assume there is the same number of hyphens at the end of the block as at the beginning, and capture the opening run in group 1 so that \1 can match it again. If this is not the case, use \*\*-+\*\* again instead of \1. If the number of hyphens is fixed, you can match it exactly, for example with -{74} instead of -+.
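  • The interesting content is captured in group 2 (group 1 holds the hyphen run), which is why the replacement is $2.
  • Everything else is matched one character at a time by the . alternative (the /s flag lets . match newlines too) and replaced with nothing, because group 2 is empty for those matches.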
  • Caveat: this pattern can fail in the usual ways of regex-based parsing: string literals that contain "/**-", commented-out code that looks like documentation, escaped characters, etc.
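The first and third notes describe two adjustments: dropping ^, $ and /m when the markers are not on lines of their own, and replacing \1 when the hyphen counts differ. The following Python sketch builds each variant; `build_pattern` and `keep_docs` are hypothetical helpers, not part of the answer. Group 1 stays in every variant, so the replacement is always `\2`.

```python
import re


def build_pattern(whole_lines: bool = True, same_hyphens: bool = True) -> "re.Pattern[str]":
    """Hypothetical helper: assemble one of the pattern variants from the notes."""
    # ^ and $ (with re.MULTILINE) only make sense if markers sit on their own lines.
    bol, eol = ("^", "$") if whole_lines else ("", "")
    # Group 1 always captures the "**---**" run of the opening line, so the
    # documentation stays in group 2 whichever variant is built.
    closing_run = r"\1" if same_hyphens else r"\*\*-+\*\*"
    regex = (
        bol + r"/(\*\*-+\*\*)\\" + eol              # opening line /**---**\
        + r"(.*?)"                                  # group 2: the documentation
        + bol + r"\\" + closing_run + "/" + eol     # closing line \**---**/
        + "|."                                      # anything else, one char at a time
    )
    flags = re.DOTALL | (re.MULTILINE if whole_lines else 0)
    return re.compile(regex, flags)


def keep_docs(source: str, whole_lines: bool = True, same_hyphens: bool = True) -> str:
    # "." matches leave group 2 unset; Python 3.5+ substitutes "" for it.
    return build_pattern(whole_lines, same_hyphens).sub(r"\2", source)
```

With the default arguments this is the answer's pattern. In the strict \1 variant, a block whose closing line has a different number of hyphens is not matched as intended; if no later closing line has the same count, its text is deleted along with the code, so `same_hyphens=False` is the safer choice when the counts are not guaranteed.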