I want to uncomment specific XML elements in XML files with PowerShell, i.e. remove the comment markers around them based on the XML tag inside the comment.
Constraints:

  • Multi line comments should be supported
  • Keep xml formatting (e.g. do not write everything into a single line or remove indents)
  • Keep file encoding

My function UncommentXmlNode should remove the <!-- ... --> markers and keep the inner XML (the InnerXml capture group). My function UncommentMyTwoNodes should uncomment two different XML tags. There are two tests:

  1. it "uncomments myFirstOutcommentedXml and mySecondOutcommentedXml" passes
  2. it "uncomments both if both are in same comment" fails unless you append (`n)?.* to the first pattern, in which case test 1 breaks.

The tests are fairly easy to understand if you compare [xml]$expected with the two respective [xml]$inputXml values. The code below is a fully functional Pester test suite that reproduces my issue. You might have to create C:\temp or install Pester v5.

Import-Module Pester

Describe "Remove comments"{
    BeforeAll {
      function UncommentXmlNode {
        param (
            [String] $filePath,
            [String] $innerXmlToUncomment
        )
        $content = Get-Content $filePath -Raw
        $content -replace "<!--(?<InnerXml>$innerXmlToUncomment)-->", '${InnerXml}' | Set-Content -Path $filePath -Encoding utf8
    }

    function UncommentMyTwoNodes {
        param (
          [xml]$inputXml,
          [string]$inputXmlPath
        )    
        UncommentXmlNode -filePath $inputXmlPath -innerXmlToUncomment "<myFirstOutcommentedXml.*" #Add this to make second test work (`n)?.*
        UncommentXmlNode -filePath $inputXmlPath -innerXmlToUncomment "<mySecondOutcommentedXml.*"
    }

[xml]$expected = @"
<myXml>
  <!-- comment I want to keep -->
  <myFirstOutcommentedXml attributeA="xy" attributeB="true" />
  <mySecondOutcommentedXml attributeA="xy" attributeB="true" />
  <myOtherXmlTag attributeC="value" />
  <!-- comment I want to keep -->
</myXml>
"@
  }
    it "uncomments myFirstOutcommentedXml and mySecondOutcommentedXml"{
          [xml]$inputXml = @"
<myXml>
  <!-- comment I want to keep -->
  <!--<myFirstOutcommentedXml attributeA="xy" attributeB="true" />-->
  <!--<mySecondOutcommentedXml attributeA="xy" attributeB="true" />-->
  <myOtherXmlTag attributeC="value" />
  <!-- comment I want to keep -->
</myXml>
"@

      $tempPath = "C:\temp\test.xml"
      $inputXml.Save($tempPath)
      UncommentMyTwoNodes -inputXml $inputXml -inputXmlPath $tempPath
      [xml]$result = Get-Content $tempPath
      $result.OuterXml | Should -be $expected.OuterXml
    }
  
    it "uncomments both if both are in same comment"{
        [xml]$inputXml = @"
<myXml>
  <!-- comment I want to keep -->
  <!--<myFirstOutcommentedXml attributeA="xy" attributeB="true" />
  <mySecondOutcommentedXml attributeA="xy" attributeB="true" />-->
  <myOtherXmlTag attributeC="value" />
  <!-- comment I want to keep -->
</myXml>
"@
      $tempPath = "C:\temp\test.xml"
      $inputXml.Save($tempPath)
      UncommentMyTwoNodes -inputXml $inputXml -inputXmlPath $tempPath
      [xml]$result = Get-Content $tempPath
      $result.OuterXml | Should -be $expected.OuterXml
    }
  }
  • I did post a fully minimal reproducible example (requires Pester 5 and the C:\temp folder). Commented Feb 1, 2023 at 8:21

1 Answer

I made some changes to your code to make it easier to test:

  • working with plain strings instead of converting to [xml] and comparing .OuterXml
  • keeping everything in memory instead of reading from / writing to disk
  • removing the Pester test code for the sake of clarity

So, here's some test data to work with:

$expected = @"
<myXml>
  <!-- comment I want to keep -->
  <myFirstOutcommentedXml attributeA="xy" attributeB="true" />
  <mySecondOutcommentedXml attributeA="xy" attributeB="true" />
  <myOtherXmlTag attributeC="value" />
  <!-- comment I want to keep -->
</myXml>
"@

# two tags inside separate xml comments
$inputXml1 = @"
<myXml>
  <!-- comment I want to keep -->
  <!--<myFirstOutcommentedXml attributeA="xy" attributeB="true" />-->
  <!--<mySecondOutcommentedXml attributeA="xy" attributeB="true" />-->
  <myOtherXmlTag attributeC="value" />
  <!-- comment I want to keep -->
</myXml>
"@

# two tags inside a single xml comment
$inputXml2 = @"
<myXml>
  <!-- comment I want to keep -->
  <!--<myFirstOutcommentedXml attributeA="xy" attributeB="true" />
  <mySecondOutcommentedXml attributeA="xy" attributeB="true" />-->
  <myOtherXmlTag attributeC="value" />
  <!-- comment I want to keep -->
</myXml>
"@

Here are the updated functions:

function UncommentXmlNode
{
    param
    (
        [string] $xml,
        [string] $uncomment
    )
    return $xml -replace "(?s)<!--(?<InnerXml><$uncomment.*?)-->", '${InnerXml}'
    #                     ^^^^                           ^^^
    #                     single-line (eats `n)          lazy / non-greedy
}

function UncommentMyTwoNodes
{
    param (
      [string] $xml
    )    
    $xml = UncommentXmlNode -xml $xml -uncomment "myFirstOutcommentedXml"
    $xml = UncommentXmlNode -xml $xml -uncomment "mySecondOutcommentedXml"
    return $xml
}

And here's some example usage:

(UncommentMyTwoNodes -xml $inputXml1) -eq $expected
# True

(UncommentMyTwoNodes -xml $inputXml2) -eq $expected
# True
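
If the result has to go back into a file without changing its encoding (one of the constraints in the question), a minimal sketch could look like the following. The helper name Update-TextFilePreservingEncoding and its parameters are hypothetical. It assumes PowerShell 5 or later and relies on the byte order mark to detect the encoding: a file without a BOM is assumed to be UTF-8 and is written back without one.

```powershell
# Hypothetical helper: read a text file, transform its content, and write it
# back with the encoding that was detected from the byte order mark.
function Update-TextFilePreservingEncoding
{
    param
    (
        [string]      $Path,
        [scriptblock] $Transform   # receives the file text, returns the new text
    )

    # .NET resolves relative paths against the process working directory,
    # so resolve the path through PowerShell first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Assumption: files without a BOM are UTF-8 (and stay BOM-less).
    $fallback = [System.Text.UTF8Encoding]::new($false)

    $reader = [System.IO.StreamReader]::new($fullPath, $fallback, $true)
    try
    {
        $text     = $reader.ReadToEnd()
        $encoding = $reader.CurrentEncoding   # valid after the first read
    }
    finally
    {
        $reader.Dispose()
    }

    [string] $newText = & $Transform $text
    [System.IO.File]::WriteAllText($fullPath, $newText, $encoding)
}

# Possible usage with the functions from this answer:
# Update-TextFilePreservingEncoding -Path 'C:\temp\test.xml' -Transform {
#     param($text) UncommentMyTwoNodes -xml $text
# }
```

Reading with -Raw semantics (the whole file as one string) is what keeps line breaks and indentation untouched; only the matched comment markers change.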

The differences are:

  • enabling the single-line option in the regex - (?s) - "so that it matches every character, instead of matching every character except for the newline character \n"

  • turning the greedy .* into a lazy .*? by appending ?. This is needed because, with (?s) enabled, a greedy .* runs across newlines and the trailing --> matches the last --> in the input string. Making it lazy stops the match at the first --> after the opening <!--.
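
To see the effect of the two changes in isolation, here is a small sketch on a made-up two-line sample (the $sample string and variable names are only for illustration). It counts how many matches each pattern variant produces:

```powershell
# Hypothetical sample: two separate comments on two lines.
$sample = "<!--<a />-->`n<!--<b />-->"

# Without (?s), '.' does not match the newline, so the greedy .* stays
# on one line and each comment is matched on its own.
$perLine = [regex]::Matches($sample, '<!--(.*)-->').Count       # 2

# With (?s) and a greedy .*, a single match runs from the first <!--
# to the last -->, swallowing both comments.
$greedy = [regex]::Matches($sample, '(?s)<!--(.*)-->').Count    # 1

# With (?s) and a lazy .*?, each match ends at the nearest -->.
$lazy = [regex]::Matches($sample, '(?s)<!--(.*?)-->').Count     # 2
```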

This works for both your test cases now, but you might find other edge cases that still fail (for example, if $uncomment contains regex metacharacters such as a dot).
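
One way to guard against the metacharacter case is to escape the tag name before building the pattern. The sketch below uses [regex]::Escape for that; the function name UncommentXmlNodeEscaped is hypothetical, and everything else mirrors the function above.

```powershell
# Hypothetical variant of UncommentXmlNode that treats $uncomment literally.
function UncommentXmlNodeEscaped
{
    param
    (
        [string] $xml,
        [string] $uncomment
    )
    # Escape regex metacharacters, e.g. 'my.Tag' becomes 'my\.Tag'.
    $escaped = [regex]::Escape($uncomment)
    return $xml -replace "(?s)<!--(?<InnerXml><${escaped}.*?)-->", '${InnerXml}'
}

# 'my.Tag' is a valid XML name; unescaped, the dot would also match 'myXTag'.
$sampleXml = '<r><!--<myXTag />--><!--<my.Tag />--></r>'
UncommentXmlNodeEscaped -xml $sampleXml -uncomment 'my.Tag'
# <r><!--<myXTag />--><my.Tag /></r>
```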


Epilogue

Treating XML as plain text isn't always the best plan. The function above fails on simple variations of the input, for example:

  • Whitespace between the comment opener and the element, e.g.:
<!--   <myFirstOutcommentedXml attributeA="xy" attributeB="true" />-->
    ^^^

A more robust approach would be to parse the xml and then process all the comment nodes to check their contents:

$comments = ([xml] "...").SelectNodes("//comment()")
foreach( $comment in $comments )
{
    ...
}
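
As a rough sketch of what the loop body could do (one possible approach, not the only one): try to parse each comment's text as an XML fragment, and if it contains only elements from a list of wanted names, replace the comment node with the parsed nodes. The function name UncommentXmlNodesViaDom and its parameters are hypothetical. PreserveWhitespace is set so that indentation and line breaks survive the round trip.

```powershell
# Hypothetical DOM-based variant: uncomment only comments whose content
# parses as XML and consists solely of the wanted elements.
function UncommentXmlNodesViaDom
{
    param
    (
        [string]   $xml,
        [string[]] $elementNames
    )
    $doc = [System.Xml.XmlDocument]::new()
    $doc.PreserveWhitespace = $true   # keep indentation and line breaks
    $doc.LoadXml($xml)

    # @(...) takes a snapshot so the document can be modified while looping.
    foreach ($comment in @($doc.SelectNodes('//comment()')))
    {
        $fragment = $doc.CreateDocumentFragment()
        try   { $fragment.InnerXml = $comment.InnerText }
        catch { continue }            # comment text is not well-formed XML

        $elements = @($fragment.ChildNodes | Where-Object { $_ -is [System.Xml.XmlElement] })
        $unwanted = @($elements | Where-Object { $elementNames -notcontains $_.LocalName })

        # Plain-text comments produce no elements and are left alone.
        if ($elements.Count -gt 0 -and $unwanted.Count -eq 0)
        {
            [void] $comment.ParentNode.ReplaceChild($fragment, $comment)
        }
    }
    return $doc.OuterXml
}
```

The comment content still arrives as a string, but the parser rather than a regex decides whether it is XML, so leading whitespace inside the comment or unusual tag names no longer matter. Note that OuterXml re-serializes the document, so details such as attribute quoting may come out normalized.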

2 Comments

This is the best answer I ever received. Honestly, thank you very much! Can I buy you a coffee somehow? With the suggestion to use SelectNodes, I still have to treat each $comment as a string, right?
@tomwaitforitmy - it’s a kind offer, but not necessary - I answer questions here to stretch my brain and tackle areas of Powershell that I don’t normally run into in my day job, so that’s reward enough :-).
