I am a novice PowerShell user, so please bear with me. I am trying to parse an HTML table in PowerShell to extract the strings between tags. Here is the HTML:

    <head>
        <title>HTML TABLE</title>
        </head><body>
        <table>
        <colgroup><col/><col/></colgroup>
        <tr><th>TestcaseName</th><th>Status</th></tr>
        <tr><td>abcd </td><td>First </td></tr>
        <tr><td>xyz </td><td>Second </td></tr>
        <tr><td>pqr </td><td>Third </td></tr>
        </table>
        </body>
</html>

Here is the code I have tried:

$arr = @()
$path = "C:\test.html"
$pattern =  '(?i)<tr[^>]*><td[^>]*>(.*)</td><td>'

Get-Content $path | Foreach {if ([Regex]::IsMatch($_, $pattern)) {
           $arr += [Regex]::Match($_, $pattern)
            }
        }
$arr | Foreach {$_.Value}

Expected output is

abcd
xyz
pqr

But it results in

<tr><td>abcd </td><td>
<tr><td>xyz </td><td> 
<tr><td>pqr </td><td>

Can anyone explain why the tags are also included in the output, and how to avoid this? I also want to append text to each array element, e.g. <a href="\\192.116.1.2\cluster_110">abcd, <a href="\\192.116.1.3\cluster_110">xyz, etc. Please show how to do that as well, since it involves special characters.

2 Answers

2

If the file is always going to be valid XML, you could cast it to [xml] and do something like this:

[xml] $xml = Get-Content $path

$xml.SelectNodes("//tr") |
  Where-Object {$_.ChildNodes.Count -gt 0 -and $_.ChildNodes[0].Name -eq 'td'} |
  ForEach-Object {$_.ChildNodes[0].InnerText}

You can append whatever you like to the results inside the ForEach-Object block.
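As a rough sketch of that appending step, the snippet below inlines a well-formed copy of the table and prepends the link text from the question to each first-column value. The `$html`, `$prefix`, and `$links` names are only illustrative, and using the same link for every row is an assumption, since the question does not say how addresses map to rows.

```powershell
# Sample document, inlined so the sketch is self-contained (assumed to be well-formed XML).
$html = @'
<html><body><table>
<tr><th>TestcaseName</th><th>Status</th></tr>
<tr><td>abcd </td><td>First </td></tr>
<tr><td>xyz </td><td>Second </td></tr>
<tr><td>pqr </td><td>Third </td></tr>
</table></body></html>
'@

[xml] $xml = $html

# Single quotes keep the double quotes and backslashes literal, so no escaping is needed.
# Assumption: every row gets the same link text.
$prefix = '<a href="\\192.116.1.2\cluster_110">'

$links = $xml.SelectNodes('//tr') |
  Where-Object { $_.ChildNodes.Count -gt 0 -and $_.ChildNodes[0].Name -eq 'td' } |
  ForEach-Object { $prefix + $_.ChildNodes[0].InnerText.Trim() }   # Trim drops the trailing space

$links
```

Writing the prefix as a single-quoted string is what sidesteps the special-character problem: PowerShell does not expand or escape anything inside single quotes other than a doubled single quote.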

1

Try this:

(?<=\<td\>)(.*?(?=\</td\>))

The tags are returned for the same reason the text inside them is returned: unless you specify otherwise, a regex match contains EVERYTHING the pattern matched. Lookaround assertions let you require surrounding text without including it in the match, hence the ?<= and ?= in the regex above.

http://www.regular-expressions.info/lookaround.html
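
To make the difference concrete, here is a small sketch run against one row of the table. It also shows an alternative the answer does not spell out: keeping the question's original pattern and reading capture group 1 instead of the whole match. The `$line`, `$m`, and `$cell` names are only illustrative.

```powershell
$line = '<tr><td>abcd </td><td>First </td></tr>'

# Pattern from the question: .Value is the whole match, tags included,
# while Groups[1] holds only what the parentheses captured.
$m = [regex]::Match($line, '(?i)<tr[^>]*><td[^>]*>(.*)</td><td>')
$m.Value                     # whole match, including the tags
$m.Groups[1].Value.Trim()    # captured cell text only

# Lookaround version: the tags are asserted but never become part of the match.
$cell = [regex]::Match($line, '(?<=<td>).*?(?=</td>)').Value.Trim()
$cell
```

Either route works for this table; the capture-group form has the advantage of still anchoring on the first cell of a row, whereas the lookaround pattern matches any `<td>` on the line and relies on `Match` returning the first one.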

As for appending, you can do this:

# $Begin and $End are placeholder strings for the text to add before and after each element.
$Arr | Foreach {$Begin + $_ + $End}
$Begin + $Arr[0] + $End

Doing this implicitly converts each element from a Match to a String; you have been warned. I don't think there is a way to do this while keeping it a Match, but I may be wrong about that.
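
The conversion can be checked with a short sketch like the one below. The `$begin` and `$end` values are placeholders (the closing `</a>` in particular is an assumption, not something the question asks for), and `$joined` is just an illustrative name.

```powershell
$m = [regex]::Match('<tr><td>abcd </td><td>First </td></tr>', '(?<=<td>).*?(?=</td>)')

$begin = '<a href="\\192.116.1.2\cluster_110">'   # placeholder prefix taken from the question
$end   = '</a>'                                    # hypothetical suffix

# With a string on the left, + converts the Match to its string form (the matched text).
$joined = $begin + $m + $end

$m.GetType().Name        # Match
$joined.GetType().Name   # String
$joined
```

If the Match objects are still needed later, one option is to leave `$Arr` untouched and store the concatenated strings in a separate variable.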
