I have a situation where I have to gather information from a webpage. I need to extract everything enclosed in the td tags of its HTML tables.

In this particular situation the only thing I have available to do this process is PowerShell.

Is there an easy way to do this only using PowerShell?

2 Answers

I think you have two main options:

  1. Use a regular expression.
  2. Use the DOM.

Here's how you can do both:

Regex:

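# Download the page source as a single string
$data = (New-Object System.Net.WebClient).DownloadString('http://www.amazon.com')
# (?s) lets '.' span newlines; the lazy '.*?' stops at the first closing </td>
[regex]::Matches($data, '(?s)<td[^>]*>(.*?)</td>') | % { $_.Groups[1].Value }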
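
To try the regex approach without hitting a live site, here is a minimal sketch that runs a td pattern against a small hard-coded HTML string. The sample markup and the variable names ($html, $options, $cells) are made up for illustration. It assumes the tables are not nested, since a regex cannot reliably pair up nested tags.

```powershell
# Sample HTML is invented for this example; it is not taken from any real page.
$html = @'
<table>
  <tr><td>Alpha</td><td class="num">42</td></tr>
  <tr><td>
    Multi-line cell
  </td></tr>
</table>
'@

# Singleline lets '.' match newlines, so cells that span several lines are found.
# The lazy '.*?' stops at the first </td> instead of swallowing several cells.
$options = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Singleline'
$cells = [regex]::Matches($html, '<td[^>]*>(.*?)</td>', $options) |
    ForEach-Object { $_.Groups[1].Value.Trim() }

# Emit one cell per line
$cells
```

Swapping $html for the string returned by DownloadString should apply the same extraction to a downloaded page, with the same caveat about nested tables.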

DOM:

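$ie = New-Object -ComObject InternetExplorer.Application
$ie.Navigate('http://www.amazon.com')
# Navigate returns immediately, so wait until the page has finished loading
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$ie.Document.getElementsByTagName('td') | % { $_.innerText }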

1 Comment

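Worked perfectly. Thank you so much for your assistance, Andy Arismendi. I used the regex version.

$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate("<app url>")
# Wait for the page to finish loading before touching the document
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$doc = $ie.Document
$doc.getElementById("<some id>")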

You may read here for more information - http://msdn.microsoft.com/en-us/magazine/cc337896.aspx

Hope this helps.
