I am trying to load an HTML document into a DOM object. What I want is a list of all hyperlinks on that page. For each link, I would like to know what its attributes are and what text it has.
I have worked out a basic script that does all of that, except for the text.
<?php
$html = file_get_contents('test.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$Links = $dom->getElementsByTagName('a');
foreach ($Links as $node) {
    echo 'HREF = '.$node->getAttribute('href').PHP_EOL;
    echo 'Title = '.$node->getAttribute('title').PHP_EOL;
    echo 'Alt = '.$node->getAttribute('alt').PHP_EOL;
    echo 'Class = '.$node->getAttribute('class').PHP_EOL;
    echo 'ID = '.$node->getAttribute('id').PHP_EOL;
    echo 'Style = '.$node->getAttribute('style').PHP_EOL;
    echo 'Link text = '.$node->Data().PHP_EOL;
}
?>
I have no clue how to get the text from the object.
So, for a link like this, I want to get the text between the tags:
<a href=somelink> **THIS TEXT IS WHAT I WANT TO EXTRACT**</a>
The line that currently does not work is:
echo 'Link text = '.$node->Data().PHP_EOL;
I hope the DOM extension has a function or property that does what I am looking for.
Just found the solution!
<?php
$html = file_get_contents('test.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$Links = $dom->getElementsByTagName('a');
foreach ($Links as $node) {
    echo 'HREF = '.$node->getAttribute('href').PHP_EOL;
    echo 'Title = '.$node->getAttribute('title').PHP_EOL;
    echo 'Alt = '.$node->getAttribute('alt').PHP_EOL;
    echo 'Class = '.$node->getAttribute('class').PHP_EOL;
    echo 'ID = '.$node->getAttribute('id').PHP_EOL;
    echo 'Style = '.$node->getAttribute('style').PHP_EOL;
    echo 'Link text = '.$node->nodeValue.PHP_EOL;
}
?>
The solution for this issue is:
**echo 'Link text = '.$node->nodeValue.PHP_EOL;**
Or, as I have read, $node->textContent should also work.
@PoopNoodles... I found the solution on another site. But it might be interesting to know that there is another option as well. I do not know the difference between nodeValue and textContent though.
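On the nodeValue versus textContent question: as far as I can tell from the PHP manual, the two give the same string for element nodes such as <a>, because PHP deliberately deviates from the W3C spec there. Here is a small sketch to check that yourself. The helper name linkTexts and the sample markup are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not from the original post): collects the text of
// every <a> element through both properties so they can be compared.
function linkTexts(string $html): array
{
    $dom = new DOMDocument;

    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $result[] = [
            'nodeValue'   => $node->nodeValue,
            'textContent' => $node->textContent,
        ];
    }
    return $result;
}

// Made-up sample markup with a nested element inside the link.
$sample = '<p><a href="somelink">Some <b>bold</b> text</a></p>';

foreach (linkTexts($sample) as $texts) {
    echo 'nodeValue   = ' . $texts['nodeValue'] . PHP_EOL;
    echo 'textContent = ' . $texts['textContent'] . PHP_EOL;
}
```

Both lines should show the link text with the nested tags stripped out. The properties do differ on other node types (for an attribute node, nodeValue is the attribute's value), so for link text either one should be fine.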