
I am trying to load an HTML document into a DOM object. What I want is a list of all hyperlinks on that page, and for each link I would like to know its attributes and its text.

I have worked out a basic script that does all of that, except for the text.

<?php

$html = file_get_contents('test.html');

$dom   = new DOMDocument;
@$dom->loadHTML($html);
$Links = $dom->getElementsByTagName('a');

foreach ($Links as $node) {
    echo 'HREF = '.$node->getAttribute('href').PHP_EOL;
    echo 'Title = '.$node->getAttribute('title').PHP_EOL;
    echo 'Alt = '.$node->getAttribute('alt').PHP_EOL;
    echo 'Class = '.$node->getAttribute('class').PHP_EOL;
    echo 'ID = '.$node->getAttribute('id').PHP_EOL;
    echo 'Style = '.$node->getAttribute('style').PHP_EOL;
    echo 'Link text = '.$node->Data().PHP_EOL;
}

?>

I have no clue how to get the text from the object.

So, given a link like this, I want to know how to get the text between the tags:

<a href=somelink>**THIS TEXT IS WHAT I WANT TO EXTRACT**</a>

The line that currently does not work is:

echo 'Link text = '.$node->Data().PHP_EOL;

I hope the node has a property or method that does what I am looking for.

Just found the solution!

<?php

$html = file_get_contents('test.html');

$dom   = new DOMDocument;
@$dom->loadHTML($html);
$Links = $dom->getElementsByTagName('a');

foreach ($Links as $node) {
    echo 'HREF = '.$node->getAttribute('href').PHP_EOL;
    echo 'Title = '.$node->getAttribute('title').PHP_EOL;
    echo 'Alt = '.$node->getAttribute('alt').PHP_EOL;
    echo 'Class = '.$node->getAttribute('class').PHP_EOL;
    echo 'ID = '.$node->getAttribute('id').PHP_EOL;
    echo 'Style = '.$node->getAttribute('style').PHP_EOL;
    echo 'Link text = '.$node->nodeValue.PHP_EOL;
}

?>

The fix is to read the node's nodeValue property instead of calling Data():

**echo 'Link text = '.$node->nodeValue.PHP_EOL;**

Or, as I have read, $node->textContent should also work.
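For anyone who wants to try this without a test.html file, here is a minimal sketch under a few assumptions: the inline $html string is made-up sample markup, libxml_use_internal_errors() is swapped in for the @ suppression, and the results are collected into an array (named $links here purely for illustration) rather than echoed.

```php
<?php

// Hypothetical sample markup standing in for test.html.
$html = '<p><a href="/home" title="Home" class="nav" id="first">Go <b>home</b></a>'
      . '<a href="/about">About us</a></p>';

// Collect libxml parse warnings internally instead of hiding them with @.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($dom->getElementsByTagName('a') as $node) {
    $links[] = [
        // getAttribute() returns an empty string when the attribute is absent.
        'href'  => $node->getAttribute('href'),
        'title' => $node->getAttribute('title'),
        'class' => $node->getAttribute('class'),
        'id'    => $node->getAttribute('id'),
        // textContent includes the text of nested elements such as <b>.
        'text'  => trim($node->textContent),
    ];
}

print_r($links);
```

Collecting into an array first makes it easier to filter or sort the links before printing them.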

@PoopNoodles: I found the solution on another site, but it might be interesting to know that there is another option as well. I do not know the difference between nodeValue and textContent, though.
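On the nodeValue versus textContent point, a quick way to check is to compare them directly. This is only a sketch, and the markup is a made-up example. As far as I can tell, reading either property from an element node in PHP gives the same concatenated text, while on an attribute node nodeValue is simply the attribute's value.

```php
<?php

libxml_use_internal_errors(true);

$dom = new DOMDocument;
// Made-up sample link with nested markup.
$dom->loadHTML('<a href="somelink">Some <b>bold</b> text</a>');
libxml_clear_errors();

$link = $dom->getElementsByTagName('a')->item(0);

// On an element node, compare what the two properties return.
var_dump($link->nodeValue === $link->textContent);

// On an attribute node, nodeValue is the attribute's value.
echo $link->getAttributeNode('href')->nodeValue, PHP_EOL;
```

So for reading link text, either property should do. textContent is the name the DOM standard uses for this purpose, which may make it the clearer choice.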

  • possible duplicate of php domdocument read element inner text (use ->textContent) Commented Aug 18, 2015 at 13:51
  • If you found a solution elsewhere on SO please delete the question. Commented Aug 18, 2015 at 13:55
