
I'm trying to get the text data from a child node of an element using PHP and DOM.

Here is the HTML data I'm having trouble parsing. I'm trying to obtain the email address.

<tr>
<th>Engineer:</th>
<td id="contact_person">Jack Smith &lt<a href='mailto:[email protected]'>[email protected]</a>&gt
    <table class='transparent'>
        <tr>
            <td>Work Phone</td>
            <td>(555) 555-5555</td>
        </tr>
    </table>
</td>

Here is my current code for processing that element:

$contact = $dom->getElementById("contact_person")->nodeValue;

This is the result I'm getting:

Jack Smith Work Phone(555) 555-5555
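This happens because, in PHP's DOM extension, `nodeValue` on an element returns the text of every descendant text node (the same as `textContent`). The name, the link text and the nested phone table all end up in one string. The sketch below assumes a compact, hypothetical version of the markup (the hyphen variant from the update, no line breaks, and `jack@example.com` as a made-up placeholder address):

```php
<?php
// Hypothetical markup modelled on the question; jack@example.com is a placeholder.
$html = '<table><tr><th>Engineer:</th>'
      . '<td id="contact_person">Jack Smith - <a href="mailto:jack@example.com">jack@example.com</a>'
      . '<table class="transparent"><tr><td>Work Phone</td><td>(555) 555-5555</td></tr></table>'
      . '</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$td = $dom->getElementById('contact_person');

// For an element node, nodeValue is the concatenated text of all descendants,
// so the name, the address and the phone table are flattened together.
$contact = $td->nodeValue;
echo $contact, PHP_EOL;
```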

UPDATE: Removing the &lt and &gt and putting a single hyphen between the name and the email address returns the following:

Jack Smith - [email protected] Phone(555) 555-5555

This is what I want to get:

[email protected]

I asked the developer to move id="contact_person" onto the anchor that holds the email address. That works when I try it in test, but it is not possible in our system.

I'm sure it's apparent, but I'm not really familiar with DOM, so any guidance is welcome.

FINAL UPDATE: THE FIX:

$dom->getElementById("contact_person")->firstChild->nextSibling->nodeValue;
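A sketch of why this works, assuming the same compact, hypothetical markup (no whitespace between tags, placeholder address): the cell's `firstChild` is the text node holding the name, and its `nextSibling` is the `<a>` element, whose `nodeValue` is the link text. In real markup, line breaks and indentation between tags can add extra text nodes and shift these positions, which is why the sketch checks `nodeName` before trusting the result.

```php
<?php
// Hypothetical markup modelled on the question; jack@example.com is a placeholder.
$html = '<table><tr><th>Engineer:</th>'
      . '<td id="contact_person">Jack Smith - <a href="mailto:jack@example.com">jack@example.com</a>'
      . '<table class="transparent"><tr><td>Work Phone</td><td>(555) 555-5555</td></tr></table>'
      . '</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$td = $dom->getElementById('contact_person');

// List the direct children: a text node (#text), the <a>, then the nested <table>.
for ($i = 0; $i < $td->childNodes->length; $i++) {
    echo $i, ': ', $td->childNodes->item($i)->nodeName, PHP_EOL;
}

// The fix from the question: step from the leading text node to its sibling.
$link = $td->firstChild->nextSibling;

// Guard against a missing sibling or an unexpected node at that position.
$email = ($link !== null && $link->nodeName === 'a') ? $link->nodeValue : null;
echo $email, PHP_EOL;
```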
  • It's very odd that your DOM library is not returning the email address in your first example. Is the result really Jack Smith Work Phone(555) 555-5555 and not Jack Smith &lt[email protected]&gtWork Phone(555) 555-5555? Commented Sep 25, 2015 at 0:42
  • Removing the &lt and &gt and replacing them with a single hyphen did at least make the email address appear: <td id="customer_engineer">Jack Smith - <a href='mailto:[email protected]'>[email protected]</a> produces the result: Jack Smith - [email protected] Phone(555) 555-5555 Commented Sep 25, 2015 at 1:48
  • Sorry, that was a poor attempt to hide my code. Assume contact_person is synonymous with customer_engineer. Commented Sep 25, 2015 at 2:00

3 Answers

This is ultimately what fixed the issue:

$dom->getElementById("contact_person")->firstChild->nextSibling->nodeValue;

Try something like:

$contact = $dom->getElementById("contact_person")->firstChild->nodeValue;

5 Comments

firstChild->nodeValue returned "Jack Smith"; lastChild->nodeValue returned "Work Phone(555)555-5555".
So it seems like you want childNodes[1]
Changed to childNodes[1]: $dom->getElementById("contact_person")->childNodes[1]; Got the following Error: Fatal error: Cannot use object of type DOMNodeList as array
Oops. Try childNodes->item(1). The PHP documentation is here: php.net/manual/en/class.domnode.php#domnode.props.childnodes
Thanks for the link. Changing to childNodes->item(1) still gave an error, but your link led me to experiment: $dom->getElementById("contact_person")->firstChild->nextSibling->nodeValue; this worked!
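Since `childNodes` is a `DOMNodeList` object rather than a PHP array, a sketch of indexing it with `item()` and `length`, plus an index-free alternative that takes the first direct child that is an `<a>` element. The markup is the same compact, hypothetical version (placeholder address); the loop is an illustration, not something proposed in the thread.

```php
<?php
// Hypothetical markup modelled on the question; jack@example.com is a placeholder.
$html = '<table><tr><th>Engineer:</th>'
      . '<td id="contact_person">Jack Smith - <a href="mailto:jack@example.com">jack@example.com</a>'
      . '<table class="transparent"><tr><td>Work Phone</td><td>(555) 555-5555</td></tr></table>'
      . '</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$children = $dom->getElementById('contact_person')->childNodes;

// DOMNodeList is accessed with item() and length; item() returns null when out of range.
$second = $children->item(1);
echo $children->length, ' children; item(1) is <', $second->nodeName, '>', PHP_EOL;

// Index-free alternative: the first direct child that is an <a> element,
// so extra whitespace text nodes in real markup do not shift the result.
$email = null;
foreach ($children as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'a') {
        $email = $child->nodeValue;
        break;
    }
}
echo $email !== null ? $email : 'NOT FOUND', PHP_EOL;
```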

It may be more reliable to use an XPath query rather than using firstChild, nextSibling etc.

$xpath = new DOMXPath($dom);
$node = $xpath->query("//*[@id='contact_person']//a[contains(@href,'mailto:')]")->item(0);
if ($node) {
    $email = $node->nodeValue;
} else {
    $email = "NOT FOUND";
}

This will look for any link containing "mailto", regardless of where it is inside #contact_person. This means that it no longer relies on precise structure, just the container's ID and the fact that it is a mailto link.
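A self-contained run of the XPath approach, as a sketch assuming hypothetical markup written with terminated entities (`&lt;`, `&gt;`) and a placeholder address. Besides the link text, it also reads the address from the `href` attribute with `getAttribute()` and strips the `mailto:` prefix; that second step is an addition to the answer and assumes a plain `mailto:` link with no `?subject=` query.

```php
<?php
// Hypothetical markup modelled on the question; jack@example.com is a placeholder.
$html = '<table><tr><th>Engineer:</th>'
      . '<td id="contact_person">Jack Smith &lt;<a href="mailto:jack@example.com">jack@example.com</a>&gt;'
      . '<table class="transparent"><tr><td>Work Phone</td><td>(555) 555-5555</td></tr></table>'
      . '</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Same query as the answer: any mailto link anywhere inside #contact_person.
$node = $xpath->query("//*[@id='contact_person']//a[contains(@href,'mailto:')]")->item(0);

if ($node instanceof DOMElement) {
    $email = $node->nodeValue;                                           // the link text
    $fromHref = substr($node->getAttribute('href'), strlen('mailto:')); // the href minus "mailto:"
} else {
    $email = 'NOT FOUND';
    $fromHref = 'NOT FOUND';
}
echo $email, ' / ', $fromHref, PHP_EOL;
```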
