I searched online and thought this would work, but for some reason it doesn't. I'm trying to extract a hyperlink's URL from an HTML document, and only the URL inside the td align="center" cell. Here is a sample of the HTML I'm trying to extract from:

<td>
    Aug 17
</td>

<td>
    FT
</td>

<td align="right">
    <a href="site1">Arsenal ruby</a>
</td>

**<td align="center">**
    <a href="site2">1-3</a>
</td>

<td><a href="site3">Aston Villa</a></td>


<td style="text-align:right;">60,003</td>

And here is my PHP code to extract it from the td align="center":

<?php

//$searchURL = "site";
include 'simple_html_dom.php';

$site = 'website';
$html = file_get_html($site);

$tabledata = array();

// Find all TD tags with "align=center"
foreach($html->find('td[align=center]') as $e)
echo $e->href . '<br>';

?>

I know the rest of the code works, because it extracts everything when the selector inside the brackets is just td.

  • Have you tried regular expressions? See preg_match Commented Dec 13, 2013 at 23:01
  • You want td[align=center] a Commented Dec 14, 2013 at 1:47

3 Answers

So you have identified the <td> elements themselves, but you did not go down to the next nesting level to grab the href from the <a> elements. You might do that like this:

foreach($html->find('td[align=center]') as $e)
echo $e->children(0)->href . '<br>';
4 Comments

This helps a lot. Where can I find good documentation that explains this?
Thanks. One more thing: it works great, but I'm receiving the error message "Trying to get property of non-object in C:\wamp\www\tutorials\table_new.php on line 20". This is on the echo $e->children(0)->href . '<br>'; line that you added. Do you know why?
That's because it just reads the href of the first child node in the td. There is no validation that the td has a child node or that this node has an href attribute.
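
One way to avoid that notice is to check each step before dereferencing it. The following is only a sketch: collectCenteredHrefs() is a made-up helper name, and $html is assumed to be a simple_html_dom document such as the one returned by file_get_html() in the question. It relies on find('a', 0) returning null when a cell contains no link.

```php
<?php
// Sketch only: collectCenteredHrefs() is a hypothetical helper, not part of simple_html_dom.
// $html is assumed to be a simple_html_dom document (e.g. from file_get_html()).
function collectCenteredHrefs($html)
{
    $hrefs = array();

    foreach ($html->find('td[align=center]') as $td) {
        // find('a', 0) gives the first <a> inside the cell, or null if there is none
        $link = $td->find('a', 0);
        if ($link === null) {
            continue;
        }

        // Only keep anchors that actually carry a non-empty href attribute
        $href = $link->href;
        if (is_string($href) && $href !== '') {
            $hrefs[] = $href;
        }
    }

    return $hrefs;
}
```

With the setup from the question this could be used as foreach (collectCenteredHrefs($html) as $href) echo $href . '<br>'; so cells without a link are skipped instead of triggering the notice.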

Use the DOM and Xpath:

Select all td elements in the document

//td

Only if the align attribute equals "center"

//td[@align="center"]

Get the a sub elements

//td[@align="center"]//a

Get the href attribute nodes of those a elements

//td[@align="center"]//a/@href

Source example:

$html = <<<'HTML'
<td>
    FT
</td>
<td align="right">
    <a href="site1">Arsenal ruby</a>
</td>
<td align="center">
    <a href="site2">1-3</a>
</td>
<td><a href="site3">Aston Villa</a></td>
<td style="text-align:right;">60,003</td>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$nodes = $xpath->evaluate('//td[@align="center"]//a/@href');
foreach ($nodes as $node) {
  var_dump($node->value);
}
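
For a real page, the same XPath can be wrapped in a small function. This is a sketch rather than a definitive implementation: extractCenteredHrefs() and the $sample markup are made up for illustration. It also assumes the page may contain malformed markup, so libxml's warnings are collected internally instead of being printed.

```php
<?php
// Sketch only: extractCenteredHrefs() is a hypothetical helper name.
function extractCenteredHrefs($html)
{
    // Keep libxml parse warnings (common with real-world HTML) out of the output
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = array();

    // Each result is an attribute node; its value is the URL
    foreach ($xpath->query('//td[@align="center"]//a/@href') as $attr) {
        $hrefs[] = $attr->value;
    }

    return $hrefs;
}

// Illustrative markup modelled on the question's sample
$sample = '<table><tr>'
    . '<td align="right"><a href="site1">Arsenal</a></td>'
    . '<td align="center"><a href="site2">1-3</a></td>'
    . '<td align="center">no link here</td>'
    . '<td><a href="site3">Aston Villa</a></td>'
    . '</tr></table>';

print_r(extractCenteredHrefs($sample));
```

Because the expression selects the href attribute nodes directly, centered cells without a link simply contribute nothing, so no extra null checks are needed.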

You selected the td element. The anchor element is the child of the td element.

// Find all TD tags with "align=center"
foreach($html->find('td[align=center]') as $e)
echo $e->firstChild()->getAttribute('href') . '<br>';
