Parse HTML Table with DOM and XPath

Question

I'm trying to parse an HTML Table with XPath. The URL is: click here.

I used Firebug to inspect the page's DOM and identified the container I need:

<tbody>
<tr class="r1">
<td class="l rbrd">
<img class="spr2 sport sp1" align="absmiddle" src="/s.gif">
</td>
<td class="l rbrd">19/4 18:30</td>
<td class="l rbrd">
<a title="CHELSEA FC - SUNDERLAND" href="/chelsea-fc-vs-sunderland/e/4509648/" target="_blank">CHELSEA FC - SUNDERLAND</a>
</td>
<td class="c w40">
<span class="o">1,21</span>
<span class="p">92,8%</span>
</td>
<td class="c w10 rbrd">
<span class="o">
<span class="p">
</td>
<td class="c w40">
<span class="o">8,00</span>
<span class="p">4,7%</span>
</td>
<td class="c w10 rbrd">
<span class="o">
<span class="p">
</td>
<td class="c w40">
<span class="o">18,00</span>
<span class="p">2,5%</span>
</td>
<td class="c w10 rbrd">
<span class="o">
<span class="p">
</td>
<td class="c emph">
<span class="o">353.660 €</span>
</td>
<td class="c w10 emph rbrd">
<img class="imgdiff" width="10" height="10" src="http://img.oxytropis.com/s.gif">
</td>
<td class="c rbrd">
<span class="o">1,56</span>
<span class="p">67,5%</span>
</td>
<td class="c rbrd">
<span class="o">2,74</span>
<span class="p">32,5%</span>
</td>
<td class="c emph rbrd">
<span class="o">6.243 €</span>
</td>
<td class="c rbrd">
<a onclick="_gaq.push(['_trackEvent','betfair','click','tziroi-out']);" href="http://sports.betfair.com/Index.do?mi=&ex=1&origin=MRL&rfr=655" rel="nofollow" target="_blank">
</td>
</tr>

This is only one row; there are hundreds more. Every row holds the same kind of information (date, match, money, etc.), and I need a condition for each cell so that I can store all of them in an array.

I am following this tutorial: click here

Which condition can I use to differentiate each cell from the others?

I want to have something like this for each row in the table:

[0] => Array
            (
                [date] => 18:30 19/4
                [teams] => CHELSEA FC - SUNDERLAND
                [1] => 1,21
                [1 volumes] => 92,8%
                [X] => 8,00
                [X volumes] => 4,7%
                [2] => 18,00
                [2 volumes] => 2,5%
                [matched] => 353.660 € 
                  ...

            )

This is the PHP; I'm blocked at this point:

<?php

$curl = curl_init('http://www.oxybet.ro/pariu/external/betfair-volumes.htm');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
$html = curl_exec($curl);
curl_close($curl);

if (!$html) {
     die("something's wrong!");
}



$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$scores = array();

$tableRows = $xpath->query('//div//div//div[2]//div/div//table//tr');

foreach ($tableRows as $row) {
    // fetch all 'tds' inside this 'tr'
    $td = $xpath->query('td', $row);
    $match = array();
    // the question stops here: each cell still has to be read into $match
}

It is quite unclear what you are asking. What condition do you need for each row? And in what way do you want to differentiate the cells? Maybe it would help if you gave some sort of expected output you want to get.
– dirkk, Commented Apr 19, 2014 at 11:30
Please don't put code into the comments; please edit your question instead. Any sort of code (like the array) in the comment section is terrible to read.
– dirkk, Commented Apr 19, 2014 at 11:37
Much better now, thanks :) It seems the cells don't have any specific markup (like a certain class or id), so you will most likely have to distinguish them using a positional predicate in XPath. Something like td[1] will give you the first cell, td[2] the second, and so on.
– dirkk, Commented Apr 19, 2014 at 11:43
If you want a more concrete solution, you should also include your PHP code, as I am (and I am sure others are as well) too lazy to code everything for you. I wouldn't mind adding the relevant part, though.
– dirkk, Commented Apr 19, 2014 at 11:44

Jens Erat · Accepted Answer · 2014-04-19 16:06:24Z


Your query is fetching all table rows so far. In the next step, loop over these results (in PHP) and access the rows as needed. You might either want to use direct DOM access or XPath, whatever you prefer.
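As a rough illustration of the direct DOM access option, the following sketch walks the rows with getElementsByTagName and reads cells by their zero-based position. The inline $html string is a shortened, made-up stand-in for the real page, and the $matches array name is only an assumption for this example.

```php
<?php
// Shortened stand-in for the real page (assumption: three cells per row).
$html = '<table><tr><td>x</td><td>19/4 18:30</td>'
      . '<td><a href="#">CHELSEA FC - SUNDERLAND</a></td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$matches = array();
foreach ($dom->getElementsByTagName('tr') as $row) {
    // Keep only <td> element children; childNodes can also hold text nodes.
    $cells = array();
    foreach ($row->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'td') {
            $cells[] = $child;
        }
    }
    if (count($cells) < 3) {
        continue;                   // skip header or spacer rows
    }
    // PHP arrays are 0-indexed, so the third cell is $cells[2].
    $matches[] = array(
        'date'  => trim($cells[1]->textContent),
        'teams' => trim($cells[2]->textContent),
    );
}
```

Note the off-by-one difference between the two approaches: the third cell is $cells[2] here, but td[3] in XPath.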

To use XPath, write an expression that is evaluated relative to the current context node, and pass the current row as that context. Use positional predicates to select the cell you're looking for. For example, to query the team name (in the third table cell; XPath positions are 1-indexed), use something like

$tableRows = $xpath->query('//div//div//div[2]//div/div//table//tr');
foreach ($tableRows as $row) {
    $team = $xpath->query('./td[3]/a', $row)->item(0)->textContent;
}

Querying the class attributes might also be possible, but they seem to be used rather arbitrarily.

Now read the other table cells with similar queries, build the resulting map, and append it to the $scores array.
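Putting these steps together, a minimal sketch could look like the one below. It runs against an inline copy of the row from the question rather than the live page, so the simplified //table//tr query, the cellText() helper, and the column positions are assumptions that would need checking against the real markup.

```php
<?php
// Inline copy of the sample row from the question (the live page may differ).
$html = <<<HTML
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
<body><table><tbody>
<tr class="r1">
<td class="l rbrd"><img src="/s.gif"></td>
<td class="l rbrd">19/4 18:30</td>
<td class="l rbrd"><a href="/chelsea-fc-vs-sunderland/e/4509648/">CHELSEA FC - SUNDERLAND</a></td>
<td class="c w40"><span class="o">1,21</span><span class="p">92,8%</span></td>
<td class="c w10 rbrd"></td>
<td class="c w40"><span class="o">8,00</span><span class="p">4,7%</span></td>
<td class="c w10 rbrd"></td>
<td class="c w40"><span class="o">18,00</span><span class="p">2,5%</span></td>
<td class="c w10 rbrd"></td>
<td class="c emph"><span class="o">353.660 €</span></td>
</tr>
</tbody></table></body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: trimmed text of the first node matching $query
// relative to $context, or null when nothing matches.
function cellText(DOMXPath $xpath, string $query, DOMNode $context): ?string
{
    $nodes = $xpath->query($query, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$scores = array();
foreach ($xpath->query('//table//tr') as $row) {
    $teams = cellText($xpath, './td[3]/a', $row);
    if ($teams === null) {
        continue;                   // no team link: header or spacer row
    }
    $scores[] = array(
        'date'      => cellText($xpath, './td[2]', $row),
        'teams'     => $teams,
        '1'         => cellText($xpath, './td[4]/span[@class="o"]', $row),
        '1 volumes' => cellText($xpath, './td[4]/span[@class="p"]', $row),
        'X'         => cellText($xpath, './td[6]/span[@class="o"]', $row),
        'X volumes' => cellText($xpath, './td[6]/span[@class="p"]', $row),
        '2'         => cellText($xpath, './td[8]/span[@class="o"]', $row),
        '2 volumes' => cellText($xpath, './td[8]/span[@class="p"]', $row),
        'matched'   => cellText($xpath, './td[10]/span[@class="o"]', $row),
    );
}

print_r($scores);
```

Returning null from the helper instead of calling ->item(0)->textContent directly avoids an error on rows that lack the expected cell, such as a header row.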


Vignesh Kumar A Over a year ago

Can you please look into my problem and help me solve it? stackoverflow.com/questions/23189178/…
