
I am trying to parse an HTML table to get the content of the <td> ID HERE </td> cells using XPath and PHP. Executing the line $doc->loadHTMLFile($file); gives me warnings like this:

PHP Warning: DOMDocument::loadHTMLFile(): Unexpected end tag : tr in...

That's why I am using the following block of code:

libxml_use_internal_errors(true);
$doc->loadHTMLFile($file);
libxml_clear_errors();
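As an aside, the block above throws the buffered warnings away. If you want to see what libxml is complaining about, you can read the buffer with libxml_get_errors() before clearing it. The following is a minimal sketch. The HTML string is a made-up fragment with a stray </tr> that stands in for the real page. The exact messages depend on the libxml2 version, so the sketch makes no claims about their wording.

```php
<?php
// Sketch: collect libxml's parser warnings instead of printing or discarding them.
// The HTML below is a made-up fragment with a stray </tr>, not the real page.
$html = '<table><tr><td>1</td></tr></tr></table>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // buffer errors, remember the old setting
$doc->loadHTML($html);

$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError object with level, code, line and message fields
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();                 // empty the error buffer
libxml_use_internal_errors($previous); // restore the previous setting

foreach ($messages as $message) {
    echo $message, "\n";
}
```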

I am trying to parse this (the entire page is here):

<table class="object-table" cellpadding="0" cellspacing="0">
  <tbody>
    <tr>
      <th width="8%">something here</th>
      <th width="89%">something here</th>
      <th width="3%">something here</th>
    </tr>
    <tr class="normal-row">
      <td>ID number here</td>
      <td><a href="/catalog/view/id/4127">something here</a>
      </td>
      <td align="center">
        <img src="/design/img/hasnt_photo_icon.gif">
      </td>
    </tr>
    <tr class="odd-row">
      <td>ID number here</td>
      <td><a href="/catalog/view/id/1865">something here</a>
      </td>
      <td align="center">
        <img src="/design/img/hasnt_photo_icon.gif">
      </td>
    </tr>
    </tbody>
</table>

with the following code:

$file = "http://www.sportsporudy.gov.ua/catalog/#c[1]=1";
$doc = new DOMDocument();

libxml_use_internal_errors(true);
$doc->loadHTMLFile($file);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$query = '//tr[@class="odd-row"]';


$elements = $xpath->query($query);
printf("Size of array: %d\n", sizeof($elements));
printElements($elements);

I also tried different queries, such as //table[@class="object-table"]/tbody/tr, but none of them seem to give me the td tags I need. Maybe that's because of the broken HTML.
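One way to check whether the queries or the page are at fault is to run them against the sample table loaded from a string instead of the live URL. The following is a minimal sketch under that assumption. It selects only rows that contain td cells, which skips the th header row, and then reads the first cell and the link in the second cell relative to each row.

```php
<?php
// Sketch: query the sample table from the question, loaded from a string
// rather than from the live URL.
$html = <<<'HTML'
<table class="object-table" cellpadding="0" cellspacing="0">
  <tbody>
    <tr>
      <th width="8%">something here</th>
      <th width="89%">something here</th>
      <th width="3%">something here</th>
    </tr>
    <tr class="normal-row">
      <td>ID number here</td>
      <td><a href="/catalog/view/id/4127">something here</a></td>
      <td align="center"><img src="/design/img/hasnt_photo_icon.gif"></td>
    </tr>
    <tr class="odd-row">
      <td>ID number here</td>
      <td><a href="/catalog/view/id/1865">something here</a></td>
      <td align="center"><img src="/design/img/hasnt_photo_icon.gif"></td>
    </tr>
  </tbody>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// tr[td] keeps only rows with data cells, so the <th> header row is skipped
$rows = $xpath->query('//table[@class="object-table"]//tr[td]');

$results = [];
foreach ($rows as $row) {
    // Both expressions are evaluated relative to the current row
    $id   = trim($xpath->evaluate('string(td[1])', $row));
    $href = $xpath->evaluate('string(td[2]/a/@href)', $row);
    $results[] = ['id' => $id, 'href' => $href];
}
print_r($results);
```

If this returns both rows for the sample but nothing for the live page, the problem is in what the page actually contains, not in the XPath.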

Thanks for your advice.

  • This expression should give you the first td of each row in the table (the one that contains the ID): //table//td[1]. I just have one question: are you able to get the HTML at all? You might be getting blocked by the robots.txt. Commented Feb 22, 2016 at 12:36
  • @PedroBernardo, yes, that's right. The HTML is loaded, and I could even get the first <tr> block, but not the other tags. Commented Feb 22, 2016 at 12:39
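The [1] in an expression like //table//td[1] binds to the last child step. It therefore selects the first td of every row, not the single first td of the table. To get only the first one in the document, the path has to be parenthesised. The following sketch shows the difference on a small made-up two-row table.

```php
<?php
// Sketch: //td[1] versus (//td)[1] on a small made-up table.
$html = '<table class="object-table"><tr><td>A1</td><td>A2</td></tr>'
      . '<tr><td>B1</td><td>B2</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The predicate applies to the child step: first td of EVERY parent row
$perRow = [];
foreach ($xpath->query('//table//td[1]') as $td) {
    $perRow[] = $td->textContent;
}

// Parentheses apply [1] to the whole node-set: only the first td overall
$firstOnly = [];
foreach ($xpath->query('(//table//td)[1]') as $td) {
    $firstOnly[] = $td->textContent;
}

echo implode(', ', $perRow), "\n";    // A1, B1
echo implode(', ', $firstOnly), "\n"; // A1
```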

1 Answer


Essentially, your code is fine.

The only error I've found is in how you print the length of $elements: $elements is not an array but a DOMNodeList object. To retrieve its length, you have to use its length property:

printf( "Size of array: %d\n", $elements->length );
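The following is a minimal sketch of reading the size of a DOMNodeList and walking through it by index. The question does not show printElements(), so the version here is a hypothetical stand-in, and the rows are made up. On PHP 7.2 and later, DOMNodeList also implements Countable, so count($elements) returns the same value as length. On older versions, count() (and its alias sizeof()) on an object that is not Countable simply returned 1.

```php
<?php
// Sketch: reading the size of a DOMNodeList and iterating over it.
// printElements() is a hypothetical stand-in for the helper in the question.
function printElements(DOMNodeList $elements): void
{
    for ($i = 0; $i < $elements->length; $i++) {
        $element = $elements->item($i); // returns a DOMNode, or null past the end
        printf("%d: %s\n", $i, trim($element->textContent));
    }
}

// Made-up rows standing in for the real table
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML(
    '<table><tr class="normal-row"><td>1</td></tr>'
    . '<tr class="odd-row"><td>2</td></tr>'
    . '<tr class="odd-row"><td>3</td></tr></table>'
);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$elements = $xpath->query('//tr[@class="odd-row"]');

// DOMNodeList is an object, not an array: read its length property
printf("Size of list: %d\n", $elements->length); // Size of list: 2
printElements($elements);                        // prints "0: 2" then "1: 3"
```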

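But the main problem with your page is that the HTML sent by the server contains only one table with one row. The remaining data are filled in by JavaScript in the browser, and DOMDocument only parses the HTML it downloads (it does not run JavaScript), so you can't retrieve those rows directly through DOMXPath.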
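One way to confirm this is to look at what the server actually sends before any script runs. Count the data rows in the raw HTML and list the external scripts that might be filling the table in. The sketch below is hypothetical throughout. inspectServedHtml() is not a PHP function, and the served HTML and /js/catalog.js are made-up stand-ins. It also shows that the #c[1]=1 part of the URL is a fragment. Browsers keep the fragment on the client side and do not include it in the HTTP request, which is consistent with the filtering being done by JavaScript.

```php
<?php
// Sketch: inspect the HTML as served, before any JavaScript runs.
// inspectServedHtml() is a hypothetical helper, not part of PHP.
function inspectServedHtml(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $scripts = [];
    foreach ($xpath->query('//script[@src]') as $script) {
        $scripts[] = $script->getAttribute('src');
    }
    return [
        // rows with data cells (the header row uses <th>)
        'dataRows' => $xpath->query('//table[@class="object-table"]//tr[td]')->length,
        // external scripts are likely candidates for what fills the table
        'scripts'  => $scripts,
    ];
}

// Everything after '#' is a fragment; it is not sent to the server
$url = 'http://www.sportsporudy.gov.ua/catalog/#c[1]=1';
echo parse_url($url, PHP_URL_FRAGMENT), "\n"; // c[1]=1

// Made-up stand-in for a page whose table is filled in client-side
$served = '<table class="object-table"><tr><th>ID</th></tr></table>'
        . '<script src="/js/catalog.js"></script>';
print_r(inspectServedHtml($served));

// Against the live page (needs network access):
// print_r(inspectServedHtml(file_get_contents('http://www.sportsporudy.gov.ua/catalog/')));
```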


5 Comments

Thanks fusion3k, what is a possible way to get all those td tags? Should I use jQuery instead?
Maybe, but you can't do that in the same PHP code; you have to do it in the rendered page. Otherwise, you have to look at the HTML source code and try to find out how the content is loaded, but that can be hard work.
Take a look at this answer, maybe it can help you (I haven't tried it)...
What is your goal? Is this app for your personal use? If it is, how does it work: does it run on your specific request? I can't express myself well, but if you can open the original URL in Chrome and then send a request to your server, maybe I have a solution.
In practice: in Chrome you load the URL sportsporudy.gov.ua/catalog/#c[1]=1, then send a request to your PHP page through a Chrome bookmarklet.
