1

I'm trying to parse a large HTML file using "PHP Simple HTML DOM Parser". The code looks something like this:

<?php
    include('/lib/simplehtmldom/simple_html_dom.php');

    $data_url = "data/data.html";

    $date_html = file_get_html($data_url);
    foreach($date_html->find('li a') as $element){
        $data = $element->href;
        echo $data;
    }
?>

The size of "data.html" is about 3MB. Running the code results in "Fatal error: Call to a member function find() on a non-object in C:\xampp\htdocs\parser\index.php on line 7".

What am I supposed to do?

8
  • file_get_html() isn't returning what you think it is. Use var_dump($date_html); to see what you're getting. Commented Jan 4, 2018 at 15:56
  • 'What am i suppose to do?' - you could try DOMDocument and loadHTML(). Commented Jan 4, 2018 at 16:02
  • You must have an error and $date_url is null (also should it be datA?) Commented Jan 4, 2018 at 16:02
  • file_get_html() is a function in the "PHP Simple HTML DOM" library. Actually I think that the error is with find() function. Commented Jan 4, 2018 at 16:06
  • "i think that the error is with find() function" No. PHP is clearly telling you that you're trying to invoke a method on a variable that does not contain an object. Commented Jan 4, 2018 at 16:08
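
The DOMDocument suggestion from the comments could be sketched roughly as follows. This is only an illustration using PHP's built-in DOM extension: the inline HTML string is a made-up stand-in for the real file, and the XPath expression is an assumed approximation of the 'li a' selector, limited to links that actually carry an href.

```php
<?php
// Stand-in markup; for the real file, pass file_get_contents('data/data.html') instead.
$html = '<ul><li><a href="/one">One</a></li><li><a href="/two">Two</a></li></ul>'
      . '<p><a href="/skip">Not in a list</a></p>';

// Collect libxml parse warnings internally instead of printing them,
// since real-world HTML is rarely perfectly well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
// //li//a[@href]: every <a> with an href that is a descendant of an <li>.
foreach ($xpath->query('//li//a[@href]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

echo implode("\n", $hrefs), "\n";
```

DOMDocument has no MAX_FILE_SIZE-style cap of its own, so a 3MB file is limited only by PHP's memory settings.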

2 Answers 2

5

MAX_FILE_SIZE is defined in simple_html_dom.php as 600000 bytes (about 600KB).

You can edit this line in the simple_html_dom.php file: define('MAX_FILE_SIZE', 600000);

That worked for me.

1 Comment

I changed MAX_FILE_SIZE to define('MAX_FILE_SIZE', 6000000); and it worked. Thank you :)
0

file_get_html fails when attempting to read the file data/data.html

In this case the value returned and stored in $date_html is not an object. Attempting later to call the find method on it raises the error you get.


1.

As you're trying to parse a large file, you need to increase the maximum file size allowed by simplehtmldom, which defaults to 600,000 bytes (not enough for a 3MB file).

You can do this by defining MAX_FILE_SIZE yourself before you include the library:

define('MAX_FILE_SIZE', 4000000); // Max file size: approx. 4MB
include('/lib/simplehtmldom/simple_html_dom.php');
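
The order matters because a PHP constant cannot be redefined: whichever define() runs first wins. The sketch below illustrates this with DEMO_MAX_SIZE, a hypothetical stand-in for MAX_FILE_SIZE, so that it runs without the library. Whether the library guards its own definition with defined() depends on the version you have; an unguarded second define() keeps the earlier value but raises an "already defined" notice or warning.

```php
<?php
// DEMO_MAX_SIZE is a hypothetical stand-in for MAX_FILE_SIZE.
define('DEMO_MAX_SIZE', 4000000); // your value, defined first

// What a library might do later. The defined() guard skips the second
// define() and so avoids the "already defined" notice or warning.
if (!defined('DEMO_MAX_SIZE')) {
    define('DEMO_MAX_SIZE', 600000);
}

echo DEMO_MAX_SIZE, "\n"; // prints 4000000: the first definition is kept
```

For the same reason, a define() placed after the include cannot override the value the library has already set.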

2.

If adjusting MAX_FILE_SIZE doesn't solve the problem, check that data/data.html is the correct relative path to the file to be parsed.

If the file is not found, file_get_html will fail.

In that case, try passing an absolute path (on Linux, a path that begins with /; on Windows, one that begins with a drive letter), e.g.:

/var/data/data.html
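
Both causes above (wrong path, file too large) can be checked before the parser is called at all. The following is a minimal sketch using only core PHP functions; check_parsable() is a hypothetical helper, not part of simplehtmldom, and the limit passed to it is assumed to match the MAX_FILE_SIZE value in effect.

```php
<?php
// Hypothetical helper: returns null if the file looks parsable,
// otherwise a short description of the problem.
function check_parsable(string $path, int $limit): ?string
{
    if (!is_file($path) || !is_readable($path)) {
        return "not found or not readable: $path";
    }
    $size = filesize($path);
    if ($size > $limit) {
        return "file is $size bytes, limit is $limit";
    }
    return null;
}

// __DIR__ is the directory of the current script, so this path does not
// depend on the working directory the script happens to run in.
$path = __DIR__ . '/data/data.html';
$problem = check_parsable($path, 4000000);
echo $problem === null ? "OK to parse\n" : "Cannot parse: $problem\n";
```

Building the path from __DIR__ gives an absolute filesystem path on both Linux and Windows without hard-coding the drive or root.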

5 Comments

I'm trying to parse a local file. Local parsing is possible according to the "PHP Simple HTML DOM Parser Manual".
@AdelAmani Of course. Then data/data.html is not found. Pass an absolute path. I edited the answer.
You mean "localhost/parser/data/data.html"?
@AdelAmani You can safely define it in your code without modifying simplehtmldom. You just need to define it before you include the library. This way, if you update the library (overwriting your edit), your code won't break.
You're right. Actually, I think the override should come after include('/lib/simplehtmldom/simple_html_dom.php');. Am I wrong?
