PHP Simple HTML DOM Parser - File not being read

Question

I've written a script to process HTML files from URLs. However, due to a 30-second script runtime restriction with my cheap host provider, I've had to alter the script to store the HTML as txt files and run it from a local WAMP server.

I am trying to load each file up, extract what I need, then move onto the next file.

With URLs as the source, file_get_html was doing the job perfectly (I could ->find the required elements). With a txt file as the source, file_get_html returns a blank object.

Based on some advice in the post below, I swapped file_get_html for file_get_contents, which returned a single large string containing the contents of the text file.

File not getting read using file_get_html

The advice there was:

"First, make sure that file_get_contents can get data. If it can, file_get_html will be able to load data to simplehtml Dom"

If file_get_contents returns a string, which it does, how would I "load data to simplehtml Dom"?

I then tried to convert the string into an object with str_get_html; however, this didn't work either.

include('simple_html_dom.php');
$html = file_get_html('file.txt');
var_dump($html);

Returns: object(simple_html_dom)[1] but with no other contents or arrays.

include('simple_html_dom.php');
$html = file_get_contents('file.txt');
var_dump($html);

Returns: string <!DOCTYPE html PUBLIC.....

Questions:

Can anyone give me any advice? What's the best way to load a text file containing HTML markup into an object so that I can use the find method on its contents? I want to avoid loading the file into an array of strings and using regex to process the contents.

Are there any considerations I need to make if using a local WAMP server?

Can you post your code and the text file you are trying to read? – Navneet Singh, Commented Nov 29, 2012 at 11:12
I managed to fix it using str_get_html after I'd used file_get_contents to open the file. The text file is literally an HTML source code dump of a webpage, e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="w3.org/1999/xhtml"> <head>....... – Jim, Commented Nov 29, 2012 at 11:21

2 revs · Accepted Answer · 2017-03-20 09:16:22Z

1

(Answered by the OP in an edit to the question. Converted to a community wiki answer. See "Question with no answers, but issue solved in the comments (or extended in chat)".)

The OP wrote:

I managed to solve this myself. I was sure I'd already tried to extract the HTML from a string, doh!

include('simple_html_dom.php');
// Read the saved page into a plain string first...
$html = file_get_contents('file.txt');
// ...then hand that string to the parser.
$html = str_get_html($html);
var_dump($html);

Returns object(simple_html_dom)[1] including all expected arrays etc

Instead of trying to create the HTML object directly from the source file using file_get_html, I extracted the file contents with file_get_contents and then converted the string to an HTML object with str_get_html. This lets me use the Simple HTML DOM methods, such as find, on the resulting object, e.g.

$html->find('a');
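
For the "load each file, extract what I need, then move on to the next file" workflow described in the question, the same two calls can be wrapped in a loop. This is only a rough sketch under some assumptions: load_saved_page is a hypothetical helper name, the pages directory is a made-up location for the stored txt files, and simple_html_dom.php is assumed to sit next to the script as in the snippets above. It also checks each step, since file_get_contents returns false on failure and str_get_html can return false for empty or very large input.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory, as in the question.
include_once 'simple_html_dom.php';

/**
 * Hypothetical helper: read a saved page and parse it with str_get_html.
 * Returns null when the file cannot be read or the parser rejects the input.
 */
function load_saved_page($path)
{
    if (!is_readable($path)) {
        return null;
    }

    $contents = file_get_contents($path);
    if ($contents === false || $contents === '') {
        return null;
    }

    $dom = str_get_html($contents);
    return $dom ? $dom : null;
}

// 'pages' is a hypothetical directory holding the stored .txt files.
$paths = glob(__DIR__ . '/pages/*.txt') ?: array();

foreach ($paths as $path) {
    $dom = load_saved_page($path);
    if ($dom === null) {
        echo "Skipped: $path\n";
        continue;
    }

    // Example extraction: print the href of every link in the page.
    foreach ($dom->find('a') as $link) {
        echo $link->href, "\n";
    }

    // Release the parsed tree before moving on to the next file.
    $dom->clear();
    unset($dom);
}
```

Calling clear() after each file is a precaution for long runs over many pages; whether it matters in practice depends on how many files are processed in one go.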
