I've written a script to process HTML files from URLs. However, because my cheap hosting provider limits script runtime to 30 seconds, I've had to alter the script to store the HTML as txt files and run it from a local WAMP server.

I am trying to load each file up, extract what I need, then move onto the next file.

With URLs as the source, file_get_html was doing the job perfectly (I could ->find the required elements). With a txt file as the source, file_get_html returns a blank object.

Based on some advice in the post quoted below, I swapped file_get_html for file_get_contents, which returned a single large string containing the contents of the text file.

First, make sure that file_get_contents can get data. If it can, file_get_html will be able to load data to simplehtml Dom

If file_get_contents returns a string, which it does, how would I "load data to simplehtml Dom?"

File not getting read using file_get_html

I then tried to convert the string into an object using str_get_html; however, this didn't work either.

include('simple_html_dom.php');
$html = file_get_html('file.txt');
var_dump($html);

Returns: object(simple_html_dom)[1] but with no other contents or arrays.

include('simple_html_dom.php');
$html = file_get_contents('file.txt');
var_dump($html);

Returns: string <!DOCTYPE html PUBLIC.....

Questions:

Can anyone give me any advice? What's the best way to load a text file containing HTML markup into an object so that I can use the find method on its contents? I want to avoid loading the file into an array of strings and using regex to process the contents.

Are there any considerations I need to make if using a local WAMP server?

  • Can you post your code and text file you are trying to read ? Commented Nov 29, 2012 at 11:12
  • I managed to fix it using str_get_html after I'd used file_get_contents to open the file. The text file is literally an HTML source code dump of a webpage, e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="w3.org/1999/xhtml"> <head>....... Commented Nov 29, 2012 at 11:21

1 Answer

(Answered by the OP in a question. Converted to a community wiki answer. See Question with no answers, but issue solved in the comments (or extended in chat) )

The OP wrote:

I managed to solve this myself. I am sure I'd already tried to extract the HTML from a string, doh!

include('simple_html_dom.php');
// Read the saved page into a plain string
$html = file_get_contents('file.txt');
// Parse that string into a simple_html_dom object
$html = str_get_html($html);
var_dump($html);

Returns object(simple_html_dom)[1] including all expected arrays etc

Instead of trying to create the HTML object directly from the source file using file_get_html, I read the file contents with file_get_contents and then converted the string to an HTML object using str_get_html. This allows me to use the simple html dom methods, e.g. find, on elements within the object:

$html->find('a');
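
Since the original goal was to load each saved file, extract what is needed, and move on to the next one, the same two-step approach can be wrapped in a loop. The following is only a minimal sketch under stated assumptions: the helper name extract_links_from_file and the pages/ directory are hypothetical, simple_html_dom.php is assumed to sit next to the script, and collecting href values is just an illustration of "extract what I need".

```php
<?php
// Assumes simple_html_dom.php sits next to this script, as in the snippets above.
include_once __DIR__ . '/simple_html_dom.php';

/**
 * Hypothetical helper (not part of simple_html_dom): returns the href of
 * every <a> element in a saved HTML file, or null if the file cannot be
 * read or parsed.
 */
function extract_links_from_file($path)
{
    // Step 1: read the saved page into a plain string.
    $contents = file_get_contents($path);
    if ($contents === false) {
        return null; // file missing or unreadable
    }

    // Step 2: parse the string; str_get_html may return false
    // (for example on empty input in recent library versions).
    $html = str_get_html($contents);
    if (!$html) {
        return null;
    }

    $links = array();
    foreach ($html->find('a') as $anchor) {
        $links[] = $anchor->href;
    }

    $html->clear(); // release the parser's internal references between files
    return $links;
}

// Hypothetical directory of saved pages; adjust to wherever the .txt files live.
$files = glob(__DIR__ . '/pages/*.txt') ?: array();
foreach ($files as $file) {
    $links = extract_links_from_file($file);
    if ($links === null) {
        echo "Skipped $file\n";
        continue;
    }
    echo $file . ': ' . count($links) . " links\n";
}
```

Checking both return values before calling find avoids a fatal error on a file that fails to load, and calling clear() after each file is a common precaution with this library when many documents are parsed in one run.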