
Simple_HTML_Dom is great for grabbing stuff within specific tags, but I'm not sure how to do much of anything beyond the basics when it comes to grabbing text. This is an example of what the code I am scraping from looks like:

<span>
Some code stuff.
</span>
FirstWord: 88
<span>
More code stuff.
</span>

As you can see, FirstWord and 88 are not enclosed in any sort of tag. This makes them hard to grab. Here's the rub, though: FirstWord will always be the same -- only the number changes.

So, my idea is to simply tell Simple_HTML_Dom to grab the numbers that immediately follow FirstWord. Problem is that I have no clue how to do this.

Any help is greatly appreciated.

  • Can you use regex? If so, getting the number after "FirstWord" would be pretty easy: /FirstWord:\s[0-9]+/ Commented Feb 26, 2013 at 22:42

2 Answers

// $input is the scraped HTML as a string; capture group 1 holds the digits.
preg_match_all('/FirstWord:\s?([0-9]+)/', $input, $matches);
print_r($matches);

1 Comment

This is correct, but there's only one match, so just use preg_match. Also, \s* is better than \s?, and \d is shorter than [0-9].
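A minimal sketch of what that comment suggests, assuming the scraped markup is already available as a plain string. The sample string and the variable names `$input` and `$number` are made up for illustration; with Simple_HTML_Dom you would first have to get the document back out as a string.

```php
<?php
// Hypothetical sample input mirroring the markup in the question.
$input = "<span>\nSome code stuff.\n</span>\nFirstWord: 88\n<span>\nMore code stuff.\n</span>";

// preg_match stops at the first match, which is enough when the label
// occurs once. \s* tolerates zero or more whitespace characters after
// the colon, and (\d+) captures the digits in group 1.
if (preg_match('/FirstWord:\s*(\d+)/', $input, $matches)) {
    $number = (int) $matches[1];
    echo $number, PHP_EOL; // prints 88
} else {
    echo "FirstWord not found", PHP_EOL;
}
```

If the label can appear more than once, preg_match_all from the answer above is the better fit.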

You can use a process of elimination, assuming your HTML looks something like this:

<html>
    <head></head>
    <body>
        <span>Some code stuff.</span>
        FirstWord: 88
        <span>More code stuff.</span>
    </body>
</html>

You could loop through all of the child elements (which in this case are the <span> elements) and set their outertext to an empty string. This will leave you with only 'FirstWord: 88' remaining.

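// $html is assumed to be the parsed document, e.g. from str_get_html().
foreach ($html->find('body', 0)->children() as $child) {
    // Blanking outertext removes the element and its contents from the output.
    $child->outertext = "";
}

echo $html;
// Output as rendered in a browser (the <html> and <body> wrapper tags are still in the string):
// FirstWord: 88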
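The same elimination idea can also be tried without Simple_HTML_Dom. The sketch below swaps in PHP's built-in DOM extension (DOMDocument and DOMXPath) and keeps only the text nodes that sit directly under <body>, instead of blanking the <span> elements. The sample markup and the variable names `$source` and `$loose` are assumptions for illustration.

```php
<?php
// Hypothetical sample document, same shape as the one in this answer.
$source = '<html><head></head><body><span>Some code stuff.</span>
FirstWord: 88
<span>More code stuff.</span></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($source);

$xpath = new DOMXPath($doc);
$loose = '';

// //body/text() selects only text nodes that are direct children of
// <body>, so text inside the <span> elements is skipped.
foreach ($xpath->query('//body/text()') as $textNode) {
    $loose .= $textNode->nodeValue;
}

// Trim the line breaks that surrounded the untagged text.
$loose = trim($loose);
echo $loose, PHP_EOL;
```

From there, the regex from the other answer can pull the number out of `$loose`.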
