Getting text that follows specific text using Simple_HTML_Dom

Question

Simple_HTML_Dom is great for grabbing stuff within specific tags, but I'm not sure how to do much of anything beyond the basics when it comes to grabbing text. This is an example of what the code I am scraping from looks like:

<span>
Some code stuff.
</span>
FirstWord: 88
<span>
More code stuff.
</span>

As you can see, FirstWord and 88 are not enclosed in any sort of tag. This makes them hard to grab. Here's the rub, though: FirstWord will always be the same -- only the number changes.

So, my idea is to simply tell Simple_HTML_Dom to grab the numbers that immediately follow FirstWord. Problem is that I have no clue how to do this.

Any help is greatly appreciated.

Can you use regex? If so, getting "FirstWord" would be pretty easy: /FirstWord:\s[0-9]+/
– icanc, commented Feb 26, 2013 at 22:42

Sammitch · Accepted Answer · 2013-02-26 23:06:00Z

1

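// $input holds the raw HTML as a string; the capture group collects the digits
preg_match_all('/FirstWord:\s?([0-9]+)/', $input, $matches);
print_r($matches); // $matches[1] contains the captured numbers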


1 Comment

pguardiario Over a year ago

This is correct, but there's only one match, so preg_match is enough. Also, \s* is better than \s?, and \d can replace [0-9].
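The suggestions in this comment can be combined into a single preg_match call. The following is a minimal sketch, not the answerer's code; the sample string in $input and the variable names are assumptions made for illustration.

```php
<?php
// Hypothetical sample input, modeled on the markup in the question.
$input = "<span>\nSome code stuff.\n</span>\nFirstWord: 88\n<span>\nMore code stuff.\n</span>";

// \s* tolerates any amount of whitespace after the colon; (\d+) captures the number.
if (preg_match('/FirstWord:\s*(\d+)/', $input, $matches)) {
    // $matches[0] is the full match, $matches[1] holds only the digits.
    $number = (int) $matches[1];
    echo $number, "\n";
} else {
    $number = null;
    echo "FirstWord not found\n";
}
```

If the page is already loaded with Simple_HTML_Dom, casting the parsed object to a string (as echo $html does in the second answer) should give text that the same pattern can run against.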

r-sal · Accepted Answer · 2013-02-28 04:34:29Z

0

You can use a process of elimination, assuming your HTML looks something like this:

<html>
    <head></head>
    <body>
        <span>Some code stuff.</span>
        FirstWord: 88
        <span>More code stuff.</span>
    </body>
</html>

You could just loop through all of the child elements (which in this case will be the <span> elements) and set their outer HTML to an empty string. This will leave you with only 'FirstWord: 88' remaining.

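// $html is the parsed document, e.g. from str_get_html() or file_get_html()
foreach ($html->find('body', 0)->children() as $child) {
    // Blank out each child element; the bare text node is left untouched
    $child->outertext = "";
}

echo $html;
// Output (ignoring the remaining <html>/<head>/<body> wrapper tags and whitespace):
// FirstWord: 88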
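The same process of elimination can be sketched without the Simple_HTML_Dom library, using PHP's built-in DOM extension instead so that the example runs on its own. This is a swapped-in technique, not the answerer's code; $source, $body and $remaining are names chosen for illustration.

```php
<?php
// Hypothetical sample input, modeled on the markup in the answer above.
$source = <<<HTML
<html>
    <head></head>
    <body>
        <span>Some code stuff.</span>
        FirstWord: 88
        <span>More code stuff.</span>
    </body>
</html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);

$body = $doc->getElementsByTagName('body')->item(0);

// Copy the child list first: removing nodes while iterating the live list would skip entries.
foreach (iterator_to_array($body->childNodes) as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $body->removeChild($child); // drop every element, keep bare text nodes
    }
}

// Only the untagged text is left; trim the surrounding whitespace.
$remaining = trim($body->textContent);
echo $remaining, "\n";
```

Once only the untagged text remains, the number can be pulled out of $remaining with the regular expression from the first answer.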
