regex php: find everything in div

Question

I'm trying to find everything inside a div using a regular expression. I'm aware there is probably a smarter way to do this, but I've chosen regex.

Currently my pattern looks like this:

$gallery_pattern = '/<div class="gallery">([\s\S]*)<\/div>/';

And it does the trick - somewhat.

The problem arises when I have two divs one after the other, like this:

<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>

I want to extract the text from both divs, but when testing, instead of getting the text inside each div, I get this:

"text to extract here </div>  
<div class="gallery">text to extract from here as well"

To sum up: the match skips the first closing </div> and continues to the next one. Note that the text inside the div can contain <, / and line breaks.

Does anyone have a simple solution to this problem? I'm still a regex novice.

I was discussing the same thing with a friend a few weeks ago. The problem is that with nested tags such as "<div class="gallery">some text<div>other text</div></div>", it is hard to make the expression not stop at the first </div>.
– Filip Navara, Commented Aug 29, 2009 at 18:41

meder omuraliev · Accepted Answer · 2009-08-29 18:46:18Z

12

You shouldn't be using regex to parse HTML when there's a convenient DOM library:

$str = '
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
';

$doc = new DOMDocument();
$doc->loadHTML($str);
$divs = $doc->getElementsByTagName('div');

// Use the length property: count() on a DOMNodeList needs PHP 7.2+.
if ( $divs->length ) {
    foreach ( $divs as $div ) {
        echo $div->nodeValue . '<br>';
    }
}
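The snippet above returns the text of every div on the page. If only the gallery divs are wanted, and their inner markup should be kept (the question says the content can contain tags), something along these lines might work. This is only a sketch: innerHtml() is a hypothetical helper written for this example, not a built-in DOM method, and the sample markup is made up.

```php
<?php
// Sketch: collect the inner HTML of every <div class="gallery">.
// innerHtml() is a hypothetical helper, not part of the DOM extension.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes a single node, tags included.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Made-up sample input with inner markup and a nested div.
$str = '
<div class="gallery">text to <b>extract</b> here</div>
<div class="other">not wanted</div>
<div class="gallery">outer <div>inner</div> text</div>
';

$doc = new DOMDocument();
$doc->loadHTML($str);

$xpath = new DOMXPath($doc);
// Match "gallery" as a whole word inside the class attribute.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " gallery ")]';

$galleries = [];
foreach ($xpath->query($query) as $div) {
    $galleries[] = innerHtml($div);
}

print_r($galleries);
```

Because the parser tracks where each element closes, the nested div in the last sample stays inside its parent's content instead of cutting it short.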

answered Aug 29, 2009 at 18:46

meder omuraliev


Pascal MARTIN · Answer · 2009-08-29 18:41 · edited by Wiktor Stribiżew 2021-05-11 18:54:05Z

11

What about something like this:

$str = <<<HTML
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
HTML;

$matches = array();
preg_match_all('#<div[^>]*>(.*?)</div>#s', $str, $matches);

var_dump($matches[1]);

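Note the '?' after '.*' in the regex: it makes the quantifier non-greedy, so the match stops at the first </div> instead of the last one. The 's' modifier lets '.' match line breaks as well.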

Which will get you:

array
  0 => string 'text to extract here' (length=20)
  1 => string 'text to extract from here as well' (length=33)
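To see the effect of the '?' in isolation, here is a small sketch that runs the pattern from the question twice on the same two divs, once as posted and once with only the quantifier changed to its lazy form. The variable names $greedy and $lazy are just for this example.

```php
<?php
// Sketch: same input, greedy vs. lazy quantifier.
$str = <<<HTML
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
HTML;

// Greedy: [\s\S]* runs on to the LAST </div> in the input.
preg_match_all('/<div class="gallery">([\s\S]*)<\/div>/', $str, $greedy);

// Lazy: [\s\S]*? stops at the FIRST </div> after each opening tag.
preg_match_all('/<div class="gallery">([\s\S]*?)<\/div>/', $str, $lazy);

echo count($greedy[1]), "\n"; // 1: a single match spanning both divs
echo count($lazy[1]), "\n";   // 2: one match per div
print_r($lazy[1]);
```

The greedy version reproduces the result described in the question; the lazy one yields one capture per div.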

This should work fine, as long as you don't have nested divs. If you do... well, are you really sure you want to use regular expressions to parse HTML, which is not all that regular itself?
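A quick sketch of what goes wrong with nested divs, using the same pattern on a made-up nested input:

```php
<?php
// Sketch: the lazy pattern applied to nested markup (sample input is made up).
$str = '<div class="gallery">outer <div>inner</div> tail</div>';

preg_match_all('#<div[^>]*>(.*?)</div>#s', $str, $matches);

// The capture ends at the inner </div>, so " tail" is lost.
print_r($matches[1]); // one element: 'outer <div>inner'
```

The pattern has no notion of which </div> closes which <div>, so it pairs the outer opening tag with the inner closing tag.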

edited May 11, 2021 at 18:54

Wiktor Stribiżew

answered Aug 29, 2009 at 18:41

Pascal MARTIN

1 Comment

Pascal MARTIN Over a year ago

@Filip: I would recommend using DOM and loadHTML too, actually. I have done so several times in other answers (see stackoverflow.com/questions/1274020/… for instance). HTML is not something that can be properly parsed with regexes... not regular enough, I suppose ^^
