So, I've got some legacy HTML that I'm trying to strip the cruft from with a regex. It looks something like this:

<div class="al-list-head"><span>Another List</span></p>
<h3>Destinations</h3>
</div>

Another variant of the HTML looks like this:

<div class="al-list-head">
<p><span>Another List</span></p>
<h3>Lounge</h3>
</div>

(The CMS sometimes adds redundant <p> tags.)

My regex works for the most part (on the second sample) but not on the first. I've tried a bunch of character classes, but I can't seem to match the gap between the final </h3> and the closing </div> in the first sample.

My regex is:

$html = preg_replace( '/<div class=\"al-list-head\">[\s](<p>?)(<span>Another\ List<\/span>)(<\/p>?)[\s]<h3>([^<\/>]*)<\/h3>[\s]<\/div>/is', '<h3 class="al-head">$4</h3>', $html );

After the </h3> I've tried [\s], ([\s]?), ([\s\b\n\r]*) and even (.*), with no luck.
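One way to find where the pattern actually stops matching is to test it against each sample on its own. The PHP sketch below assumes the two samples are exactly as pasted above, with Unix (\n) line endings. That is an assumption, and the real CMS output may differ. Under that assumption, the first sample already fails near the start of the pattern, not at the end. [\s] requires exactly one whitespace character right after the opening <div>. Also, (<p>?) makes only the > optional, so a literal <p is still required.

```php
<?php
// The two samples from the question (assumption: Unix "\n" line endings).
$sample1 = "<div class=\"al-list-head\"><span>Another List</span></p>\n"
         . "<h3>Destinations</h3>\n"
         . "</div>";
$sample2 = "<div class=\"al-list-head\">\n"
         . "<p><span>Another List</span></p>\n"
         . "<h3>Lounge</h3>\n"
         . "</div>";

// The pattern from the question, unchanged.
$original = '/<div class=\"al-list-head\">[\s](<p>?)(<span>Another\ List<\/span>)(<\/p>?)[\s]<h3>([^<\/>]*)<\/h3>[\s]<\/div>/is';

// `<p>?` makes only the ">" optional, so "<p" itself is still required.
var_dump(preg_match('/^<p>?$/', '<p'));   // int(1)
var_dump(preg_match('/^(<p>?)$/', ''));   // int(0)
// `(<p>)?` makes the whole group optional.
var_dump(preg_match('/^(<p>)?$/', ''));   // int(1)

// Sample 1 has no whitespace after the <div> and no <p>, so it never gets past the start.
var_dump(preg_match($original, $sample1)); // int(0)
var_dump(preg_match($original, $sample2)); // int(1)
```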

Any pointers?

I'm using this handy little tool to iterate and test; hopefully someone else finds it useful too.

  • Try adding the m modifier: your regex ends with /is, try /ism. Commented Jul 17, 2014 at 8:29
  • @dangvy what's your expected output? Commented Jul 17, 2014 at 8:32
  • Your command works for me: regex101.com/r/wN4wB7/2 Commented Jul 17, 2014 at 8:41

2 Answers

Use \s* (zero or more whitespace characters) instead of [\s], which matches exactly one:

$html = preg_replace( '/<div class=\"al-list-head\">\s*(<p>?)(<span>Another\ List<\/span>)(<\/p>?)\s*<h3>([^<\/>]*)<\/h3>\s*<\/div>/is', '<h3 class="al-head">$4</h3>', $html );

You could try the regex below; it should work for both examples:

/<div\s*class=\"al-list-head\">\s*(<p>)?(<span>Another\s*List<\/span>)(<\/p>)?\s*<h3>([^<\/>]*)<\/h3>\s<\/div>/img

Replacement string:

<h3 class="al-head">$4</h3>

DEMO
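Below is a sketch of the same pattern used from PHP. It makes two adaptations, and these are assumptions of this sketch rather than part of the answer:

- The g flag is dropped. PHP's preg_replace() rejects it as an unknown modifier and already replaces every match.
- The final \s is widened to \s*, so that CRLF line endings between </h3> and </div> also match.

The m flag is omitted as well, since the pattern has no ^ or $ anchors for it to affect.

```php
<?php
// Answer 2's pattern adapted for PHP (adaptations are assumptions of this sketch):
// - no `g` flag: preg_replace() replaces every match by default and rejects `g`;
// - the final `\s` is widened to `\s*` so CRLF endings before </div> also match.
$pattern = '/<div\s*class="al-list-head">\s*(<p>)?(<span>Another\s*List<\/span>)(<\/p>)?\s*<h3>([^<\/>]*)<\/h3>\s*<\/div>/i';
$replacement = '<h3 class="al-head">$4</h3>';

// The two samples from the question, assuming "\n" line endings.
$samples = [
    "<div class=\"al-list-head\"><span>Another List</span></p>\n<h3>Destinations</h3>\n</div>",
    "<div class=\"al-list-head\">\n<p><span>Another List</span></p>\n<h3>Lounge</h3>\n</div>",
];

foreach ($samples as $html) {
    echo preg_replace($pattern, $replacement, $html), "\n";
}
// <h3 class="al-head">Destinations</h3>
// <h3 class="al-head">Lounge</h3>
```

Both samples collapse to a single <h3 class="al-head"> element, because each optional <p> group may match once or not at all.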
