20

I have the following content within an HTML page that spans multiple lines:

<div class="c-fc c-bc" id="content">
                <span class="content-heading c-hc">Heading 1 </span><br />
                The Home Page must provide a introduction to the services provided.<br />
                <br />
                <span class="c-sc">Sub Heading</span><br />
                The Home Page must provide a introduction to the services provided.<br />
                <br />
                <span class="c-sc">Sub Heading</span><br /> 
                The Home Page must provide a introduction to the services provided.<br />
            </div>

I need to replace everything between <div class="c-fc c-bc" id="content"> and </div> with custom text.

I use the following code to accomplish this. It works if everything is on one line, but not if the content spans multiple lines:

$body = file_get_contents('../../templates/'.$val['url']);

$body = preg_replace('/<div class=\"c\-fc c\-bc\" id=\"content\">(.*)<\/div>/','<div class="c-fc c-bc" id="content">abc</div>',$body);

Am I missing something?

2
  • 5
    Don't use regex to parse HTML. Commented Jan 20, 2010 at 12:53
  • 4
    For multiline/dotall regex you'll also want to be very careful with that greedy .*, which will cause the very last </div> end-tag on the entire page to be matched. Maybe you want the first </div> end-tag in which case you'd need a non-greedy .*?. If you want the matching </div> end-tag, there is no way regex can work that out. Did we mention, don't use regex to parse HTML? Commented Jan 20, 2010 at 14:22

4 Answers

42

If this weren't HTML, I'd tell you to use the DOTALL modifier to change the meaning of . from 'match everything except new line' to 'match everything':

$body = preg_replace('/<div class="c-fc c-bc" id="content">(.*)<\/div>/s','<div class="c-fc c-bc" id="content">abc</div>',$body);

But this is HTML, so use an HTML parser instead.
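
For illustration, here is a minimal sketch of the parser approach using PHP's bundled DOM extension (DOMDocument and DOMXPath). The sample markup and the "footer" div are made up for the example; in the question, $body would come from file_get_contents(). It assumes the DOM extension is enabled, which is the default in most PHP builds.

```php
<?php
// Hypothetical sample markup; in the question it would come from file_get_contents().
$body = <<<HTML
<html><body>
<div class="c-fc c-bc" id="content">
    <span class="content-heading c-hc">Heading 1</span><br />
    Old text<br />
</div>
<div id="footer">Footer</div>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($body);
libxml_clear_errors();

// Locate the target element by its id attribute.
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="content"]')->item(0);

if ($div !== null) {
    // Remove every existing child node, then add the replacement text.
    while ($div->firstChild) {
        $div->removeChild($div->firstChild);
    }
    $div->appendChild($doc->createTextNode('abc'));
}

$result = $doc->saveHTML();
echo $result;
```

Because the parser knows where the element actually ends, nested or following </div> tags are not a problem here. Note that saveHTML() serializes the whole document and may add a DOCTYPE and normalize the markup.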

1 Comment

Works great! I was trying to replace new lines with trim(preg_replace('/\s\s+/', ' ', $input)). It worked in an online PHP test run but not in my code.

24

It is the "s" flag; it enables . to match newlines.
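
A small sketch of the difference, using a made-up two-line string rather than the markup from the question:

```php
<?php
// Hypothetical input: the text between the tags spans two lines.
$text = "<b>first line\nsecond line</b>";

// Without the s modifier, . does not match "\n", so the pattern cannot reach </b>.
$withoutS = preg_match('/<b>(.*)<\/b>/', $text);

// With the s modifier (PCRE_DOTALL), . also matches "\n".
$withS = preg_match('/<b>(.*)<\/b>/s', $text, $m);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```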

1 Comment

Documentation for PHP: s (PCRE_DOTALL) If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
1

You can also use [\s\S] instead of . combined with the DOTALL flag s. [\s\S] means exactly the same thing, match everything: \s matches all whitespace characters (including newline) and \S matches everything that is not a whitespace character (i.e. everything else). In some cases or implementations of regular expressions, this works better than enabling DOTALL.

Caution: .* with the DOTALL flag and [\s\S]* are both greedy and will keep consuming the string as far as they can. If you want them to stop at a certain position (e.g. the first </div>), put the non-greedy operator ? after your quantifier, e.g. .*?
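
To make the difference concrete, here is a short sketch with a made-up string containing two div blocks (the ids "a" and "b" are hypothetical):

```php
<?php
// Hypothetical input with two <div> blocks spread over several lines.
$html = "<div id=\"a\">\none\n</div>\n<div id=\"b\">\ntwo\n</div>";

// Greedy: .* runs to the LAST </div>, swallowing both blocks.
$greedy = preg_replace('/<div id="a">.*<\/div>/s', '<div id="a">abc</div>', $html);

// Lazy: .*? stops at the FIRST </div> after the opening tag.
$lazy = preg_replace('/<div id="a">.*?<\/div>/s', '<div id="a">abc</div>', $html);

// [\s\S]*? behaves like the lazy version without needing the s modifier.
$classLazy = preg_replace('/<div id="a">[\s\S]*?<\/div>/', '<div id="a">abc</div>', $html);

echo $greedy, "\n---\n", $lazy, "\n";
```

The lazy version still picks the wrong end tag as soon as the target div contains a nested div, which is the limitation the comments on the question point out.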

0

It is possible to use regex to strip out chunks of HTML, but you need to wrap the HTML in custom tags, which browsers ignore. For example:

<?php
$html='
<div>This will be shown</div>
<custom650 rel="nofollow">
  <p class="subformedit">
    <a href="#" class="mylink">Link</a>
    <div class="morestuff">
      ... more html in here ...
    </div>
  </p>
</custom650>
<div>This will also be shown</div>
';

To strip the elements that carry the rel="nofollow" attribute, you can use the following regex:

$newhtml = preg_replace('/<([^\s]+)[^>]*rel="nofollow"[^>]*>.*?<\/\1>/si', '', $html);

From experience, start the custom tags on a new line. Undoubtedly a hack, but might help someone.
