3

I need to look inside a string of HTML and rewrite every <img> tag whose src attribute is a relative address so that it uses an absolute URL. So this:

<img src="puppies.jpg">

needs to become:

<img src="http://sitename.com/path/puppies.jpg">

while ignoring <img> tags whose src attribute is already absolute.

I'm using PHP and assume that I'll need to run this through preg_replace(). Help! And Thanks!

  • possible duplicate of Javascript: REGEX to change all relative Urls to Absolute Commented Apr 30, 2012 at 19:08
  • That's for JavaScript, but the principle is the same. Commented Apr 30, 2012 at 19:09
  • Consider using DomDocument class instead of preg for doing HTML stuff. Commented Apr 30, 2012 at 19:09

2 Answers

8

This is not a job for a regular expression. It's a job for an XML/DOM parser.

I'd give DOMDocument a shot.

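$DOM = new DOMDocument;
$DOM->loadHTML($html);

$base = 'http://sitename.com/path/';
$imgs = $DOM->getElementsByTagName('img');
foreach ($imgs as $img) {
    $src = $img->getAttribute('src');
    // Leave src values that already have a scheme (http:, https:, data:, ...)
    // or are protocol-relative (//host/...); prefix everything else.
    if (parse_url($src, PHP_URL_SCHEME) === null && strpos($src, '//') !== 0) {
        $img->setAttribute('src', $base . $src);
    }
}

$html = $DOM->saveHTML();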

4 Comments

I upvoted, but it also needs a check for src attributes which are already absolute, per the OP.
@Mathletics: Ah yes, didn't notice that, I can add that :-P
@Jack: Good idea, changed :-P
Yay! That does it! Question: the HTML that gets returned automatically gets a <doctype>, <html>, <body>, etc... tags. Is there any way to turn that off? All I want is what I gave it to start... just with the find-and-replace part done. Does that make sense?
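
On the follow-up about the extra doctype, <html>, and <body> wrappers: one way to get back only the original fragment is to serialize the children of <body> one by one, since saveHTML() accepts a node argument (PHP 5.3.6+). The following is a minimal sketch, not a definitive implementation. It assumes PHP 7+ for the type hints; the function name absolutizeImgSrc and its $base parameter are made up for illustration, and the parse_url() scheme check is just one possible test for "already absolute".

```php
<?php
// Hypothetical helper: rewrites relative <img src> values and returns
// only the fragment that was passed in, without doctype/html/body wrappers.
function absolutizeImgSrc(string $html, string $base): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument;

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Skip values with a scheme or a protocol-relative prefix.
        if (parse_url($src, PHP_URL_SCHEME) === null && strpos($src, '//') !== 0) {
            $img->setAttribute('src', $base . $src);
        }
    }

    // The parser wraps the fragment in <html><body>; emit only body's children.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo absolutizeImgSrc('<p><img src="puppies.jpg"></p>', 'http://sitename.com/path/'), "\n";
```

Passing LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD to loadHTML() is another option on PHP 5.4+ with a sufficiently recent libxml, though it may rearrange fragments that have more than one top-level element.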
0

"This is not a job for a regular expression. It's a job for an XML/DOM parser."

Nope, it's not. If you just want to add a prefix to each src attribute, it's best to use simple string functions and not even think about XML, regex, or DOM parsing…

$str = str_replace('<img src="', '<img src="http://prefix', $str);

You can clean up wrong links (ones that were already absolute) afterwards:

$str = str_replace('<img src="http://prefixhttp://', '<img src="http://', $str);

Do not blow up your code with regexp/dom if you can avoid it.

4 Comments

What would happen if my HTML was <img class='animals' src='puppies.jpg' />?
@Rocket sorry, but he said his HTML is <img src="puppies.jpg">
Wrong links can easily be fixed: str_replace('prefixprefix', 'prefix', $str); str_replace('http://prefix/http://', 'http://', $str). Don't blow up your code with regex/dom if you do not have to…
@semu, I'm totally with you on this one. For this instance (and mine) there's no reason to add all kinds of unnecessary overhead when a simple solution exists... albeit not the most graceful, it definitely gets the job done.
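
To make the trade-off discussed in these comments concrete, here is a small sketch that wraps the two str_replace() calls from the answer and runs them on the markup from the question, on an already absolute URL, and on the markup from the first comment. The helper name prefixImgSrc and the other.example host are assumptions for illustration only.

```php
<?php
// Hypothetical helper wrapping the two str_replace() calls from the answer.
function prefixImgSrc(string $html, string $prefix): string
{
    // Step 1: prefix every src that directly follows '<img '.
    $html = str_replace('<img src="', '<img src="' . $prefix, $html);
    // Step 2: undo the prefix where the original src was already an http:// URL.
    return str_replace('<img src="' . $prefix . 'http://', '<img src="http://', $html);
}

$prefix = 'http://sitename.com/path/';

// Exact markup from the question: the src is rewritten.
echo prefixImgSrc('<img src="puppies.jpg">', $prefix), "\n";

// Already absolute: prefixed in step 1, restored in step 2.
echo prefixImgSrc('<img src="http://other.example/kittens.jpg">', $prefix), "\n";

// Markup from the comment (extra attribute, single quotes):
// neither search string matches, so it is left unchanged.
echo prefixImgSrc("<img class='animals' src='puppies.jpg' />", $prefix), "\n";
```

The approach holds up as long as every tag is written exactly as <img src="...">; any other attribute order or quoting style passes through untouched, and the cleanup step as written only recognizes http:// URLs.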
