preg_replace to change url from relative to absolute

Question

My PHP code is:

$string = preg_replace('/(href|src)="([^:"]*)(?:")/i','$1="http://mydomain.com/$2"', $string);

It works with:

 - <a href="aaa/">Link 1</a> => <a href="http://mydomain.com/aaa/">Link 1</a>
 - <a href="http://mydomain.com/bbb/">Link 1</a> => <a href="http://mydomain.com/bbb/">Link 1</a>

But not with:

- <a href='aaa/'>Link 1</a>
- <a href="#top">Link 1</a> (I don't want to change the URL if it starts with #).

Please help me!

Toto · Accepted Answer · 2013-10-15 07:51:22Z

2

How about:

$arr = array('<a href="aaa/">Link 1</a>',
             '<a href="http://mydomain.com/bbb/">Link 1</a>',
             "<a href='aaa/'>Link 1</a>",
             '<a href="#top">Link 1</a>');
foreach( $arr as $lnk) {
    $lnk = preg_replace('~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i','$1="http://mydomain.com/$3"', $lnk);
    echo $lnk,"\n";
}

output:

<a href="http://mydomain.com/aaa/">Link 1</a>
<a href="http://mydomain.com/bbb/">Link 1</a>
<a href="http://mydomain.com/aaa/">Link 1</a>
<a href="#top">Link 1</a>

Explanation:

The regular expression:

(?-imsx:(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    href                     'href'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    src                      'src'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  =                        '='
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    ["\']                    any character of: '"', '\''
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    #                        '#'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    http://                  'http://'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [^\2]*                   any character except: '\2' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \2                       what was matched by capture \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
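The same breakdown can be kept next to the code by using the x (extended) modifier, which lets the pattern span several lines with comments. The sketch below is only an illustration: the variable names are assumptions, and the pattern is copied from the answer unchanged apart from escaping the literal # (under x, a bare # starts a comment).

```php
<?php
// Pattern from the answer, laid out with the x modifier so each part is annotated.
$pattern = <<<'REGEX'
~
(href|src)     # group 1: attribute name
=
(["'])         # group 2: opening quote, single or double
(?!\#)         # skip fragment links such as #top
(?!http://)    # skip URLs that are already absolute
([^\2]*)       # group 3: the URL, copied unchanged from the answer
\2             # the same quote character that opened the value
~ix
REGEX;

$links = [
    '<a href="aaa/">Link 1</a>',
    '<a href="http://mydomain.com/bbb/">Link 1</a>',
    "<a href='aaa/'>Link 1</a>",
    '<a href="#top">Link 1</a>',
];

$results = [];
foreach ($links as $lnk) {
    // Single-quoted values come back double-quoted because the replacement hardcodes ".
    $results[] = preg_replace($pattern, '$1="http://mydomain.com/$3"', $lnk);
}
echo implode("\n", $results), "\n";
```

Run against the four sample links one at a time, this should print the same four lines as the output listed above.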

answered Oct 15, 2013 at 7:51

Toto

4 Comments

Adam Friedman Over a year ago

I can't figure out how to make this work in a greedy fashion for a multi-line blob of HTML. I've tried the 'm' modifier but no luck. Can you help?

Jako Over a year ago

Thanks, the regex should be changed a bit to be ungreedy and to cover https too. '~(href|src)=(["\'])(?!#)(?!https?://)/?([^\2]*?)\2~i'

Toto Over a year ago

@Jako: You're right for https? but [^\2]* doesn't need to be ungreedy because it is ungreedy by itself.

Jako Over a year ago

Try your regex with two urls in one line: regex101.com/r/5Q8cye/1
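The difference Jako describes can be reproduced locally. Inside a character class PCRE does not treat \2 as a backreference, so [^\2]* is an ordinary greedy class that also matches quotes and newlines, and the match runs to the last matching quote in the input. The m modifier does not change this, since it only affects how ^ and $ match and neither pattern uses them. The sketch below assumes a made-up three-line HTML sample; the sample and the variable names are illustrative and not from the thread.

```php
<?php
// Illustrative input: two links on one line, spread over several lines.
$html = <<<'HTML'
<p><a href="aaa/">One</a> <a href='bbb/'>Two</a></p>
<img src="img/logo.png">
<a href="https://example.org/">Ext</a> <a href="#top">Top</a>
HTML;

$replacement = '$1="http://mydomain.com/$3"';

// Pattern from the answer: greedy, http only.
$greedy = '~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i';
// Pattern from Jako's comment: lazy quantifier, https allowed, leading slash dropped.
$lazy = '~(href|src)=(["\'])(?!#)(?!https?://)/?([^\2]*?)\2~i';

// The fifth argument receives the number of replacements performed.
$greedyOut = preg_replace($greedy, $replacement, $html, -1, $greedyCount);
$lazyOut   = preg_replace($lazy, $replacement, $html, -1, $lazyCount);

echo "greedy replacements: $greedyCount\n";
echo "lazy replacements:   $lazyCount\n";
echo $lazyOut, "\n";
```

With this sample the greedy pattern makes a single replacement that swallows everything up to the last double quote, while the lazy pattern rewrites the three relative URLs and leaves the https and #top links alone.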

Lashawn Little · Accepted Answer · 2013-10-15 03:58:47Z

0

This will work for you

PHP:

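function expand_links($match) {
    // $match['href'] holds the quoted attribute value; strip quotes and slashes from both ends.
    return 'href="http://example.com/' . trim($match['href'], '\'"/\\') . '"';
}

// The /e modifier was deprecated in PHP 5.5 and removed in PHP 7, so use a callback instead.
$textarea = preg_replace_callback('/href\s*=\s*(?<href>"[^\\"]*"|\'[^\\\']*\')/', 'expand_links', $textarea);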

I also changed the regex to work with either double quotes or apostrophes
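A callback also makes it possible to decide per URL whether to rewrite it, which the question asks for (#top and already absolute URLs should stay as they are). The following is a minimal sketch under stated assumptions: absolutize_links is a hypothetical helper name, the base URL is passed in as a parameter, and "already absolute" is taken to mean a URL that starts with a scheme such as http: or with //.

```php
<?php
// Hypothetical helper, not from the answer: rewrites href/src values unless they
// are empty, fragments (#...), protocol-relative (//...), or carry a scheme.
function absolutize_links(string $html, string $base): string
{
    $pattern = '/\b(href|src)\s*=\s*(?:"([^"]*)"|\'([^\']*)\')/i';

    return preg_replace_callback($pattern, function (array $m) use ($base) {
        // Group 2 is the double-quoted value, group 3 the single-quoted one.
        $url = ($m[2] !== '') ? $m[2] : ($m[3] ?? '');

        if ($url === ''
            || $url[0] === '#'
            || strpos($url, '//') === 0
            || preg_match('~^[a-z][a-z0-9+.-]*:~i', $url)
        ) {
            return $m[0]; // leave the attribute exactly as it was
        }

        // Join base and path with exactly one slash; output is always double-quoted.
        return $m[1] . '="' . rtrim($base, '/') . '/' . ltrim($url, '/') . '"';
    }, $html);
}

$samples = [
    '<a href="aaa/">Link 1</a>',
    '<a href="http://mydomain.com/bbb/">Link 1</a>',
    "<a href='aaa/'>Link 1</a>",
    '<a href="#top">Link 1</a>',
];
foreach ($samples as $s) {
    echo absolutize_links($s, 'http://mydomain.com/'), "\n";
}
```

This handles simple markup only; for arbitrary HTML a real parser is generally more robust than a regular expression.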

answered Oct 15, 2013 at 3:58

Lashawn Little

gwillie · Accepted Answer · 2013-10-15 05:35:44Z

0

try this for your pattern

/(href|src)=['"]([^"']+)['"]/i

the replacement stays as is

EDIT:

wait one...i didn't test on the first 2 link types, just the ones that didn't work...give me a moment

REVISED:

sorry about the first regex, i forgot about the second example that worked with the domain in it

(href|src)=['"](?:http://.+/)?([^"']+)['"]

that should work

edited Oct 15, 2013 at 5:35

answered Oct 15, 2013 at 5:22

gwillie
