2

My PHP code is:

$string = preg_replace('/(href|src)="([^:"]*)(?:")/i','$1="http://mydomain.com/$2"', $string);

It works with:

 - <a href="aaa/">Link 1</a> => <a href="http://mydomain.com/aaa/">Link 1</a>
 - <a href="http://mydomain.com/bbb/">Link 1</a> => <a href="http://mydomain.com/bbb/">Link 1</a>

But not with:

- <a href='aaa/'>Link 1</a>
- <a href="#top">Link 1</a> (I don't want to change the URL if it starts with #).

Please help me!

3 Answers

2

How about:

$arr = array('<a href="aaa/">Link 1</a>',
             '<a href="http://mydomain.com/bbb/">Link 1</a>',
             "<a href='aaa/'>Link 1</a>",
             '<a href="#top">Link 1</a>');
foreach( $arr as $lnk) {
    $lnk = preg_replace('~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i','$1="http://mydomain.com/$3"', $lnk);
    echo $lnk,"\n";
}

Output:

<a href="http://mydomain.com/aaa/">Link 1</a>
<a href="http://mydomain.com/bbb/">Link 1</a>
<a href="http://mydomain.com/aaa/">Link 1</a>
<a href="#top">Link 1</a>

Explanation:

The regular expression:

(?-imsx:(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    href                     'href'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    src                      'src'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  =                        '='
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    ["\']                    any character of: '"', '\''
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    #                        '#'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    http://                  'http://'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [^\2]*                   any character except: '\2', which inside
                             a character class is the octal escape
                             for character code 2, not a
                             backreference (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \2                       what was matched by capture \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
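
A side note on [^\2]*: as far as I know, PCRE does not resolve backreferences inside a character class, so \2 there is read as an octal escape (character code 2) and the class accepts quotes as well. A minimal sketch to check this; the sample strings are made up for illustration:

```php
<?php
// Inside [...] the sequence \2 is an octal escape (chr(2)), not "capture group 2".
$class = '~^[^\2]+$~';

// Both kinds of quote are accepted by the negated class...
$acceptsQuotes = preg_match($class, 'a"b\'c');

// ...and the only character it rejects is chr(2).
$acceptsChr2 = preg_match($class, "a\x02b");

var_dump($acceptsQuotes, $acceptsChr2);
```

So the pattern copes with the four test strings only because each one holds a single attribute, and the trailing \2 forces the greedy match to backtrack to the closing quote.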

4 Comments

I can't figure out how to make this work in a greedy fashion for a multi-line blob of HTML. I've tried the 'm' modifier but no luck. Can you help?
Thanks, the regex should be changed a bit to be ungreedy and to cover https too. '~(href|src)=(["\'])(?!#)(?!https?://)/?([^\2]*?)\2~i'
@Jako: You're right for https? but [^\2]* doesn't need to be ungreedy because it is ungreedy by itself.
Try your regex with two urls in one line: regex101.com/r/5Q8cye/1
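
To make the last two comments concrete, here is a rough sketch comparing the greedy pattern from the answer with the lazy one suggested above on a multi-line snippet. The sample HTML and the variable names are invented for the example. Note that the 'm' modifier only changes what ^ and $ match, so it has no effect on this problem.

```php
<?php
// Hypothetical multi-line input with several double-quoted attributes.
$html = <<<'HTML'
<a href="aaa/">A</a> <a href="/bbb/">B</a>
<img src="img/c.png">
HTML;

$replacement = '$1="http://mydomain.com/$3"';

// Greedy pattern from the answer: [^\2]* runs on to the LAST double quote in the input.
$greedy = preg_replace('~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i', $replacement, $html);

// Lazy pattern from the comment: stops at the first closing quote, skips https URLs too,
// and drops a leading slash so the result has no double slash.
$lazy = preg_replace('~(href|src)=(["\'])(?!#)(?!https?://)/?([^\2]*?)\2~i', $replacement, $html);

echo $greedy, "\n\n", $lazy, "\n";
```

With the greedy pattern only the first attribute gets the prefix, because one match swallows everything up to the final quote. The lazy one rewrites all three attributes.
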
0

This will work for you

PHP:

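// Callback: strip the surrounding quotes and slashes, then prefix the site root.
function expand_links($matches) {
    return 'href="http://example.com/' . trim($matches['href'], '\'"/\\') . '"';
}

// The /e modifier was removed in PHP 7, so preg_replace_callback is used instead.
$textarea = preg_replace_callback('/href\s*=\s*(?<href>"[^\\"]*"|\'[^\\\']*\')/', 'expand_links', $textarea);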

I also changed the regex to work with either double quotes or apostrophes.


0

Try this for your pattern:

/(href|src)=['"]([^"']+)['"]/i

The replacement stays as is.

EDIT:

Wait, I didn't test on the first two link types, only the ones that didn't work. Give me a moment.

REVISED:

Sorry about the first regex; I forgot about the second example, which already has the domain in it.

(href|src)=['"](?:http://.+/)?([^"']+)['"]

That should work.
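
For what it's worth, here is a quick way to check the revised pattern against the four links from the question. The ~ delimiters and the i flag are added here, and the (?!#) lookahead in the second pattern is only an assumption about how to leave anchor links alone; it is not part of the answer as posted.

```php
<?php
$links = array(
    '<a href="aaa/">Link 1</a>',
    '<a href="http://mydomain.com/bbb/">Link 1</a>',
    "<a href='aaa/'>Link 1</a>",
    '<a href="#top">Link 1</a>',
);
$replacement = '$1="http://mydomain.com/$2"';

// Revised pattern from this answer, wrapped in ~ delimiters.
$asPosted = preg_replace('~(href|src)=[\'"](?:http://.+/)?([^"\']+)[\'"]~i', $replacement, $links);

// Same pattern plus a lookahead (an assumption) that skips values starting with #.
$skipAnchors = preg_replace('~(href|src)=[\'"](?!#)(?:http://.+/)?([^"\']+)[\'"]~i', $replacement, $links);

print_r($asPosted);
print_r($skipAnchors);
```

As posted, the pattern turns "#top" into "http://mydomain.com/#top"; with the lookahead it is left alone. Also worth knowing: because .+ is greedy, an absolute URL such as http://mydomain.com/bbb/ccc would come out as http://mydomain.com/ccc, and links to other hosts are rewritten the same way.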
