
I need to parse strings of HTML content and, where possible, rewrite URLs to images on other domains from http to https. The issue is that not all of the external domains support https, so I can't blanket-replace http with https.

So I want to do this with a list of domains I know work with https.

There is one small added complication: the search has to match a domain whether or not www. is prefixed to it.

Using the example given by @Wiktor I have something close to what I want, but it needs reversing: the replacement should run when a domain matches, not when it doesn't match, which is how this code currently behaves.

/http(?!:\/\/(?:[^\/]+\.)?(?:example\.com|main\.com)\b)/i
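To make the problem concrete, here is a small sketch of what the pattern above does when passed to preg_replace. The sample string is an assumption chosen for illustration; only the pattern comes from the question.

```php
<?php
// Pattern from the question: (?!...) is a NEGATIVE lookahead, so "http"
// matches only when it is NOT followed by one of the listed domains.
$re = '/http(?!:\/\/(?:[^\/]+\.)?(?:example\.com|main\.com)\b)/i';

// Hypothetical input: two listed domains (one with www.) and one unlisted.
$s = "http://example.com http://www.main.com http://let.com";

$result = preg_replace($re, "https", $s);
echo $result, "\n";
// Only the unlisted domain is rewritten, which is the opposite of what is wanted.
```

The listed domains are left on http and let.com is upgraded, so the lookahead has to become a positive one.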
  • Asking about Regular Expressions, no sample input, no desired output. Something is not right... Commented Jul 11, 2016 at 12:40
  • Don't use a regex for this. You have too many requirements to expect regex to be able to handle this properly. Commented Jul 11, 2016 at 12:41
  • See ideone.com/a6Eb2r Commented Jul 11, 2016 at 12:45
  • @WiktorStribiżew I am using your example but the matches need to occur when the string does match the domain names, not when it doesn't match them. Commented Jul 15, 2016 at 16:58
  • I am feeding my daughter, let me finish. Maybe $re = '/http(?=:\/\/(?:[^\/]+\.)?(?:' . implode("|", array_map(function ($x) {return preg_quote($x); }, $domains)) . ')\b)/i'; echo preg_replace($re, "https", $s);? Commented Jul 15, 2016 at 17:02

1 Answer

I believe you can use

$domains = array("example.com", "main.com");
$s = "http://example.com http://main.main.com http://let.com";
// Escape each domain for use inside a /.../ pattern, then join the branches with |
$re = '/http(?=:\/\/(?:[^\/]+\.)?(?:' 
      . implode("|", array_map(function ($x) {
             return preg_quote($x, '/'); 
          }, $domains)) 
      . ')\b)/i'; 
// Rewrite http to https only where the lookahead finds a listed domain
echo preg_replace($re, "https", $s);
// => https://example.com https://main.main.com http://let.com

See the IDEONE demo

The regex matches:

  • http - matches http only if followed by...
  • (?= - start of a positive lookahead
    • :\/\/ - a literal :// substring
    • (?:[^\/]+\.)? - an optional sequence of 1+ chars other than /, followed by a . (this is what lets www. and other subdomains match)
    • (?: + implode code - builds an alternation group, escaping each literal branch with preg_quote (to match any one of the alternatives, example.com or main.com, etc.)
    • ) - end of the alternation group
    • \b - word boundary
  • ) - end of the lookahead
  • /i - case-insensitive modifier.
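One caveat: because the pattern ends with \b, a host such as example.com.evil.test also passes the lookahead, since \b only requires a word boundary after "com". If that matters, a stricter variant might look like the sketch below. The function name upgradeToHttps, the sample HTML, and the tighter host rules are my own assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper (name and host rules are assumptions for illustration).
function upgradeToHttps(string $html, array $domains): string
{
    // Escape each domain for use inside a /.../ pattern and join with |.
    $alternation = implode('|', array_map(function ($d) {
        return preg_quote($d, '/');
    }, $domains));

    // (?:[a-z0-9-]+\.)*          zero or more subdomain labels such as "www."
    // (?![a-z0-9-]|\.[a-z0-9-])  the host must end here: no further label characters
    $re = '/http(?=:\/\/(?:[a-z0-9-]+\.)*(?:' . $alternation
        . ')(?![a-z0-9-]|\.[a-z0-9-]))/i';

    // Note: the replacement is the literal "https", so "HTTP" becomes lowercase.
    return preg_replace($re, 'https', $html);
}

$html = '<img src="http://www.example.com/a.png"> '
      . '<img src="http://example.com.evil.test/b.png"> '
      . '<img src="HTTP://cdn.main.com/c.png"> '
      . '<img src="http://let.com/d.png">';

$out = upgradeToHttps($html, array("example.com", "main.com"));
echo $out, "\n";
```

Like the original, this rewrites every matching URL in the string, not only those inside img tags. Restricting it to image sources would need an HTML parser or a narrower pattern.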