I'm trying to extract all the href and src values from a string like this:

$content = "
At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium
voluptatum deleniti Image: <img src = 'http://example.com/check-3.png' /> Link: <a href ='http://example.com/test.xls'>test.xls</a>";

Basically, what I want to do is change example.com to a different domain name (say test.com) and then extract all the filenames from the hrefs and srcs. I was able to do the domain name replacement with a simple str_replace, but now I'm stuck trying to extract the hrefs and srcs.

Here's what I tried:

$regex = "/src=[\"' ]?([^\"' >]+)[\"' ]?[^>]*>.*?href=[\"' ]?([^\"' >]+)[\"' ]?[^>]*>/i";

This seems to work if there is no space between src (or href) and the =, but if there is a space it does not work. I've tried adding the space character to the pattern, but then preg_match fails. I don't want to use a heavy library like Simple HTML DOM; besides, I don't think it would work, as the input is not a proper HTML document. It's a string coming out of CKEditor.

1 Answer

Why not just add quantifiers on the space?

$regex = "/src *= *[\"' ]?([^\"' >]+)[\"' ]?[^>]*>.*?href=[\"' ]?([^\"' >]+)[\"' ]?[^>]*>/i";
               ^  ^
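
As a rough sketch of the same idea applied to both attributes: the pattern below is not the one from the question. It assumes the attribute values are always quoted (CKEditor output normally is), uses \s* instead of a literal space so tabs and newlines are tolerated too, and matches src and href independently with preg_match_all so their order in the string does not matter. The variable names $urls and $files are just illustrative.

```php
<?php
// Sample input, shortened from the question.
$content = "Image: <img src = 'http://example.com/check-3.png' /> "
         . "Link: <a href ='http://example.com/test.xls'>test.xls</a>";

// Group 1: attribute name, group 2: opening quote, group 3: the value.
// \s* allows optional whitespace on either side of '='.
// \2 requires the closing quote to be the same character as the opening one.
$regex = '/\b(src|href)\s*=\s*(["\'])(.*?)\2/i';

preg_match_all($regex, $content, $matches);

$urls  = $matches[3];                  // full attribute values
$files = array_map('basename', $urls); // just the file names

print_r($urls);  // http://example.com/check-3.png, http://example.com/test.xls
print_r($files); // check-3.png, test.xls
```

If unquoted attribute values can occur in your input, this sketch will skip them and the character-class approach from the question would need to be kept instead.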

2 Comments

Why is there a space after the =? Shouldn't it be /src*=*, meaning any number of spaces before and after the =?
The * modifies the previous character. src *= * means: 'src' followed by any number of spaces, followed by '=', followed by any number of spaces. src*=* means: 'sr' followed by any number of 'c's, followed by any number of '='s.
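
To make the difference concrete, here is a small check you can run. The test strings and variable names are made up for illustration, and a trailing ' is added to both patterns so that a match has to reach the opening quote of the value.

```php
<?php
$withSpaces = "src = 'a.png'";
$noSpaces   = "src='a.png'";

$spaceQuantified = "/src *= *'/"; // * applies to the space before it
$charQuantified  = "/src*=*'/";   // * applies to 'c' and to '='

$results = [
    'space pattern, spaced input'   => preg_match($spaceQuantified, $withSpaces),      // 1
    'space pattern, unspaced input' => preg_match($spaceQuantified, $noSpaces),        // 1
    'char pattern, spaced input'    => preg_match($charQuantified, $withSpaces),       // 0
    'char pattern, odd input'       => preg_match($charQuantified, "srccc=='a.png'"),  // 1
];

print_r($results);
```

The third case fails because nothing in src*=* is allowed to consume the space before the =, while the last case shows what that pattern really accepts: repeated 'c's and repeated '='s.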
