24

I'm trying to match all the image elements as strings.

This is my regex:

html.match(/<img[^>]+src="http([^">]+)/g);

This works, but I want to extract the src of all the images. So when I execute the regular expression on this string:

<img src="http://static2.ccn.com/ccs/2013/02/img_example.jpg" />

it should return:

"http://static2.ccn.com/ccs/2013/02/img_example.jpg"

13
  • 5
    Don't use regex to parse html. Commented Feb 18, 2013 at 15:06
  • 4
    @Tomirammstein, why do you have to do it with a regex when Javascript has DOM built in? Commented Feb 18, 2013 at 15:09
  • 1
    I'm using node.js, so, I can't parse it into an HTML tree Commented Feb 18, 2013 at 15:15
  • 2
    @Tomirammstein Check this out: stackoverflow.com/questions/7977945/html-parser-on-nodejs Commented Feb 18, 2013 at 15:17
  • 2
    @Tomirammstein Don't you think it would've been helpful to tag this question as node.js in the first place? Commented Feb 18, 2013 at 15:20

6 Answers

32

You need to use a capture group () to extract the URLs. If you also want to match globally (the g flag), i.e. more than once, you need to call exec in a loop, because match ignores capture groups when matching globally.

For example

var m,
    urls = [], 
    str = '<img src="http://site.org/one.jpg" />\n <img src="http://site.org/two.jpg" />',
    rex = /<img[^>]+src="?([^"\s]+)"?\s*\/>/g;

while ( m = rex.exec( str ) ) {
    urls.push( m[1] );
}

console.log( urls ); 
// [ "http://site.org/one.jpg", "http://site.org/two.jpg" ]
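
As an alternative to the exec loop, newer JavaScript runtimes (ES2020, Node.js 12 and later) provide String.prototype.matchAll, which keeps capture groups when matching globally. The following is only a sketch: the sample HTML, the variable names, and the slightly looser pattern are assumptions made for illustration, not part of the original answer.

```javascript
// Sketch: collect every src with matchAll instead of an exec loop.
// The sample HTML and all names below are illustrative assumptions.
const html =
  '<img src="http://site.org/one.jpg" />\n' +
  '<img alt="x" src="http://site.org/two.jpg" width="10">';
const srcPattern = /<img[^>]+src="([^">]+)"/g;

// With the g flag, match returns only the full matched substrings,
// so the capture group contents are not available separately.
const fullMatches = html.match(srcPattern);

// matchAll requires the g flag and yields one match array per hit;
// index 1 of each array holds the first capture group.
const urls = Array.from(html.matchAll(srcPattern), (m) => m[1]);

console.log(fullMatches);
console.log(urls);
```

Here fullMatches contains the matched tag fragments, while urls contains only the captured addresses.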

4 Comments

I ended up with this instead; otherwise it doesn't pick up all images: /<img[^>]+src="([^">]+)/g
Sometimes the img tag has height or some other attribute after the src attribute, so the regex should be rex = /<img[^>]+src="?([^"\s]+)"?[^>]*\/>/g;
It seems this regex does not work on all img tags, but this one does: /<img.*?src="([^">]*\/([^">]*?))".*?>/g;
This regex does not work when I have an entire HTML document as a string and want to find the image URLs in it. Can you help? stackoverflow.com/questions/57883657/…
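
The cases raised in these comments (attributes before or after src, tags without a closing slash, single-quoted values) can be covered by a somewhat more tolerant pattern. This is a rough sketch under assumed sample input, not a pattern taken from the answer, and it is still not a substitute for a real HTML parser.

```javascript
// Sketch: a more tolerant pattern; the sample markup is an assumption.
const page =
  "<p>intro</p>" +
  "<img class=\"a\" src='http://site.org/one.png' height=\"10\">" +
  "<img src=\"http://site.org/two.png\"/>";

// Group 1 remembers which quote character opened the value,
// group 2 is the src value up to the matching closing quote.
const tolerant = /<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1[^>]*>/gi;

const found = [];
let hit;
while ((hit = tolerant.exec(page)) !== null) {
  found.push(hit[2]);
}

console.log(found);
```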
8
var myRegex = /<img[^>]+src="(http:\/\/[^">]+)"/g;
var test = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />';
myRegex.exec(test);

2 Comments

Thank you for your answer. It helped me. I just want to add this: var src = myRegex.exec(test); console.log('SRC: ' + src[1]);
This regex does not work when I have an entire HTML document as a string and want to find the image URLs in it. Can you help? stackoverflow.com/questions/57883657/…
7

As Mathletics mentioned in a comment, there are other more straightforward ways to retrieve the src attribute from your <img> tags such as retrieving a reference to the DOM node via id, name, class, etc. and then just using your reference to extract the information you need. If you need to do this for all of your <img> elements, you can do something like this:

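var imageTags = document.getElementsByTagName("img"); // Returns a live HTMLCollection of <img> DOM nodes
var sources = [];
// Use an index loop: for...in would also visit non-element properties such as length
for (var i = 0; i < imageTags.length; i++) {
   sources.push(imageTags[i].src);
}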

However, if you have some restriction forcing you to use regex, then the other answers provided will work just fine.


2

Perhaps this is what you are looking for:

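I slightly modified your regex and then used the exec function to get an array describing the match: results[0] is the whole matched text and results[1] is the first capture group. Any further capture groups would be in results[2], results[3], and so on. To get further matches with the g flag, call exec again.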

var html = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />';

var re = /<img[^>]+src="http:\/\/([^">]+)/g;
var results = re.exec(html);

var source = results[1];
alert(source);
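
To make the behaviour of exec with the g flag concrete, here is a small sketch. The two-image sample string and the variable names are assumptions for illustration; the pattern is the one from this answer, so the captured text excludes the leading http://.

```javascript
// Sketch: what one exec call returns and how repeated calls advance.
// The sample string and names are illustrative assumptions.
const sample =
  '<img src="http://a.example/1.png" /><img src="http://a.example/2.png" />';
const imgRe = /<img[^>]+src="http:\/\/([^">]+)/g;

const first = imgRe.exec(sample);
// first[0] is the whole matched text, first[1] is capture group 1.
const firstGroup = first[1];
// With the g flag, lastIndex records where the next search will start.
const indexAfterFirst = imgRe.lastIndex;

const second = imgRe.exec(sample);
const secondGroup = second[1];

// No matches remain, so exec returns null and resets lastIndex to 0.
const third = imgRe.exec(sample);

console.log(firstGroup, secondGroup, third);
```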


1

You can use an HTML parser and avoid regular expressions altogether.

var parser = require('node-html-parser');

var html = '<img src="http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg" />'

parser.parse(html).querySelector('img').getAttribute('src')

=> 'http://static2.ccn.com/ccs/2013/02/CC_1935770_challenge_accepted_pack_x3_indivisible.jpg'

1 Comment

Please provide additional details in your answer. As it's currently written, it's hard to understand your solution.
-1

You can access the src value using capture groups:

// The parentheses form group 1, which captures the src value.
// Forward slashes must be escaped inside a regex literal.
var yourRegex = /<img[^>]+src\s*=\s*"(http:\/\/static2\.ccn\.com\/ccs[^">]+)/g;
var match = yourRegex.exec(yourString);
alert(match[1]); // src value

