7

I am trying to extract the img and src from a long html string.

I know there are a lot of questions about how to do this, but I have tried and gotten the wrong result. My question is just about contradicting results though.

I am using:

var url = "<img height=\"100\" src=\"\" width=\"200\"></img>";
var regexp = /<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>/g;
var src = url.match(regexp);

But this results in src not being extracted properly. I keep getting src =<img height="100" src="" width="200"></img> instead of 

However, when I try this on the regex tester at regex101, it extracts the src correctly. What am I doing wrong? Is match() the wrong function to use?

1

4 Answers

23

If you need to get the whole img tags for some reason:

const imgTags = html.match(/<img [^>]*src="[^"]*"[^>]*>/gm);

then you can extract the source link for every img tag in array like this:

const sources = html.match(/<img [^>]*src="[^"]*"[^>]*>/gm)
                          .map(x => x.replace(/.*src="([^"]*)".*/, '$1'));
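If only the `src` values are needed, the same result can be sketched in a single pass with a capture group and `String.prototype.matchAll` (available in ES2020 environments). The `html` value and the example.com URLs below are made-up placeholders, not part of the original answer:

```javascript
// Hypothetical input with two images; the URLs are placeholders.
const html = '<p><img src="https://example.com/a.png" alt="a"></p>' +
             '<img width="10" src="https://example.com/b.jpg">';

// The capture group isolates the src value; matchAll requires the g flag.
const imgSrcRe = /<img [^>]*src="([^"]*)"[^>]*>/g;

// Each match is an array: m[0] is the full tag, m[1] is the captured src.
const sources = Array.from(html.matchAll(imgSrcRe), m => m[1]);

console.log(sources);
```

Unlike chaining `.map` on the result of `match`, this yields an empty array rather than throwing when the string contains no matching `<img>` tags, since `match` returns `null` in that case.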

6 Comments

@nbsp Glad it helped someone! :)
This is exactly what I was looking for! Cheers.
Perfect! You just forgot a closed bracket on the very last line
@Shadrix It's added. Thank you!
Exactly what I needed too
5

I'm not a big fan of using regex to parse HTML content, so here is the longer way:

var url = "<img height=\"100\" src=\"\" width=\"200\"></img>";
var tmp = document.createElement('div');
tmp.innerHTML = url;
var src = tmp.querySelector('img').getAttribute('src');
console.log(src);

1 Comment

OP, I gave you the literal answer to your question; but this here is what you would be advised to be doing instead.
1

Try this:

var match = regexp.exec(url);
var src = match[1];
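
For context on why the original call behaved differently: when the regex has the `g` flag, `String.prototype.match` returns an array of the full matches only and drops the capture groups, whereas `exec` returns one match at a time with the groups included. A minimal sketch, assuming a placeholder URL (`https://example.com/a.png` is made up here, since the string in the question has an empty `src`):

```javascript
// Hypothetical input: the URL is a placeholder, not from the original question.
const url = '<img height="100" src="https://example.com/a.png" width="200"></img>';

// Same pattern as in the question, with and without the g flag.
const globalRe = /<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>/g;
const singleRe = /<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>/;

// With g: match() returns only the full matches, no capture groups.
const withG = url.match(globalRe);

// Without g: match() behaves like exec(); index 1 holds the first capture group.
const withoutG = url.match(singleRe);

// exec() includes capture groups even when the regex has the g flag.
const viaExec = globalRe.exec(url);

console.log(withG[0]);    // the matched <img ...> opening tag
console.log(withoutG[1]); // the src value
console.log(viaExec[1]);  // the src value
```

So dropping the `g` flag, or switching to `exec` as above, are both ways to get at the capture group.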

2 Comments

Thanks, this works too. Just wondering, why does match[0] return the original string and match[1] return the substring that we are actually looking for? Is it always the case that the 2nd element in the resulting array will be the desired result?
@llams48: match[1] is the 1st capture group, match[2] is the second... and match[0] is the full match.
1
const src = url.slice(url.indexOf("src")).split('"')[1]

Regex gives me headaches. Boohoo.

Find the index of src in the HTML string (the variable named url in the question), slice the string from there, and finally split the result on the " characters. The second item in the resulting array is your src link.
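
A slightly more defensive sketch of the same idea, wrapped in a hypothetical helper `firstSrc` (the name and the placeholder URL are not from the answer). It returns `null` when no `src="` is found; it still assumes double quotes and picks up the first `src="` anywhere in the string, so it is only as reliable as that assumption:

```javascript
// Hypothetical helper: returns the first double-quoted src value, or null.
function firstSrc(html) {
  const marker = 'src="';
  const start = html.indexOf(marker);
  if (start === -1) return null;          // no src attribute found
  const valueStart = start + marker.length;
  const end = html.indexOf('"', valueStart);
  if (end === -1) return null;            // unterminated attribute value
  return html.slice(valueStart, end);
}

// Placeholder URL, not from the original question.
console.log(firstSrc('<img height="100" src="https://example.com/a.png" width="200">'));
console.log(firstSrc('<p>no image here</p>'));
```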

1 Comment

This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review
