Java regular expression to get the img src

Question

I am trying to extract data from an HTML page. The data is an image link. The page content is always different, so the only way is to use a regular expression. There is only one match on the page, in the following style:

<img src="imglink" alt="texttext textex" style="border:1px solid #FFFFFF"/>

This is what I am using to get the imglink:

"<img src=\"(.*)\""

Is there something I don't know about using regular expressions? It must be easy as pie, but it gets me all the text after < and before />.

Why should I? I don't want to use another library just for this simple job.
– artouiros, commented Sep 17, 2011 at 17:05

Howard · Accepted Answer · 2011-09-17 17:05:58Z

3

Try to use the non-greedy version

"<img src=\"(.*?)\""

in order to match as few characters as possible.
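To make the difference concrete, here is a minimal sketch that runs both patterns against the tag from the question. The class name `ImgSrcLazyDemo` and the helper `firstGroup` are made up for illustration; only `java.util.regex` from the standard library is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcLazyDemo {
    // Greedy: (.*) first consumes the rest of the line, then backtracks
    // only until a closing quote is found, i.e. the LAST quote on the line.
    static final Pattern GREEDY = Pattern.compile("<img src=\"(.*)\"");

    // Reluctant (non-greedy): (.*?) grows one character at a time and
    // stops at the FIRST quote that lets the whole pattern match.
    static final Pattern LAZY = Pattern.compile("<img src=\"(.*?)\"");

    // Returns capture group 1 of the first match, or null if there is no match.
    static String firstGroup(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<img src=\"imglink\" alt=\"texttext textex\" "
                + "style=\"border:1px solid #FFFFFF\"/>";
        System.out.println("greedy: " + firstGroup(GREEDY, html));
        System.out.println("lazy:   " + firstGroup(LAZY, html));
    }
}
```

With the greedy pattern the group runs up to the last quote in the tag (the one closing the style attribute), which is the behaviour described in the question. The reluctant pattern stops at the quote that closes src.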

Please note: only use regular expressions to handle HTML or XML if the text has a simple, known structure. For arbitrary HTML/XML, do not use regex.


Kashyap · Accepted Answer · 2011-09-17 17:28:12Z

2

As a rule of thumb, when selecting the characters between two separators, I make it a point to put "anything except the next expected separator char" in the selection clause instead of ".".

So in this case:

"<img src=\"([^\"]*)\""
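A minimal sketch of how this pattern might be used, assuming the page has already been read into a `String`. The class name `ImgSrcNegatedClassDemo` and the method `extractSrc` are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcNegatedClassDemo {
    // [^"]* matches any run of characters that are not a double quote,
    // so the group cannot run past the quote that closes the src attribute.
    static final Pattern IMG_SRC = Pattern.compile("<img src=\"([^\"]*)\"");

    // Returns the src value of the first matching img tag, or null if none is found.
    static String extractSrc(String html) {
        Matcher m = IMG_SRC.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p>some text</p>\n"
                + "<img src=\"imglink\" alt=\"texttext textex\" "
                + "style=\"border:1px solid #FFFFFF\"/>";
        System.out.println(extractSrc(html));
    }
}
```

Because the character class states exactly what may appear inside the quotes, the match does not depend on greedy or reluctant quantifier behaviour. Like the pattern in the question, it still assumes that src is the first attribute and is written with double quotes.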

edited Sep 17, 2011 at 17:28

answered Sep 17, 2011 at 17:14
