
I am trying to extract data from an HTML page. The data is an image link. The page content is always different, so the only way is to use a regular expression. There is only one match on the page, in the following style:

<img src="imglink" alt="texttext textex" style="border:1px solid #FFFFFF"/>

This is what I am using to get the imglink:

"<img src=\"(.*)\""

Is there something I don't know about using regular expressions? It must be easy as pie, but it gets me all the text after < and before />.

3
  • Why don't you use an HTML parser? Commented Sep 17, 2011 at 17:04
  • Why should I? I don't want to use another library just for this simple job. Commented Sep 17, 2011 at 17:05
  • "Why should I?": stackoverflow.com/questions/1732348/… Commented Sep 17, 2011 at 17:06

2 Answers


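Try the non-greedy version

"<img src=\"(.*?)\""

so that the group matches as few characters as possible. The original .* is greedy: it runs on to the last double quote in the tag, which is why the alt and style attributes end up in the match.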
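
To see the difference between the two quantifiers side by side, here is a minimal sketch. The question does not say which language is in use, so Python's re module is an assumption here, and "imglink" is just the placeholder from the question.

```python
import re

# The sample tag from the question; "imglink" stands in for the real URL.
html = '<img src="imglink" alt="texttext textex" style="border:1px solid #FFFFFF"/>'

# Greedy: .* grabs as much as it can, then backtracks to the LAST double quote.
greedy = re.search(r'<img src="(.*)"', html).group(1)

# Non-greedy: .*? stops at the FIRST double quote that lets the match succeed.
lazy = re.search(r'<img src="(.*?)"', html).group(1)

print(greedy)  # imglink" alt="texttext textex" style="border:1px solid #FFFFFF
print(lazy)    # imglink
```

In a language whose string literals need escaped quotes, the pattern is written as in the answer, with \" in place of each ".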

Please note: only use regular expressions to handle HTML or XML if the text has a simple, known structure. For arbitrary HTML/XML, do not use regex.



As a rule of thumb, when selecting characters between two separators, I make a point of matching "anything except the next expected separator character" instead of using ".".

So in this case:

"<img src=\"([^\"]*)\""

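A rough sketch of the same idea, again assuming Python since the question does not name a language. The helper name first_img_src is made up for illustration. Because [^"] can never consume a double quote, the capture has to end at the first closing quote, whatever follows it.

```python
import re

# [^"]* matches any run of characters that are not a double quote,
# so the capture group cannot spill past the end of the src attribute.
IMG_SRC = re.compile(r'<img src="([^"]*)"')


def first_img_src(page):
    """Return the src of the first matching <img> tag, or None if there is none.

    Hypothetical helper, not from the original answers.
    """
    match = IMG_SRC.search(page)
    return match.group(1) if match else None


page = (
    '<p>some text</p>\n'
    '<img src="imglink" alt="texttext textex" style="border:1px solid #FFFFFF"/>\n'
)
print(first_img_src(page))                   # imglink
print(first_img_src('<p>no image here</p>')) # None
```

Like the non-greedy version, this only works while the tag keeps the exact shape shown in the question, with src as the first attribute and double quotes around the value.
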