I have this kind of line: <a href="/verona/4mktg-for-marketing.8526695" title="4MKTG FOR MARKETING SRL">4MKTG FOR MARKETING <strong>SRL</strong> </a>

I need the value of the title attribute. I split the string on 'title="' and then checked whether the result matches this regex: "[0-9A-Z /.]{3,}". But it doesn't work...

The field contains only digits, capital letters, spaces and dots.

Thank you

Davide

    Do you have to use a regex? If not, you can just find where 'title="' starts and take the substring from the end of that marker up to the next double quote. Commented Nov 22, 2015 at 0:18
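
The substring approach suggested in this comment might look like the following minimal sketch. The class name TitleSubstring and the method extractTitle are hypothetical names chosen for illustration, and the sketch assumes the attribute is written exactly as title=" with double quotes.

```java
class TitleSubstring {
    // Hypothetical helper: returns the title attribute value, or null if it is absent.
    static String extractTitle(String line) {
        String marker = "title=\"";
        int start = line.indexOf(marker);
        if (start < 0) {
            return null; // no title attribute in this line
        }
        start += marker.length();           // skip past the marker itself
        int end = line.indexOf('"', start); // next double quote closes the value
        if (end < 0) {
            return null; // closing quote is missing
        }
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        String line = "<a href=\"/verona/4mktg-for-marketing.8526695\" "
                + "title=\"4MKTG FOR MARKETING SRL\">4MKTG FOR MARKETING <strong>SRL</strong> </a>";
        System.out.println(extractTitle(line));
    }
}
```

This avoids regexes entirely, but it would also pick up any other attribute whose name ends in title, so it is only suitable when the input lines are as regular as the one in the question.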

3 Answers


Instead of using a regular expression, you should use JSoup when dealing with HTML.

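import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// html is the String holding the markup to parse
Document doc = Jsoup.parse(html);
Elements links = doc.select("a"); // select() returns Elements, not a single Element
for (Element l : links) {
    // grab the title attribute value
    System.out.println(l.attr("title"));
}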

1 Comment

Agreed: whenever you are parsing a known format (HTML, XML, JSON), don't use regex.

title="([\dA-Z\. ]+)"
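
A minimal sketch of how this pattern could be applied with java.util.regex follows. The class name TitleRegex and the method extractTitle are hypothetical. One possible reason the original attempt failed (an assumption, since the asker's code is not shown) is that String.matches requires the whole string to match, whereas Matcher.find searches for the pattern anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleRegex {
    // Same character class as the pattern above; backslashes are doubled
    // because this is a Java string literal.
    private static final Pattern TITLE = Pattern.compile("title=\"([\\dA-Z. ]+)\"");

    // Hypothetical helper: returns the captured title, or null if nothing matches.
    static String extractTitle(String line) {
        Matcher m = TITLE.matcher(line);
        // find() looks for the pattern anywhere in the line;
        // matches() would demand that the entire line fit the pattern.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<a href=\"/verona/4mktg-for-marketing.8526695\" "
                + "title=\"4MKTG FOR MARKETING SRL\">4MKTG FOR MARKETING <strong>SRL</strong> </a>";
        System.out.println(extractTitle(line));
    }
}
```

Capturing group 1 holds only the attribute value, so no splitting on 'title="' is needed beforehand.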


If you need to do it with a regex (using java.util.regex; see this answer regarding Perl-like regexes in Java):

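// Java string literals need double quotes, with inner quotes and backslashes escaped
String str = "<a href=\"/verona/4mktg-for-marketing.8526695\" title=\"4MKTG FOR MARKETING SRL\">4MKTG FOR MARKETING <strong>SRL</strong> </a>";
// Replace the whole line with the captured title; str is left unchanged if nothing matches
str = str.replaceAll(".* title=\"([\\s.A-Z0-9]+)\".*", "$1");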
