
I have a page source and I want to get the anchor text of all its anchor tags.

Could someone please help me out with the regex pattern for it?

Thanks in advance.

4 Comments
  • In general, you should not use a regex pattern to extract markup, but rather an XML/DOM parser. Commented Sep 23, 2010 at 9:31
  • @karim79: Only possible if the HTML is well formed, isn't it? Commented Sep 23, 2010 at 9:36
  • @Tim - if it is generated in a 'regular', consistent way then I will agree. But still, regular expressions are not a good tool for parsing irregular, complex beasts of languages like HTML. Commented Sep 23, 2010 at 9:40
  • @karim79: You are right, that is rarely the right way, but we don't know the exact context. Maybe he only gets a small fragment of HTML from some source other than a full page. However, I posted an answer; maybe it helps. Commented Sep 23, 2010 at 9:45
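To illustrate karim79's suggestion, here is a minimal sketch using the JDK's built-in XML DOM parser (javax.xml.parsers). As Tim points out, this only works when the markup is well-formed XML (for example XHTML); real-world HTML with unclosed tags makes parse() throw a SAXException and would need a lenient HTML parser instead. The class and method names (DomAnchorText, anchorTexts) are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomAnchorText {

    // Parses well-formed (X)HTML as XML and returns the text content of every <a> element.
    // Throws org.xml.sax.SAXException if the markup is not well-formed XML.
    static List<String> anchorTexts(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList anchors = doc.getElementsByTagName("a");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            // getTextContent() concatenates all descendant text, so nested tags like <b> are flattened
            texts.add(anchors.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String html = "<body>"
                + "<a href=\"#first\">got to first</a>"
                + "<span>something else</span>"
                + "<a class=\"nav\" href=\"#second\">got to <b>second</b></a>"
                + "</body>";
        for (String text : anchorTexts(html)) {
            System.out.println(text);
        }
    }
}
```

Because getTextContent() returns all descendant text, the second link yields "got to second" as plain text, and the extra class attribute does not matter.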

2 Answers


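karim79 is right that regex might be the wrong tool, but here is one simple way it could be done in Java anyway. Note that this pattern only matches anchors written exactly as <a href="#name">, so it will not work if the anchor has any other attributes (before or after the href), an href that is not a #fragment, or anchor text containing punctuation or nested tags. Still, it might be a good start or help you understand how you could do it.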

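    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class AnchorText {
        public static void main(String[] args) {
            String html = "<body>" +
                    "<a href=\"#first\">got to first</a>" +
                    "<span>something else</span>" +
                    "<a href=\"#second\">got to second</a>" +
                    "</body>";

            // group(1) captures the fragment after '#', group(2) the anchor text
            Pattern pattern = Pattern.compile("<a href=\"#(\\w+)\">([\\w\\s]+)</a>");
            Matcher matcher = pattern.matcher(html);
            while (matcher.find()) {
                System.out.println(matcher.group(2)); // prints "got to first", then "got to second"
            }
        }
    }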

3 Comments

But what I want is the anchor text, not the URL.
Which in this case is "got to first" and "got to second".
@Jack: I added a second group that gets the a-tag's content.
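The limitation mentioned in the answer is easy to hit: <a class="nav" href="#first"> does not match, because the pattern expects href to follow "<a " immediately, and [\w\s]+ rejects any punctuation in the text. Below is a sketch of a somewhat more tolerant pattern. It is still a regex and still assumes the anchor text contains no nested tags; the class name RelaxedAnchorRegex is just for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelaxedAnchorRegex {

    // Same idea as the answer's pattern, but:
    //  - <a\b[^>]*> accepts any attributes in any order (or none at all),
    //    while \b keeps tags such as <abbr> from matching,
    //  - ([^<]*) captures everything up to the next tag, so punctuation is allowed,
    //  - CASE_INSENSITIVE also matches <A ...>...</A>.
    // Anchor text that contains nested tags such as <b> is still not matched.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b[^>]*>([^<]*)</a>", Pattern.CASE_INSENSITIVE);

    static List<String> anchorTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            texts.add(m.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<a class=\"nav\" href=\"#first\">got to first!</a>"
                + "<A HREF=\"/second\">got to second</A>";
        System.out.println(anchorTexts(html)); // [got to first!, got to second]
    }
}
```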

Try this regex pattern; it should give you what you are looking for:

    (?<=<\s*a[^>]*>)(?<anchorContent>[\s\S]*?)(?=<\s*/a>)

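This gives you a named group called "anchorContent" containing everything between the opening <a ...> tag and the closing </a>, including any nested markup.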

Hope that helps.
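The pattern above uses named-group syntax (?<name>...), which Java supports from Java 7 on, but its lookbehind contains \s* and [^>]*, so it has no fixed length, and not every regex engine accepts that (Python's re module, for example, only allows fixed-width lookbehind). Here is a sketch of the same idea in Java that simply consumes the opening tag instead of looking behind for it, using only java.util.regex; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupAnchorText {

    // The opening tag is consumed instead of being matched by a lookbehind,
    // so no variable-length lookbehind support is needed.
    // \b after "a" keeps tags such as <abbr> or <area> from matching.
    // [\s\S]*? is lazy, so each match stops at the first closing </a>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<\\s*a\\b[^>]*>(?<anchorContent>[\\s\\S]*?)<\\s*/a\\s*>",
            Pattern.CASE_INSENSITIVE);

    static List<String> anchorContents(String html) {
        List<String> contents = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // Named groups need Java 7 or later.
            contents.add(m.group("anchorContent"));
        }
        return contents;
    }

    public static void main(String[] args) {
        String html = "<a href=\"#first\">got to first</a>"
                + "<abbr title=\"x\">skipped</abbr>"
                + "<a href=\"#second\">got to <b>second</b></a >";
        for (String content : anchorContents(html)) {
            System.out.println(content);
        }
    }
}
```

The captured content keeps nested markup as-is, so the second link yields got to <b>second</b>; strip or parse it further if you only want the visible text.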
