2

<a href="http://www.google.com">Google</a><br/>

I'm trying to extract the link http://www.google.com as well as the text Google.

  • Why are you trying to parse it yourself? There are many great libraries out there, such as Jsoup, that can take care of it for you. Commented Nov 21, 2013 at 1:15
  • @stevevls It's a requirement for an assignment. Commented Nov 21, 2013 at 1:28
  • Did your professor insist that you use regular expressions to parse this HTML? Commented Nov 21, 2013 at 1:36

3 Answers

1

This should do the job.

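    String html = "<a href=\"http://www.google.com\">Google</a><br/>";
    // Splitting on the double quote yields: [<a href=] [http://www.google.com] [>Google</a><br/>]
    String[] parts = html.split("\"");
    String url = parts[1];
    // Drop the leading '>' and keep everything up to the next '<'
    String text = parts[2].substring(1).split("<")[0];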
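
For what it's worth, the same idea can be written with indexOf and explicit checks, so that input without the expected quotes or tags returns null instead of throwing an ArrayIndexOutOfBoundsException. This is only a sketch: the class name AnchorIndexOfDemo and the method extract are made up for illustration, and it assumes a double-quoted href and plain text inside the link.

```java
class AnchorIndexOfDemo {
    private static final String HREF_PREFIX = "href=\"";

    /** Returns {url, text}, or null when an expected delimiter is missing. */
    static String[] extract(String html) {
        int hrefStart = html.indexOf(HREF_PREFIX);
        if (hrefStart < 0) return null;               // no href="..." attribute

        int urlStart = hrefStart + HREF_PREFIX.length();
        int urlEnd = html.indexOf('"', urlStart);     // closing quote of the href value
        if (urlEnd < 0) return null;

        int tagEnd = html.indexOf('>', urlEnd);       // end of the opening <a ...> tag
        if (tagEnd < 0) return null;

        int textEnd = html.indexOf('<', tagEnd + 1);  // start of the closing </a> tag
        if (textEnd < 0) return null;

        String url = html.substring(urlStart, urlEnd);
        String text = html.substring(tagEnd + 1, textEnd).trim();
        return new String[] { url, text };
    }

    public static void main(String[] args) {
        String[] result = extract("<a href=\"http://www.google.com\">Google</a><br/>");
        if (result != null) {
            System.out.println(result[0]);
            System.out.println(result[1]);
        }
    }
}
```

Like the split version, this breaks on single-quoted attributes or nested tags inside the link, so treat it as an assignment-level solution rather than a general HTML parser.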

0

You can extract it by using a simple regex. Try this.

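import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<a href=\"http://www.google.com\">Google</a><br/>";
// Group 1 captures the href value, group 2 captures the link text
Pattern pattern = Pattern.compile("<a\\s+href=\"([^\"]*)\">([^<]*)</a>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // http://www.google.com
    System.out.println(matcher.group(2)); // Google
}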
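
If the input can contain several links, upper-case tags, single quotes, or extra attributes, a looser pattern combined with a find() loop may be enough. The following is a rough sketch and not a general HTML parser: the class name AnchorRegexDemo, the method extractLinks, and the pattern itself are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorRegexDemo {
    // Group 1: the quote character, group 2: the href value, group 3: the link text.
    // The back-reference \1 requires the closing quote to match the opening one.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one {url, text} pair per anchor found in the input. */
    static List<String[]> extractLinks(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            links.add(new String[] { matcher.group(2).trim(), matcher.group(3).trim() });
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.google.com\">Google</a><br/>"
                + "<A class='x' HREF='http://example.com'>Example</A>";
        for (String[] link : extractLinks(html)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

Anchors without a quoted href, or with markup nested inside the link text, are not handled well by this pattern; that is where a library such as Jsoup becomes the safer choice if the assignment allows it.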


0

I use the following filter method in my web crawler, and it works well for me. Note that it extracts only the href value, not the link text.

Here is the code:

public static String filterHref( String hrefLine )
{
    if ( !hrefLine.toLowerCase().contains( "href" ) )
        return "";
    // Split case-insensitively so that HREF="..." is handled like href="..."
    String[] hrefSplit = hrefLine.split( "(?i)href" ); // split href="..." alt="...">...<...>
    if ( hrefSplit.length < 2 )
        return ""; // nothing follows the attribute name

    String link = hrefSplit[ 1 ].split( "\\s+" )[ 0 ]; // get href attribute and value
    if ( link.contains( ">" ) )
        link = link.substring( 0, link.indexOf( ">" ) );
    link = link.replaceFirst( "=", "" ); // drop the '=' between the attribute name and its value
    link = link.replace( "\"", "" ).replace( "'", "" ).trim();
    return link;
}
