1

Let's say I have a link like this:

link = '<a href="some text">...</a>'

Is there any way I can retrieve the text from anchor href attribute so the result will be something like this:

hrefText = 'some text'

And thank you in advance

3 Answers 3

1

This is a way:

import re
print re.search('(?<=<a href=")[^"]+',link).group(0)

Or,

print re.search(r'<a\s+href="([^"]+)',link).group(1)
Sign up to request clarification or add additional context in comments.

Comments

1

Although you could split or use a regular expression, for a more modular and powerful tool set, you could use

BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/

Sample code:

from bs4 import BeautifulSoup 
link = '<a href="some text">...</a>'
soup = BeautifulSoup(link, "html.parser")
for anchor in soup.find_all('a', href=True):
    print anchor['href']

Alternatively, for a single function, you can do this:

from bs4 import BeautifulSoup 

def getHref( link ):
    soup = BeautifulSoup(link, "html.parser")
    return soup.find_all('a', href=True)[0]['href']

2 Comments

isn't it a bit overkill just to parse a single href link?
Although this is a smaller problem, many people reading this in the future may be trying to do a bit more scraping :)
1

You can use bs4 and requests lib for this.

import requests
from bs4 import BeautifulSoup
url = 'https://examplesite.com/'
source = requests.get(url)
text = source.text
soup = BeautifulSoup(text, "html.parser")
for link in soup.findAll('a', {}):
   href = '' + link.get('href')
   title = link.string
   print("hrefText = ", href)

Hope this helps :)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.