How to retrieve text from an anchor's href attribute in Python

Question

Let's say I have a link like this:

link = '<a href="some text">...</a>'

Is there a way to retrieve the text from the anchor's href attribute, so that the result looks like this:

hrefText = 'some text'

Thank you in advance.

Jahid · Accepted Answer · 2016-06-30 18:56:11Z

1

One way is a regular expression with a lookbehind:

import re
print(re.search(r'(?<=<a href=")[^"]+', link).group(0))

Or, with a capturing group:

print(re.search(r'<a\s+href="([^"]+)', link).group(1))
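Both patterns above assume that href is the first attribute, that it is double-quoted, and that a match exists (otherwise `.group()` is called on None and raises AttributeError). A minimal sketch of a more tolerant variant follows; the names `HREF_RE` and `extract_href` are hypothetical, not part of the original answer, and a regex like this is still only a rough fit for real-world HTML.

```python
import re

# Hypothetical pattern: allows other attributes before href and accepts
# either single or double quotes around the value.
HREF_RE = re.compile(
    r'<a\b[^>]*?\shref\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)

def extract_href(html):
    """Return the first quoted href value of an <a> tag, or None if absent."""
    match = HREF_RE.search(html)
    return match.group(2) if match else None

link = '<a href="some text">...</a>'
print(extract_href(link))
```

Returning None instead of raising lets the caller decide how to handle input that contains no link.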


Brian · Accepted Answer · 2016-06-30 18:58:42Z

1

Although you could split the string or use a regular expression, BeautifulSoup offers a more modular and powerful toolset: https://www.crummy.com/software/BeautifulSoup/

Sample code:

from bs4 import BeautifulSoup

link = '<a href="some text">...</a>'
soup = BeautifulSoup(link, "html.parser")
# href=True matches only anchors that actually have an href attribute
for anchor in soup.find_all('a', href=True):
    print(anchor['href'])

Alternatively, wrapped in a single function:

from bs4 import BeautifulSoup

def getHref(link):
    # Return the href of the first <a> tag that has one
    # (raises IndexError if there is no such tag)
    soup = BeautifulSoup(link, "html.parser")
    return soup.find_all('a', href=True)[0]['href']


2 Comments

Jahid Over a year ago

Isn't it a bit overkill just to parse a single href link?

Brian Over a year ago

Although this is a smaller problem, many people reading this in the future may be trying to do a bit more scraping :)
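As a possible middle ground between the regex and BeautifulSoup approaches discussed here, the standard library's `html.parser` module can pull out href values without a third-party install. This is a different technique from either answer, sketched under that assumption; `HrefCollector` and `get_hrefs` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects the href value of every <a> start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names
        # arrive lowercased, and value is None for attributes with no value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)

def get_hrefs(html):
    """Return a list of href values found in the given HTML string."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs

link = '<a href="some text">...</a>'
print(get_hrefs(link))
```

For a single link, take the first element of the returned list, after checking that the list is not empty.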

Satyaki Sanyal · Accepted Answer · 2016-06-30 19:14:50Z

1

You can use the bs4 and requests libraries for this:

import requests
from bs4 import BeautifulSoup

url = 'https://examplesite.com/'
source = requests.get(url)
text = source.text
soup = BeautifulSoup(text, "html.parser")
# href=True skips anchors that have no href attribute
for link in soup.find_all('a', href=True):
    href = link['href']
    print("hrefText =", href)

Hope this helps :)
