Python Extract href Problems

Question

I'm trying to get every a href from a URL. The problem is that I can't extract the right href:

<a href="#!DetalleNorma/203906/20190322" title="" data-bind="html: organismo, attr: {href: $root.crearHrefDetalleNorma(idTamite,fechaPublicacion)} ">SECRETARÍA GENERAL</a>

All I can extract is: #!

from bs4 import BeautifulSoup
import urllib.request as urllib2
import re

html_page = urllib2.urlopen('https://www.boletinoficial.gob.ar/')
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    print(link.get('href'))

Here is a version with an explicit parser. It does not work either:

import requests
from lxml import html
from bs4 import BeautifulSoup

r = requests.get('https://www.boletinoficial.gob.ar/')
soup = BeautifulSoup(r.content, "html.parser")

for td in soup.findAll("div", class_="itemsection"):
    for a in td.findAll("a", href=True):
        print(a.text)

@JonClements Would throw UserWarning: No parser was explicitly specified, in Python 3.x
– DirtyBit, Commented Mar 22, 2019 at 7:23
@DirtyBit haha... no - I meant which version of BeautifulSoup4 is in use...
– Jon Clements, Commented Mar 22, 2019 at 7:26

QHarr · Accepted Answer · 2019-03-22 07:35:06Z

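I had to use Selenium with a wait condition. The href values appear to be filled in by JavaScript after the page loads (note the data-bind attribute in your example), so the HTML that requests or urllib receives contains only #!: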

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.boletinoficial.gob.ar/')

# Wait up to 20 seconds for the matching elements to be present in the DOM
items = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".itemsection [href]"))
)
links = [item.get_attribute('href') for item in items]
print(links)
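To illustrate why a plain HTTP fetch only yields #!, here is a minimal sketch that uses only the standard library and no network access. The RAW snippet is an assumption about what the server sends before any script runs (it is consistent with the #! reported in the question), the RENDERED snippet is the anchor quoted in the question, and HrefCollector and extract_hrefs are hypothetical helper names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag exactly as it appears in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def extract_hrefs(markup):
    """Return the list of href values found in a markup string."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


# ASSUMPTION: roughly what the server sends, before JavaScript applies the binding.
RAW = ('<a href="#!" data-bind="html: organismo, attr: {href: '
       '$root.crearHrefDetalleNorma(idTamite,fechaPublicacion)}"></a>')

# The anchor as seen in the browser (quoted in the question).
RENDERED = '<a href="#!DetalleNorma/203906/20190322" title="">SECRETARÍA GENERAL</a>'

print(extract_hrefs(RAW))       # ['#!']
print(extract_hrefs(RENDERED))  # ['#!DetalleNorma/203906/20190322']
```

A static parser can only report what is in the text it is given, which is why a browser-driven tool is used here to obtain the rendered markup.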

Link and text as tuples:

elements = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".itemsection [href]")))
data = [(el.get_attribute('href'), el.text) for el in elements]
print(data)
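If the id and publication date are needed separately, the collected hrefs could be split afterwards. This is only a sketch: the fragment layout #!DetalleNorma/<id>/<YYYYMMDD> is inferred from the single example in the question and the names in its data-bind attribute, and parse_detalle_norma is a hypothetical helper, not part of Selenium.

```python
from datetime import datetime
from urllib.parse import urlsplit


def parse_detalle_norma(href):
    """Return (tramite_id, publication_date), or None if the href does not match.

    ASSUMPTION: the fragment looks like '!DetalleNorma/<id>/<YYYYMMDD>'.
    """
    fragment = urlsplit(href).fragment  # everything after '#'
    parts = fragment.lstrip("!").split("/")
    if len(parts) != 3 or parts[0] != "DetalleNorma":
        return None
    _, tramite_id, yyyymmdd = parts
    try:
        published = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    except ValueError:
        return None
    return tramite_id, published


example = "https://www.boletinoficial.gob.ar/#!DetalleNorma/203906/20190322"
print(parse_detalle_norma(example))
print(parse_detalle_norma("#!"))  # None: placeholder href, nothing to parse
```

It works on both absolute URLs, as returned by get_attribute('href'), and bare fragments such as the one quoted in the question.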


2 Comments

DirtyBit Over a year ago

I knew we would have to end up doing something like this! +1

QHarr Over a year ago

@DirtyBit Yeah... I tried with requests but computer said no
