
I'm trying to get every a href from a URL. The problem is that I can't extract the right href:

<a href="#!DetalleNorma/203906/20190322" title="" data-bind="html: organismo, attr: {href: $root.crearHrefDetalleNorma(idTamite,fechaPublicacion)} ">SECRETARÍA GENERAL</a>

All I can extract is: #!

from bs4 import BeautifulSoup
import urllib.request as urllib2
import re

html_page = urllib2.urlopen('https://www.boletinoficial.gob.ar/')
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    print(link.get('href'))

Here is a version with an explicit parser. It does not work either:

import requests
from lxml import html
from bs4 import BeautifulSoup

r = requests.get('https://www.boletinoficial.gob.ar/')
soup = BeautifulSoup(r.content, "html.parser")

for td in soup.findAll("div", class_="itemsection"):
    for a in td.findAll("a", href=True):
        print(a.text)
  • 1. Use a html parser Commented Mar 22, 2019 at 7:22
  • @DirtyBit they are... Commented Mar 22, 2019 at 7:22
  • @JonClements Would throw UserWarning: No parser was explicitly specified, in Python 3.x Commented Mar 22, 2019 at 7:23
  • @DirtyBit depends on the version... Commented Mar 22, 2019 at 7:25
  • @DirtyBit haha... no - I meant which version of BeautifulSoup4 is in use... Commented Mar 22, 2019 at 7:26

1 Answer

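I had to use Selenium with a wait condition. The data-bind attribute in your snippet suggests that the real href values are filled in by JavaScript after the page loads, so a plain HTTP request only sees the #! placeholder: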

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.boletinoficial.gob.ar/')

# Wait up to 20 seconds until elements matching the selector are present in the DOM
items = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".itemsection [href]"))
)
links = [item.get_attribute('href') for item in items]
print(links)

To get each link together with its text as (href, text) tuples:

data =  [(item.get_attribute('href'), item.text) for item in WebDriverWait(driver,20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".itemsection [href]")))]
print(data)
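
For context, here is a minimal sketch of why a static parse can only ever return #!. It uses the standard library's html.parser rather than BeautifulSoup so that it runs without a network connection or extra packages. STATIC_HTML is my assumption about the shape of the markup the server sends before any script runs, based on the #! output in the question; it is not a copy of the real response. HrefCollector and static_hrefs are illustrative names only.

```python
from html.parser import HTMLParser

# Assumed shape of the markup before any JavaScript runs: the href is only a
# "#!" placeholder, and data-bind describes how a client-side script would
# compute the real value later. This is an illustration, not the real page.
STATIC_HTML = '''
<div class="itemsection">
  <a href="#!" data-bind="html: organismo, attr: {href: $root.crearHrefDetalleNorma(idTamite,fechaPublicacion)}"></a>
</div>
'''


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag exactly as written in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def static_hrefs(markup):
    """Return the href values a non-browser parser sees in the given markup."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


print(static_hrefs(STATIC_HTML))  # prints ['#!']
```

A static parser never evaluates data-bind, so it reports the placeholder. A real browser driven by Selenium runs the script first, which is why the wait condition above returns the full links.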

2 Comments

I knew we would have to end up doing something like this! +1
@DirtyBit Yeah... I tried with requests but computer said no
