I've been trying to filter the tags returned by a scraper I'm writing, but I haven't found a way to extract the data I need. The code is the following:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

url = input("Url a scrapear: ")

pagina = requests.get(url)

elementos = BeautifulSoup(pagina.content, 'html.parser')

productos = elementos.find_all('div', class_='picture')

enlaces = pd.DataFrame(productos)

print(productos)

I need to extract the href values from all the <a> tags inside the specified <div>. Any idea how to do it? I've tried everything I could think of and can't find a solution. This is the latest code I've tried, but it didn't work with any of the parameters I used to filter the href data.

1 Answer

from bs4 import BeautifulSoup
import requests

url = "https://www.plasticosur.com/almacenaje-y-conservacion-#/isFilters=1&pageSize=36&viewMode=grid&orderBy=15&pageNumber=1"

pagina = requests.get(url)

elementos = BeautifulSoup(pagina.content, 'html.parser')

# Map each product link (href) to its image source (src)
out_dict = {}
for div in elementos.find_all('div', class_='picture'):
    enlace = div.find('a')
    imagen = div.find('img')
    # Skip any div that lacks a link or an image
    if enlace is None or imagen is None:
        continue
    out_dict[enlace['href']] = imagen['src']

print(out_dict)
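
If only the links are needed, a CSS selector can collect them in one pass. The following is a minimal sketch, not part of the original answer: the inline markup, the `extract_links` helper, and the `https://example.com/` base URL are all made up for illustration, and the real page may be structured differently. It also resolves relative hrefs against the base URL with `urljoin`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<div class="picture"><a href="/producto-1"><img src="/img/1.jpg"></a></div>
<div class="picture"><a href="https://example.com/producto-2"><img src="/img/2.jpg"></a></div>
<div class="picture"><span>no link here</span></div>
<div class="other"><a href="/ignored">x</a></div>
"""


def extract_links(html, base_url):
    """Return absolute hrefs of every <a> inside a <div class="picture">."""
    soup = BeautifulSoup(html, "html.parser")
    # "a[href]" skips anchors that have no href attribute at all.
    anchors = soup.select("div.picture a[href]")
    return [urljoin(base_url, a["href"]) for a in anchors]


links = extract_links(SAMPLE_HTML, "https://example.com/")
print(links)
```

Testing against a small inline string like this makes it easier to check the selector before pointing the scraper at the live site.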

5 Comments

I already tried it, but it doesn't work because it can't find the find_all attribute. This is the error I see: AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Give me an example URL
I am using beautifulsoup4==4.10.0, FYI, and have edited the answer.
That really helped a lot, dude. Thank you! With that I can do what I want :D
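
For anyone hitting the same AttributeError mentioned in the comments: find_all() returns a list-like ResultSet, so calling find_all() on that result again fails, while calling it on each element works. Here is a small sketch of the difference, using made-up inline markup (an assumption, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
SAMPLE_HTML = """
<div class="picture"><a href="/producto-1">uno</a></div>
<div class="picture"><a href="/producto-2">dos</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
divs = soup.find_all("div", class_="picture")  # a ResultSet, not a single Tag

# Calling find_all() on the ResultSet itself raises AttributeError.
error_seen = False
try:
    divs.find_all("a")
except AttributeError:
    error_seen = True

# Iterating and searching inside each Tag works.
hrefs = [a["href"] for div in divs for a in div.find_all("a", href=True)]
print(error_seen, hrefs)
```

The loop in the answer follows the same idea: it searches inside each div rather than inside the whole result list.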
