Extract data in Python using BeautifulSoup

Question

I am trying to extract data from https://ash.confex.com/ash/2019/webprogram/start.htm and I get an error when searching the parsed page with BeautifulSoup.

import webbrowser
import os
import requests
from bs4 import BeautifulSoup
import sys
import wget
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome('D:\\crome drive\\chromedriver.exe')
driver.get('https://ash.confex.com/ash/2019/webprogram/start.html')
searchterm = driver.find_element_by_id("words").send_keys("CAR-T")
driver.find_element_by_name("submit").click()
#driver.find_element_by_tag_name("resulttitle")
#driver.find_element_by_class_name("a")

soup_level1=BeautifulSoup(driver.page_source, 'lxml')
#fl=soup_level1.find_all(class_='soup_level1')
results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
tag = results.findall('a', attrs='href')

I am getting this error:

AttributeError: ResultSet object has no attribute 'findall'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

The error description cannot get any simpler :). You need to iterate over the results and call find_all on each item.
– abhilb, Commented Nov 7, 2019 at 13:58
You use find_all in one place and findall in the other. Have you tried findAll?
– josifoski, Commented Nov 7, 2019 at 14:02
Does this answer your question? Beautiful Soup: 'ResultSet' object has no attribute 'find_all'?
– AMC, Commented Mar 22, 2020 at 22:33

Joseph Rajchwald · Accepted Answer · 2019-11-07 14:07:41Z

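Yes, it's exactly as the error says: find_all is meant to be called on a single parsed element (the soup itself or a Tag), but in your code the variable results is a ResultSet object. In bs4 this is a list in which each item is a Tag, that is, an HTML subtree. Note also that the method is spelled find_all (findAll is the older alias); there is no findall.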

results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
print(type(results))   # <class 'bs4.element.ResultSet'>
print(results)   # []

This also shows that your results is empty. I searched through the page's HTML and didn't see any div with class "resulttitle", so you may want to double-check what you're looking for.
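To make the distinction concrete, here is a minimal sketch on a made-up HTML snippet (the markup and the second file name are assumptions for illustration, not taken from the real page). It shows that find_all returns a list-like ResultSet, that find returns a single Tag, that a selector matching nothing gives an empty result, and that calling find_all on the ResultSet itself raises the AttributeError from the question.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Made-up HTML for illustration only; the real page's markup may differ.
html = """
<div class="resulttitle"><a href="Paper124312.html">First paper</a></div>
<div class="resulttitle"><a href="Paper000001.html">Second paper</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all("div", attrs={"class": "resulttitle"})  # all matches
first = soup.find("div", attrs={"class": "resulttitle"})        # first match only
missing = soup.find_all("div", attrs={"class": "no-such-class"})

print(type(matches).__name__, len(matches))  # ResultSet 2
print(type(first).__name__)                  # Tag
print(missing)                               # []

# Calling a Tag method on the list itself reproduces the error.
error_message = None
try:
    matches.find_all("a")
except AttributeError as exc:
    error_message = str(exc)
    print("AttributeError:", error_message)
```

If printing your own results gives [], the selector is the first thing to check, before worrying about the loop.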

In theory, if your results variable weren't empty, you could loop through each item in results and then find all of the links you're looking for:

results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
for result in results:
    # href=True matches only <a> tags that actually have an href attribute
    tag_list = result.find_all('a', href=True)
    # tag_list is another ResultSet; each item is an <a> Tag
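As a fuller sketch of that loop, the snippet below collects the link targets into a plain list and resolves them against the page URL from the question. The HTML is a made-up stand-in (the div/a nesting is an assumption about the results page), and BASE_URL and links are names introduced here for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Page URL taken from the question; used only to resolve relative links.
BASE_URL = "https://ash.confex.com/ash/2019/webprogram/start.html"

# Made-up HTML for illustration; the real results page may be structured differently.
html = """
<div class="resulttitle"><a href="Paper124312.html">Example title</a></div>
<div class="resulttitle"><span>No link in this one</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
for result in soup.find_all("div", attrs={"class": "resulttitle"}):
    # Each result is a Tag, so find_all is available here.
    for anchor in result.find_all("a", href=True):
        links.append(urljoin(BASE_URL, anchor["href"]))

print(links)
```

Filtering with href=True means anchor["href"] cannot raise a KeyError, and divs without a link are simply skipped.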

1 Comment

Big_Data_engineer Over a year ago

Hi Joseph, the results do contain the information below, and I want to extract the <a href> value from it, e.g. a href="Paper124312.html"
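For the case described in this comment, one possible alternative to the nested loop is a CSS selector via select(). This is only a sketch: it assumes the links sit inside div elements with class "resulttitle", which the answer could not confirm, and the HTML string is a stand-in built from the href quoted above.

```python
from bs4 import BeautifulSoup

# Stand-in markup built from the href quoted in the comment; the structure is assumed.
html = '<div class="resulttitle"><a href="Paper124312.html">Example title</a></div>'
soup = BeautifulSoup(html, "html.parser")

# "a[href]" keeps only anchors that have an href attribute.
hrefs = [a["href"] for a in soup.select("div.resulttitle a[href]")]

print(hrefs)  # ['Paper124312.html']
```

select() returns a list of Tag objects, so the comprehension can index each one directly.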
