I am trying to extract data from https://ash.confex.com/ash/2019/webprogram/start.htm and I am getting an error when using BeautifulSoup's find_all.

import webbrowser
import os
import requests
from bs4 import BeautifulSoup
import sys
import wget
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome('D:\\crome drive\\chromedriver.exe')
driver.get('https://ash.confex.com/ash/2019/webprogram/start.html')
searchterm = driver.find_element_by_id("words").send_keys("CAR-T")
driver.find_element_by_name("submit").click()
#driver.find_element_by_tag_name("resulttitle")
#driver.find_element_by_class_name("a")

soup_level1=BeautifulSoup(driver.page_source, 'lxml')
#fl=soup_level1.find_all(class_='soup_level1')
results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
tag = results.findall('a', attrs='href')

I am getting this error:

AttributeError: ResultSet object has no attribute 'findall'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

  • The error description cannot get any simpler :). You need to iterate over the results and call find_all on each item. Commented Nov 7, 2019 at 13:58
  • In one place you use find_all, and in the next findall. Have you tried findAll? Commented Nov 7, 2019 at 14:02
  • Does this answer your question? Beautiful Soup: 'ResultSet' object has no attribute 'find_all'? Commented Mar 22, 2020 at 22:33

1 Answer

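Yes, it's exactly as the error says: find_all is meant to be called on a single HTML element (a Tag, or the soup itself), but in your code the variable results is a ResultSet object. In bs4 a ResultSet is a list in which each item is an HTML element. Note also that the method is spelled find_all (or findAll), not findall.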

results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
print(type(results))   # <class 'bs4.element.ResultSet'>
print(results)   # []

This also shows that your results is empty. I searched through the HTML of the page and didn't see any div with class="resulttitle", so you may want to double-check what you're looking for.
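One way to double-check is to list which class names actually occur on the div elements of the page you received. The following is only a sketch: the helper name count_div_classes and the sample markup are made up for illustration, and html.parser is used instead of lxml purely to avoid an extra dependency.

```python
from collections import Counter

from bs4 import BeautifulSoup


def count_div_classes(html):
    """Count how often each CSS class appears on <div> elements."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    # class_=True keeps only the divs that have a class attribute at all
    for div in soup.find_all("div", class_=True):
        # bs4 returns the class attribute as a list of class names
        counts.update(div["class"])
    return counts


# Made-up markup, not the real conference page
sample = '<div class="a b"></div><div class="a"></div><div></div>'
print(count_div_classes(sample).most_common())
```

With Selenium you would pass driver.page_source instead of sample. If the class you expect is missing from the output, the page may not have finished loading when the source was read, or the class may sit on a different element.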

In theory, if your results variable weren't empty, you could loop through each item in results and then find all of the links you're looking for:

results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
for result in results:
    tag_list = result.find_all('a', href=True)
    # this yields another list in which each item is an <a> tag that has an href attribute
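To get the href values themselves rather than the tags, you can index each <a> tag like a dictionary. Here is a minimal, self-contained sketch. The markup is hypothetical and only stands in for the results page (Paper124313.html and the "other" div are invented for the example), and extract_links is an illustrative name, not part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the search results page;
# the structure of the real page may differ.
html = """
<div class="resulttitle"><a href="Paper124312.html">First paper</a></div>
<div class="resulttitle"><a href="Paper124313.html">Second paper</a><a name="x">no href</a></div>
<div class="other"><a href="ignored.html">Not a result</a></div>
"""


def extract_links(soup):
    """Collect href values from the <a> tags inside each result div."""
    links = []
    # find_all returns a ResultSet (a list), so loop over it
    for result in soup.find_all("div", class_="resulttitle"):
        # each item is a single Tag, which does support find_all
        for a in result.find_all("a", href=True):
            links.append(a["href"])
    return links


soup = BeautifulSoup(html, "html.parser")
links = extract_links(soup)
print(links)
```

The href=True filter skips anchors that have no href attribute, so a["href"] cannot raise a KeyError.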

1 Comment

Hi Joseph, the results do contain the information below, and I want to extract the <a href> from it, e.g. a href="Paper124312.html"
