
I'm trying to scrape recipe titles from a website (Allrecipes) using Selenium, but I can only extract some of the titles; the others come back as empty strings.

I’m using the following code snippet to retrieve the titles:

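from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

keyword, nb = 'chicken', 0  # example search term and result offset
page_url = f'https://www.allrecipes.com/search?{keyword}={keyword}&offset={nb}&q={keyword}'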

service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

driver.get(page_url)

titles = [element.get_attribute('data-tag') for element in driver.find_elements(By.CLASS_NAME, "card__content ")]
recipe_links = [element.get_attribute('href') for element in driver.find_elements(By.CSS_SELECTOR, 'a.comp.mntl-card-list-items.mntl-document-card.mntl-card.card.card--no-image')]

print(titles,recipe_links)
driver.quit()

This extracts all the recipe links and the first two titles, but some of the other titles come back as empty strings.
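As an aside on building the search URL itself: a minimal sketch using the standard library's `urllib.parse.urlencode`, which percent-encodes the keyword so that spaces or `&` cannot break the query string. It keeps the same three query parameters as the snippet; the function name `build_search_url` is hypothetical, and whether the site needs all three parameters is not verified here.

```python
from urllib.parse import urlencode

# Base search endpoint, taken from the URLs used in this thread
SEARCH_URL = "https://www.allrecipes.com/search"


def build_search_url(keyword: str, offset: int = 0) -> str:
    """Build a search URL with the same query parameters as the snippet.

    urlencode encodes keys and values with quote_plus, so spaces become '+'
    and characters such as '&' become percent escapes.
    """
    params = {keyword: keyword, "offset": offset, "q": keyword}
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("chicken"))
    print(build_search_url("butter chicken"))
```

For `"chicken"` this produces the same URL that is hard-coded in the answer's example.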

When I tried this code:

titles = driver.find_elements(By.XPATH, "//span[@class='card__title']")
for title in titles:
    print(title.get_attribute('outerHTML'))

This printed the title elements, including their text, correctly:

<span class="card__title">
    <span class="card__title-text ">Chicken Makhani (Indian Butter Chicken)</span>
</span>
title:  
<span class="card__title-text ">Chicken Makhani (Indian Butter Chicken)</span>
...
  1. Why am I getting empty strings for certain titles?
  2. How can I reliably retrieve all the titles from the first page?
  • Did you try title.text? (Commented Oct 18, 2024 at 9:36)

1 Answer


In my opinion, the main issue is the OneTrust (cookie consent) popup, which blocks the rest of the content and should be closed first:

WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[id="onetrust-reject-all-handler"]'))).click()

Also consider changing how you select elements and collect information: instead of building several lists in separate iterations, get all the information in one go. Select each card first, then extract the title and link from that card's child elements.

Example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

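url = 'https://www.allrecipes.com/search?chicken=chicken&offset=0&q=chicken'
driver.get(url)

# Wait until the OneTrust "reject all" button is clickable, then click it so the popup stops blocking the content
WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'button[id="onetrust-reject-all-handler"]'))).click()

data = []

# Select each recipe card once, then read the title and link from that same card
for e in driver.find_elements(By.CSS_SELECTOR,'a[id^="mntl-card-list-items"]'):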
    data.append(
        {
            'title' : e.find_element(By.CSS_SELECTOR,'.card__title-text').text,
            'url' : e.get_attribute('href')
        }
    )

print(data)

Output (the generated list of dicts):

[{'title': 'Chicken Makhani (Indian Butter Chicken)', 'url': 'https://www.allrecipes.com/recipe/45957/chicken-makhani-indian-butter-chicken/'}, {'title': 'Chicken Arroz Caldo (Chicken Rice Porridge)', 'url': 'https://www.allrecipes.com/recipe/212940/chicken-arroz-caldo-chicken-rice-porridge/'}, {'title': 'Garlic Chicken Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/86047/garlic-chicken-fried-chicken/'}, {'title': 'Chicken Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/16573/chicken-fried-chicken/'}, {'title': 'Chicken Enchiladas with Cream of Chicken Soup', 'url': 'https://www.allrecipes.com/recipe/22737/chicken-enchiladas-v/'}, {'title': 'Makhani Chicken (Indian Butter Chicken)', 'url': 'https://www.allrecipes.com/recipe/24782/makhani-chicken-indian-butter-chicken/'}, {'title': 'Simple Baked Chicken Breasts', 'url': 'https://www.allrecipes.com/recipe/240208/simple-baked-chicken-breasts/'}, {'title': 'Best Chicken Salad', 'url': 'https://www.allrecipes.com/recipe/8499/basic-chicken-salad/'}, {'title': 'Crispy Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/8805/crispy-fried-chicken/'}, {'title': "Chef John's Nashville Hot Chicken", 'url': 'https://www.allrecipes.com/recipe/254804/chef-johns-nashville-hot-chicken/'}, {'title': 'Chicken Parmesan', 'url': 'https://www.allrecipes.com/recipe/223042/chicken-parmesan/'}, {'title': 'Juicy Roasted Chicken', 'url': 'https://www.allrecipes.com/recipe/83557/juicy-roasted-chicken/'}, {'title': 'Baked Chicken Schnitzel', 'url': 'https://www.allrecipes.com/recipe/244950/baked-chicken-schnitzel/'}, {'title': 'Rotisserie Chicken', 'url': 'https://www.allrecipes.com/recipe/93168/rotisserie-chicken/'}, {'title': 'Quick and Easy Chicken Noodle Soup', 'url': 'https://www.allrecipes.com/recipe/26460/quick-and-easy-chicken-noodle-soup/'}, {'title': "General Tso's Chicken", 'url': 'https://www.allrecipes.com/recipe/91499/general-tsaos-chicken-ii/'}, {'title': 'Baked Teriyaki Chicken', 'url': 'https://www.allrecipes.com/recipe/9023/baked-teriyaki-chicken/'}, {'title': 'Buffalo Chicken Dip', 'url': 'https://www.allrecipes.com/recipe/68461/buffalo-chicken-dip/'}, {'title': 'Chicken Cordon Bleu', 'url': 'https://www.allrecipes.com/recipe/8495/chicken-cordon-bleu-i/'}, {'title': 'Southern Fried Chicken', 'url': 'https://www.allrecipes.com/recipe/8635/southern-fried-chicken/'}, {'title': "Chef John's Buttermilk Fried Chicken", 'url': 'https://www.allrecipes.com/recipe/220128/chef-johns-buttermilk-fried-chicken/'}, {'title': 'Yummy Honey Chicken Kabobs', 'url': 'https://www.allrecipes.com/recipe/8626/yummy-honey-chicken-kabobs/'}, {'title': 'Broccoli Chicken Casserole', 'url': 'https://www.allrecipes.com/recipe/8965/broccoli-chicken-casserole-i/'}, {'title': 'Best Chicken Marinade', 'url': 'https://www.allrecipes.com/recipe/83793/best-chicken-marinade/'}]
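The card-first pattern (select the card, then read its child elements) can also be tried offline. The sketch below swaps Selenium for the standard library's `html.parser` and runs on a hypothetical, simplified card snippet: the markup and the `_1-0` id suffix are made up, modeled on the `a[id^="mntl-card-list-items"]` selector and the `outerHTML` shown in the question. It only sees static HTML, so it does not reproduce what the live page renders with JavaScript or behind the popup.

```python
from html.parser import HTMLParser

# Hypothetical, simplified card markup (not copied from the live site)
SAMPLE = """
<a id="mntl-card-list-items_1-0" href="https://www.allrecipes.com/recipe/45957/chicken-makhani-indian-butter-chicken/">
  <div class="card__content">
    <span class="card__title">
      <span class="card__title-text ">Chicken Makhani (Indian Butter Chicken)</span>
    </span>
  </div>
</a>
"""


class CardParser(HTMLParser):
    """Collect one {'title', 'url'} dict per recipe-card anchor."""

    def __init__(self):
        super().__init__()
        self.data = []          # one dict per card, like the Selenium example
        self._card = None       # dict for the card currently being parsed
        self._in_title = False  # True while inside a card__title-text span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag == "a" and (attrs.get("id") or "").startswith("mntl-card-list-items"):
            self._card = {"title": "", "url": attrs.get("href")}
        elif tag == "span" and self._card is not None:
            # split() tolerates the trailing space in 'card__title-text '
            if "card__title-text" in (attrs.get("class") or "").split():
                self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._card["title"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._in_title:
            self._in_title = False
        elif tag == "a" and self._card is not None:
            # The card is complete: store it with surrounding whitespace trimmed
            self._card["title"] = self._card["title"].strip()
            self.data.append(self._card)
            self._card = None


if __name__ == "__main__":
    parser = CardParser()
    parser.feed(SAMPLE)
    print(parser.data)
```

Because each dict holds both fields, a card without a title shows up as `{'title': '', ...}` next to its own URL instead of shifting a separate titles list. The same idea gives a quick check on the Selenium `data` list: `assert all(d['title'].strip() for d in data)`.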

1 Comment

I had some doubts about that, but I didn't fully investigate the OneTrust popup issue. With your solution, I can now successfully access the rest of my data! Thank you for your response and your help.
