I'm learning web scraping and working on Eat24 (Yelp's website). I can scrape basic data from the site, but I can't do something that should be simple: append that data to a dataframe. Here is my code; I've annotated it so it should be easy to follow.

from selenium import webdriver
import time
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()

#go to eat24, type in zip code 10007, choose pickup and click search

driver.get("https://new-york.eat24hours.com/restaurants/index.php")
search_area = driver.find_element_by_name("address_auto_complete")
search_area.send_keys("10007")
pickup_element = driver.find_element_by_xpath("//*[@id='search_form']/div/table/tbody/tr/td[2]")
pickup_element.click()
search_button = driver.find_element_by_xpath("//*[@id='search_form']/div/table/tbody/tr/td[3]/button")
search_button.click()


#scroll up and down on page to load more of 'infinity' list

for i in range(0,3):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    driver.execute_script("window.scrollTo(0,0);")
    time.sleep(1)

#find menu urls

menu_urls = [page.get_attribute('href')
             for page in driver.find_elements_by_xpath('//*[@title="View Menu"]')]

df = pd.DataFrame(columns=['name', 'menuitems'])

#collect menu items/prices/name from each URL
for url in menu_urls:
    driver.get(url)
    menu_items = driver.find_elements_by_class_name("cpa")
    menu_items = [x.text for x in menu_items]
    menu_prices = driver.find_elements_by_class_name('item_price')
    menu_prices = [x.text for x in menu_prices]
    name = driver.find_element_by_id('restaurant_name')
    menuitems = dict(zip(menu_items, menu_prices))
    df['name'] = name
    df['menuitems'] = menuitems

df.to_csv('test.csv', index=False)

The problem is at the end. It isn't adding menuitems + name into successive rows in the dataframe. I have tried using .loc and other functions but it got messy so I removed my attempts. Any help would be appreciated!!

Edit: The error I get is "ValueError: Length of values does not match length of index". It is raised when the for loop attempts to add a second set of menuitems/restaurant name to the dataframe.

  • what is pd in pd.DataFrame? it's not defined in fragment you posted Commented Aug 14, 2017 at 21:52
  • @KirilS. import pandas as pd Commented Aug 14, 2017 at 21:54
  • Sorry yes I added that to the beginning, edited. Thanks for catching that Commented Aug 14, 2017 at 21:59
  • Shouldn't you have a third column for menu_prices, or maybe join the items in menuitems? Commented Aug 15, 2017 at 8:55

1 Answer

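I figured out a simple solution. I added a "row" counter that goes up by 1 on each iteration and used .loc to write each restaurant's data into the "row"th row. I also take .text from the restaurant name element, so the dataframe stores the name as a string rather than the element object.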

row = 0
for url in menu_urls:
    row += 1
    driver.get(url)
    menu_items = driver.find_elements_by_class_name("cpa")
    menu_items = [x.text for x in menu_items]
    menu_prices = driver.find_elements_by_class_name('item_price')
    menu_prices = [x.text for x in menu_prices]
    name = driver.find_element_by_id('restaurant_name').text
    menuitems = [dict(zip(menu_items, menu_prices))]
    df.loc[row, 'name'] = name
    df.loc[row, 'menuitems'] = menuitems
    print(df)
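
A possible alternative to tracking a row counter, offered only as a sketch: df['name'] = value assigns a whole column rather than appending a row, so it can be simpler to collect one record per restaurant in a plain list and build the dataframe once after the loop. The "scraped" list below is a hypothetical stand-in for the values the Selenium loop returns (it is not real Eat24 data), and "item_rows" / "items_df" are illustrative names for a one-row-per-item layout with a separate price column, along the lines suggested in the comments.

```python
import pandas as pd

# Hypothetical stand-in for what the Selenium loop collects per menu page:
# (restaurant name, item names, item prices). Not real Eat24 data.
scraped = [
    ("Restaurant A", ["Burger", "Fries"], ["$8.00", "$3.00"]),
    ("Restaurant B", ["Pad Thai"], ["$11.50"]),
]

rows = []       # one record per restaurant
item_rows = []  # one record per menu item

for name, menu_items, menu_prices in scraped:
    # Same shape as the question: a name plus an item -> price dict.
    rows.append({"name": name, "menuitems": dict(zip(menu_items, menu_prices))})
    # Alternative layout: one row per item, with its own price column.
    for item, price in zip(menu_items, menu_prices):
        item_rows.append({"name": name, "item": item, "price": price})

# Build each dataframe once, after the loop has finished.
df = pd.DataFrame(rows, columns=["name", "menuitems"])
items_df = pd.DataFrame(item_rows, columns=["name", "item", "price"])

print(df)
print(items_df)
```

In the real script, the body of the loop would use the driver calls from the code above in place of the "scraped" tuples. The one-row-per-item layout tends to be easier to work with after writing to CSV, since a dict stored in a cell is saved as its string representation.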