The purpose of this code is to scrape a data table from several links and then turn it into a pandas DataFrame.

The problem is that this code scrapes only the first 7 rows, which are on the first page of the table, and I want to capture the whole table. When I tried to loop over the table pages, I got an error.

Here is the code:

from selenium import webdriver

urls = open(r"C:\Users\Sayed\Desktop\script\sample.txt").readlines()
for url in urls:
    driver = webdriver.Chrome(r"D:\Projects\Tutorial\Driver\chromedriver.exe")
    driver.get(url)
    for item in driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a'):
        driver.execute_script("arguments[0].click();", item)

    for table in driver.find_elements_by_xpath('//*[contains(@id,"eventHistoryTable")]//tr'):
        data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
        print(data)

Here is the error:

Traceback (most recent call last):
  File "D:/Projects/Tutorial/ff.py", line 8, in <module>
    for item in driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a'):
TypeError: 'WebElement' object is not iterable

  • Only the first 7 rows are visible in the UI; to scrape more, you have to click the show more link first. Commented Sep 21, 2018 at 16:59
  • @cruisepandey So how do I click show more to make the rest visible? Commented Sep 21, 2018 at 17:11

3 Answers

Check out the script below to get the whole table from that webpage. I've used a hardcoded delay in the script, which is not good practice; you can always define an explicit wait to make the code more robust:

import time
from selenium import webdriver

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)
item = driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a')
driver.execute_script("arguments[0].click();", item)
time.sleep(2)
for table in driver.find_elements_by_xpath('//*[contains(@id,"eventHistoryTable")]//tr'):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
    print(data)

driver.quit()

To get all the data by clicking the show more button until it is exhausted, this time with an explicit wait, you can try the script below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver,10)

while True:
    try:
        item = wait.until(EC.visibility_of_element_located((By.XPATH,'//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except Exception:
        break

for table in wait.until(EC.visibility_of_all_elements_located((By.XPATH,'//*[contains(@id,"eventHistoryTable")]//tr'))):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
    print(data)

driver.quit()

4 Comments

Thanks for the help. It does NOT scrape the complete table, but it does give me more rows.
Sorry, I didn't notice that the script needs to click there multiple times.
As the script clicks the show more button silently, just wait until the browser quits by itself.
@SIM How can I print the result as a table rather than as text?
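
To get a table rather than printed lists, the per-row lists built in the loop can be collected and handed to pandas. This is only a minimal sketch, assuming the first scraped row is the header row. The helper names and the column names in `sample` are made up for illustration (the values are taken from output posted in this thread), and pandas is imported lazily so the record-building step works without it.

```python
def rows_to_records(rows):
    """Turn [[header...], [cell...], ...] into a list of dicts keyed by header."""
    # Drop rows with no cells (e.g. spacer rows) before splitting off the header.
    rows = [row for row in rows if row]
    if not rows:
        return []
    header, *body = rows
    # zip() stops at the shorter sequence, so a short row just lacks trailing keys.
    return [dict(zip(header, row)) for row in body]


def rows_to_dataframe(rows):
    """Build a pandas DataFrame from the scraped rows (requires pandas)."""
    import pandas as pd  # imported here so rows_to_records works without pandas
    return pd.DataFrame(rows_to_records(rows))


# Hypothetical sample in the shape the scraping loop prints: one list per <tr>.
# The header names are assumed, not read from the page.
sample = [
    ["Release Date", "Time", "Actual", "Forecast", "Previous"],
    ["Sep 17, 2018", "01:30", "53.1%", "", "55.3%"],
    ["Sep 10, 2018", "01:30", "55.3%", "", "49.0%"],
]
records = rows_to_records(sample)
print(records[0]["Actual"])  # 53.1%
```

In the scraping script, `data` would be appended to a list inside the loop instead of printed, and that list passed to `rows_to_dataframe` after `driver.quit()`.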

As per your question, to scrape the whole table from the URL https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155 you can use the following solution:

  • Code Block:

    # -*- coding: UTF-8 -*-
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    
    table_rows = []
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get("https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155")
    show_more_button = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr>th.left.symbol")))
    driver.execute_script("arguments[0].scrollIntoView(true);",show_more_button);
    myLength = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr[event_attr_id='1155']"))))
    while True:
        try:
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#showMoreHistory1155>a"))).click()
            WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_css_selector("table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr[event_attr_id='1155']")) > myLength)
            table_rows = driver.find_elements_by_css_selector("table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr[event_attr_id='1155']")
            myLength = len(table_rows)
        except TimeoutException:
            break
    for row in table_rows:
        print(row.text)
    driver.quit()
    
  • Console Output:

    Sep 24, 2018 01:30
    Sep 17, 2018 01:30 53.1%   55.3%
    Sep 10, 2018 01:30 55.3%   49.0%
    Sep 03, 2018 01:30 49.0%   43.3%
    Aug 27, 2018 01:30 43.3%   49.7%
    Aug 20, 2018 01:30 49.7%   52.5%
    Aug 13, 2018 01:30 52.5%   59.9%
    Aug 06, 2018 01:30 59.9%   62.6%
    Jul 30, 2018 01:30 62.6%   52.8%
    Jul 23, 2018 01:30 52.8%   52.7%
    Jul 16, 2018 01:30 52.7%   46.2%
    Jul 10, 2018 01:30 46.2%   55.3%
    Jul 02, 2018 01:30 55.3%   53.1%
    Jun 25, 2018 01:30 53.1%   66.2%
    Jun 18, 2018 01:30 66.2%   65.2%
    Jun 11, 2018 01:30 65.2%   61.2%
    Jun 04, 2018 01:30 61.2%   63.9%
    May 28, 2018 01:30 63.9%   67.0%
    May 21, 2018 01:30 67.0%   63.2%
    May 14, 2018 01:30 63.2%   61.3%
    May 07, 2018 01:30 61.3%   57.6%
    Apr 30, 2018 01:30 57.6%   64.8%
    Apr 23, 2018 01:30 64.8%   65.2%
    Apr 16, 2018 01:30 65.2%   60.4%
    Apr 09, 2018 01:30 60.4%   63.3%
    Apr 02, 2018 01:30 63.3%   62.1%
    Mar 26, 2018 01:30 62.1%   65.7%
    Mar 19, 2018 02:30 65.7%   56.0%
    Mar 12, 2018 02:30 56.0%   62.3%
    Mar 05, 2018 02:30 62.3%   59.1%
    Feb 26, 2018 02:30 59.1%   52.8%
    Feb 19, 2018 02:30 52.8%   55.8%
    Feb 12, 2018 02:30 55.8%   51.7%
    Feb 05, 2018 02:30 51.7%   56.8%
    Jan 29, 2018 02:30 56.8%   52.2%
    Jan 22, 2018 02:30 52.2%   56.1%
    Jan 15, 2018 02:30 56.1%   60.2%
    Jan 08, 2018 02:30 60.2%   54.6%
    Jan 01, 2018 02:30 54.6%   48.4%
    Dec 25, 2017 02:30 48.4%   66.4%
    Dec 18, 2017 02:30 66.4%   58.9%
    Dec 11, 2017 02:30 58.9%   53.8%
    Dec 04, 2017 02:30 53.8%   55.9%
    Nov 28, 2017 02:30 55.9%   53.7%
    Nov 20, 2017 02:30 53.7%   58.6%
    Nov 14, 2017 02:30 58.6%   52.8%
    Nov 06, 2017 02:30 52.8%   57.6%
    Oct 30, 2017 01:30 57.6%   54.7%
    Oct 23, 2017 01:30 54.7%   58.9%
    Oct 16, 2017 01:30 58.9%   57.3%
    Oct 09, 2017 01:30 57.3%   64.0%
    Oct 02, 2017 01:30 64.0%   47.5%
    Sep 25, 2017 01:30 47.5%   52.2%
    Sep 18, 2017 01:30 52.2%   55.5%
    Sep 11, 2017 01:30 55.5%   54.3%
    Sep 04, 2017 01:30 54.3%   54.2%
    Aug 28, 2017 01:30 54.2%   51.4%
    Aug 21, 2017 01:30 51.4%   57.4%
    Aug 14, 2017 01:30 57.4%   51.2%
    Aug 07, 2017 01:30 51.2%   51.3%
    Jul 31, 2017 01:30 51.3%   52.8%
    Jul 24, 2017 01:30 52.8%   53.3%
    Jul 17, 2017 01:30 53.3%   54.1%
    Jul 10, 2017 01:30 54.1%   51.9%
    Jul 03, 2017 01:30 51.9%   40.6%
    Jun 26, 2017 01:30 40.6%   52.6%
    Jun 19, 2017 01:30 52.6%   51.0%
    Jun 12, 2017 01:30 51.0%   52.1%
    Jun 05, 2017 01:30 52.1%   59.1%
    May 29, 2017 01:30 59.1%   46.9%
    May 22, 2017 01:30 46.9%   53.0%
    May 15, 2017 01:30 53.0%   44.9%
    May 08, 2017 01:30 44.9%   37.0%
    May 01, 2017 01:30 37.0%   43.0%
    Apr 24, 2017 01:30 43.0%   52.4%
    Apr 10, 2017 01:30 52.4%   55.1%
    Apr 03, 2017 01:30 55.1%   43.5%
    Mar 27, 2017 02:30 43.5%   36.0%
    Mar 20, 2017 02:30 36.0%   32.3%
    Mar 13, 2017 02:30 32.3%   42.8%
    Mar 06, 2017 02:30 42.8%   39.1%
    Feb 27, 2017 02:30 39.1%   41.7%
    Feb 20, 2017 02:30 41.7%   43.2%
    Feb 13, 2017 02:30 43.2%   36.6%
    Feb 06, 2017 02:30 36.6%   39.7%
    Jan 30, 2017 02:30 39.7%   33.5%
    Jan 23, 2017 02:30 33.5%   36.8%
    Jan 16, 2017 03:30 36.8%   37.0%
    Jan 09, 2017 02:30 37.0%   41.6%
    Jan 02, 2017 02:30 41.6%   35.8%
    Dec 26, 2016 02:30 35.8%   42.3%
    Dec 19, 2016 02:30 42.3%   39.7%
    Dec 12, 2016 04:15 39.7%   33.8%
    Dec 05, 2016 02:30 33.8%   37.1%
    Nov 29, 2016 02:30 37.1%   41.9%
    Nov 21, 2016 02:30 41.9%   39.1%
    Nov 15, 2016 02:00 39.1%   20.5%
    Nov 07, 2016 02:30 20.5%   27.4%
    Oct 31, 2016 02:30 27.4%   33.4%
    Oct 25, 2016 02:30 33.4%   30.8%
    Oct 18, 2016 02:30 30.8%   26.6%
    Oct 10, 2016 02:30 26.6%   28.6%
    Oct 05, 2016 02:00 28.6%   26.2%
    Sep 26, 2016 02:30 26.2%   34.8%
    Sep 19, 2016 02:30 34.8%   21.2%
    Sep 13, 2016 02:30 21.2%   27.0%
    Sep 05, 2016 02:30 27.0%   32.7%
    Aug 29, 2016 02:30 32.7%   23.9%
    Aug 22, 2016 02:30 23.9%   28.8%
    Aug 15, 2016 02:30 28.8%   30.8%
    Aug 08, 2016 02:30 30.8%   20.3%
    Aug 01, 2016 02:30 20.3%   30.2%
    Jul 25, 2016 02:30 30.2%   29.5%
    Jul 18, 2016 02:30 29.5%   26.2%
    Jul 11, 2016 02:30 26.2%   27.5%
    Jul 04, 2016 02:30 27.5%   26.8%
    Jun 27, 2016 02:30 26.8%   35.1%
    Jun 20, 2016 02:30 35.1%   22.8%
    Jun 13, 2016 02:30 22.8%   32.5%
    Jun 06, 2016 02:30 32.5%   35.6%
    May 30, 2016 02:30 35.6%   39.5%
    May 23, 2016 02:30 39.5%   37.8%
    May 16, 2016 03:30 37.8%   39.5%
    May 09, 2016 02:30 39.5%   30.3%
    May 02, 2016 02:30 30.3%   32.9%
    Apr 25, 2016 02:30 32.9%   29.6%
    Apr 18, 2016 06:00 29.6%   30.5%
    Apr 11, 2016 02:30 30.5%   22.7%
    Apr 04, 2016 03:30 22.7%   32.1%
    Mar 28, 2016 03:30 32.1%   23.2%
    Mar 21, 2016 03:30 23.2%   26.7%
    Mar 14, 2016 03:30 26.7%   22.6%
    Mar 07, 2016 03:30 22.6%   33.7%
    Feb 29, 2016 03:30 33.7%   34.8%
    Feb 22, 2016 03:30 34.8%   33.3%
    Feb 15, 2016 03:30 33.3%   33.3%
    Feb 08, 2016 03:30 33.3%   34.3%
    Feb 01, 2016 03:30 34.3%   33.2%
    Jan 25, 2016 03:30 33.2%   27.0%
    Jan 18, 2016 03:30 27.0%   27.2%
    Jan 11, 2016 03:30 27.2%   30.0%
    Jan 05, 2016 03:30 30.0%   24.0%
    Dec 29, 2015 03:30 24.0%   33.3%
    Dec 21, 2015 03:30 33.3%   31.2%
    Dec 14, 2015 04:30 31.2%   27.1%
    Dec 07, 2015 03:00 27.1%   29.8%
    Dec 01, 2015 03:00 29.8%   27.5%
    Nov 23, 2015 03:00 27.5%   33.1%
    Nov 17, 2015 04:00 33.1%   26.8%
    Nov 09, 2015 02:30 26.8%   24.3%
    Nov 02, 2015 01:30 24.3%   36.4%
    Oct 26, 2015 01:30 36.4%   28.6%
    Oct 19, 2015 01:30 28.6%   25.5%
    Oct 11, 2015 04:30 25.5%   29.6%
    Oct 06, 2015 01:00 29.6%   28.5%
    Sep 28, 2015 01:30 28.5%   29.1%
    Sep 21, 2015 01:30 29.1%   21.2%
    Sep 14, 2015 01:30 21.2%   29.8%
    Sep 07, 2015 01:30 29.8%   36.3%
    Aug 31, 2015 01:30 36.3%   35.6%
    Aug 24, 2015 01:30 35.6%   26.4%
    Aug 17, 2015 01:30 26.4%   24.8%
    Aug 10, 2015 01:30 24.8%   29.7%
    Aug 03, 2015 01:30 29.7%   24.8%
    Jul 27, 2015 01:30 24.8%   30.7%
    Jul 20, 2015 01:30 30.7%   27.9%
    Jul 13, 2015 01:30 27.9%   27.4%
    Jul 07, 2015 01:30 27.4%   26.8%
    Jun 29, 2015 01:30 26.8%   33.1%
    Jun 22, 2015 01:30 33.1%   33.6%
    Jun 15, 2015 03:30 33.6%   28.9%
    Jun 08, 2015 01:30 28.9%   23.0%
    Jun 01, 2015 01:30 23.0%   34.0%
    May 25, 2015 04:00 34.0%   28.9%
    May 18, 2015 01:30 28.9%   28.8%
    May 11, 2015 01:30 28.8%   28.3%
    May 04, 2015 02:00 28.3%   23.7%
    Apr 27, 2015 01:30 23.7%   27.2%
    Apr 20, 2015 01:30 27.2%   33.7%
    Apr 13, 2015 02:00 33.7%   23.2%
    Apr 06, 2015 02:00 23.2%   19.8%
    Mar 30, 2015 02:30 19.8%   24.1%
    Mar 23, 2015 02:30 24.1%   27.2%
    Mar 16, 2015 03:00 27.2%   35.6%
    Mar 09, 2015 02:30 35.6%   34.4%
    Mar 02, 2015 02:30 34.4%   30.2%
    Feb 23, 2015 02:30 30.2%   26.6%
    Feb 16, 2015 03:30 26.6%   23.8%
    Feb 09, 2015 02:30 23.8%   26.4%
    Feb 02, 2015 02:30 26.4%   23.9%
    Jan 26, 2015 02:30 23.9%   28.9%
    Jan 19, 2015 02:30 28.9%   35.5%
    Jan 12, 2015 02:30 35.5%   38.1%
    Jan 06, 2015 03:30 38.1%   40.6%
    Jan 01, 2015 02:30 40.6%   45.2%
    Dec 22, 2014 02:00 45.2%   39.8%
    Dec 15, 2014 02:00 39.8%   41.7%
    Dec 07, 2014 21:00 41.7%   33.8%
    Dec 02, 2014 03:00 33.8%   38.6%
    Nov 24, 2014 01:30 38.6%   39.2%
    Nov 17, 2014 01:00 39.2%   33.1%
    Nov 10, 2014 01:00 33.1%   35.4%
    Nov 04, 2014 03:00 35.4%   37.3%
    Oct 27, 2014 02:00 37.3%   33.7%
    Oct 19, 2014 22:00 33.7%   36.2%
    Oct 13, 2014 01:00 36.2%   44.5%
    Oct 06, 2014 01:00 44.5%   41.3%
    Sep 29, 2014 01:00 41.3%   50.3%
    Sep 21, 2014 22:35 50.3%   39.5%
    Sep 15, 2014 00:45 39.5%   39.9%
    Sep 08, 2014 01:00 39.9%   42.8%
    Sep 01, 2014 02:35 42.8%   41.9%
    Aug 25, 2014 01:00 41.9%   38.9%
    Aug 18, 2014 01:00 38.9%   34.0%
    Aug 11, 2014 01:00 34.0%   38.2%
    Aug 04, 2014 01:00 38.2%   38.4%
    Jul 28, 2014 01:00 38.4%   42.3%
    Jul 21, 2014 01:00 42.3%   37.2%
    Jul 14, 2014 01:00 37.2%   39.6%
    Jul 07, 2014 01:00 39.6%   39.8%
    Jun 30, 2014 01:00 39.8%   36.1%
    Jun 23, 2014 00:30 36.1%   37.6%
    Jun 16, 2014 00:30 37.6%   36.5%
    Jun 09, 2014 00:30 36.5%   44.1%
    Jun 01, 2014 22:00 44.1%   49.4%
    May 26, 2014 00:30 49.4%   41.0%
    May 19, 2014 00:00 41.0%   55.0%
    May 12, 2014 00:00 55.0%   41.1%
    May 04, 2014 06:00 41.1%   43.5%
    Apr 27, 2014 06:00 43.5%   40.3%
    Apr 06, 2014 06:00 40.3%
    

7 Comments

Thanks for the help, but the code does not work reliably (1 success in 3 tries), and with other links it falls apart.
@SayedGouda Can you update the question with your current code attempt, the error you are facing, and the line where it occurs? Are you using compatible binary versions?
Thank you for your sincere attention; the code is updated.
@SayedGouda Where did you find find_elements_by_xpath('//*[contains(@id,"eventHistoryTable")]//tr') within my solution? Did you try to execute the code in my answer even once?
What exactly do you mean by any other link? The code is meant to work only for the given link, which is https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155. You have yet to share the error, if you are facing one, from my answer.
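
The selectors in this answer hardcode 1155, which is also the number at the end of the URL. If other links follow the same pattern (an assumption that this thread does not confirm), the selectors could be derived from each URL instead. A rough sketch, with hypothetical helper names:

```python
import re


def event_id_from_url(url):
    """Return the trailing number of a URL such as '...-index-1155'."""
    # strip() removes the newline that readlines() leaves on each URL.
    match = re.search(r"-(\d+)/?$", url.strip())
    if match is None:
        raise ValueError(f"no trailing numeric id in {url!r}")
    return match.group(1)


def selectors_for(url):
    """Build the CSS selectors used in the answer for a given URL.

    Assumes the page's element ids embed the same number as the URL,
    as they do for the 1155 page discussed here.
    """
    event_id = event_id_from_url(url)
    table = f"table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable{event_id}"
    return {
        "rows": f"{table} tr[event_attr_id='{event_id}']",
        "show_more": f"div#showMoreHistory{event_id}>a",
    }


url = "https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155\n"
selectors = selectors_for(url)
print(selectors["show_more"])  # div#showMoreHistory1155>a
```

The two strings would then replace the literal selectors passed to `By.CSS_SELECTOR` inside a loop over the URLs read from the file.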

SIM's answer is awesome, but "item.text" is extremely SLOW, because .text does some communication with Chrome instead of parsing the HTML directly.

Instead, I would recommend using

item.get_attribute('innerHTML')

In my test, .text takes 100-150 ms and .get_attribute('innerHTML') takes 40 ms per call. So with 10 columns per row and 10 rows per table (100 cells), that is the difference between 10-15 s and 4 s, which is quite noticeable.
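
Taking the same idea one step further, the markup of the whole table could be fetched in a single call and parsed locally, so the per-cell cost disappears. This is a different technique from the per-cell innerHTML call above, and I have not timed it. A sketch using only the standard library, with hypothetical names and hypothetical sample markup:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <span>) still lands in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup; in a real run it would come from something like
# table_element.get_attribute("outerHTML"), called once per table.
sample = """
<table>
  <tr><th>Release Date</th><th>Actual</th></tr>
  <tr><td>Sep 17, 2018</td><td><span>53.1%</span></td></tr>
</table>
"""
print(parse_table(sample))
```

Note that innerHTML and outerHTML return markup rather than visible text, so nested tags have to be stripped in some way like this before the values are usable.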
