I'm using this code to scrape some data from https://website.grader.com/results/www.dubizzle.com. Because the script that renders the tags I want to extract only finishes loading after about 15 seconds, someone recommended Selenium so that I could introduce a delay in the code.

The code is below:

#!/usr/bin/python
import urllib
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
from dateutil.parser import parse
from datetime import timedelta
import MySQLdb
import re
import pdb
import sys
import string



driver = webdriver.Firefox()
driver.get('https://website.grader.com/results/dubizzle.com')
time.sleep(25)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')


# print soup

Sizeofweb=""
try:

    # .text already returns the element's text as a string, so encode it directly
    Sizeofweb = soup.find('span', {'data-reactid': ".0.0.3.0.0.3.$0.1.1.0"}).text
    print Sizeofweb.encode("utf-8")

except StandardError as e:
    converted_date="Error was {0}".format(e)
    print converted_date

The part of the HTML I am trying to extract is shown below:

Snap: https://www.dropbox.com/s/7dwbaiyizwa36m6/5.PNG?dl=0

<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>

I installed geckodriver by downloading it, extracting it to the /home directory, and then adding it to my PATH with export PATH=$PATH:/home/geckodriver, as recommended by @Ahn Smith.

Now when I run the program, it gives this error:

Traceback (most recent call last):
  File "ahmed.py", line 17, in <module>
    driver = webdriver.Firefox()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 140, in __init__
    self.service.start()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 74, in start
    stdout=self.log_file, stderr=self.log_file)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 20] Not a directory

1 Answer

There are two ways to point Selenium to the appropriate webdriver. You can pass it as a parameter:

driver = webdriver.Firefox(executable_path='/path/to/geckodriver')

Or you can add the directory that contains geckodriver to your shell's PATH variable:

$ export PATH=$PATH:/path/to/

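I think your problem is that you added the geckodriver binary itself to PATH rather than the folder containing it. Each PATH entry is treated as a directory to search, so an entry that points at a file would be consistent with the "Not a directory" error.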
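To check this from Python, here is a minimal sketch that flags PATH entries which exist but are not directories, and looks for an executable the way a directory-based PATH search would. The helper names non_directory_entries and find_executable are hypothetical, not part of Selenium, and only the standard os module is used.

```python
import os


def non_directory_entries(path_value):
    """Return the PATH entries that exist but are not directories."""
    entries = [p for p in path_value.split(os.pathsep) if p]
    return [p for p in entries if os.path.exists(p) and not os.path.isdir(p)]


def find_executable(name, path_value):
    """Return the first executable file called `name` found in a PATH directory, or None."""
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries to keep the sketch simple
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling non_directory_entries(os.environ["PATH"]) should list /home/geckodriver if the binary itself was exported, and find_executable("geckodriver", os.environ["PATH"]) should return None until the containing folder is on PATH.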


4 Comments

Should /path/to/ be used as it is? The geckodriver is in my home directory, so should I write driver = webdriver.Firefox(executable_path='/path/to/geckodriver')?
If passing it as the executable_path parameter, you should give the full path, including the geckodriver file name. If adding to your shell's PATH variable, you should include only the path to the directory containing geckodriver.
Please check my answer for the code and error. I have put the geckodriver in the home directory.
Try a reboot. If that fails, try checking your permissions to make sure the geckodriver is executable. (Or try copying it to /usr/local/sbin/)
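The permissions check mentioned in the last comment can be sketched in Python as well. This is only an illustration using the standard os and stat modules; ensure_executable is a hypothetical helper name, and it assumes you own the file so that chmod is allowed.

```python
import os
import stat


def ensure_executable(path):
    """Add the owner-execute bit to `path` if it is missing.

    Returns True if the mode was changed, False if it was already executable.
    """
    if not os.path.isfile(path):
        raise ValueError("not a regular file: {0}".format(path))
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False
    os.chmod(path, mode | stat.S_IXUSR)
    return True
```

For example, ensure_executable('/path/to/geckodriver') would have the same effect as running chmod u+x on the file from the shell.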
