0

Following is a web scraping program that I have written to download the ID card photos of students in my college given their URL. The URL of the images of all students is same, just we have to replace the ID numbers in the URL which I have provided from a notepad file "ID.txt". Following is the code that I have written-

from selenium import webdriver
driver=webdriver.Chrome(executable_path=r'C:\Users\user1712\Downloads\Chrome Downloads\chromedriver_win32\chromedriver.exe')
driver.get('https://swd.bits-goa.ac.in/student_pagetemp1?PHPSESSID=ecm2utnjvml8kpkpp8dh2dvnq0')


# ID.txt contains id card numbers of students. Each ID in a separate row 
filename = 'ID.txt'


with open(filename) as f:
    data = f.readlines()

import csv
import urllib.request


reader = csv.reader(data)
for row in reader:
    # url of each student is almost same. Only thing is that we have to change the ID in the url to get the image address of a student
    url="https://swd.bits-goa.ac.in/css/studentImg/"+str(row)+".jpg"
    fullname=str(row)+".jpg"
    urllib.request.urlretrieve(url, fullname)

Following is the error that I am getting-

Traceback (most recent call last):
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 964, in send
    self.connect()
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1400, in connect
    server_hostname=server_hostname)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 814, in __init__
    self.do_handshake()
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 1068, in do_handshake
    self._sslobj.do_handshake()
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\KAUSTUBH\Downloads\Web scraping\swd trial.py", line 19, in <module>
    urllib.request.urlretrieve(url, fullname)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\KAUSTUBH\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)>
2
  • Its because of the ssl certificate Commented Jan 30, 2018 at 19:31
  • It looks like the website's certificate is messed up: see here Commented Jan 30, 2018 at 19:35

1 Answer 1

1

In order to skip the SSL error, you need to add an option --ignore-certificate-errors when you init the chromedriver.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--ignore-certificate-errors")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://swd.bits-goa.ac.in/student_pagetemp1?PHPSESSID=ecm2utnjvml8kpkpp8dh2dvnq0')
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.