
I am accessing an https page through a proxy:

    def read_page(self, url):
        '''
        Gets web page using proxy and returns beautifulsoup object
        '''
        soup = None
        try:
            r = requests.get(url, proxies=PROXIES, auth=PROXY_AUTH,
                             cert=('../static/crawlera-ca.crt'), verify=False,
                             allow_redirects=False)
        except requests.exceptions.MissingSchema:
            return False

        if r.status_code == 200:
            soup = bs4.BeautifulSoup(r.text, "html.parser")
            if soup:
                return soup
        return False

I am passing "https://www.bestbuy.com" as the URL and get this error:

    requests.exceptions.SSLError: HTTPSConnectionPool(host='www.bestbuy.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError(336265225, '[SSL] PEM lib (_ssl.c:2964)'),))

When I remove the `cert=('../static/crawlera-ca.crt')` argument, the program accesses the site successfully, with an `InsecureRequestWarning`, which is expected. But I don't understand why the other error happens. The certificate file is in the right place in my folder hierarchy and was downloaded from the proxy service, so I know it's right.

The easy option would be to just not use the certificate and suppress the security warning, but I want to do it properly. Can anyone explain what is going on and how I can fix it?

Answer:

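I think you have misunderstood the `cert` parameter. It is not the (list of) trusted CAs, as you seem to think; it is the client certificate you use to authenticate yourself to the server. Such a certificate also requires a matching private key, either in the same file or passed separately as a `('cert', 'key')` tuple. A CA file downloaded from a proxy service should not contain any private key, which is most likely why loading it fails with the `PEM lib` error.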
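To check whether a PEM file could serve as a client certificate at all, you can look at which blocks it contains. A CA file such as `crawlera-ca.crt` would be expected to hold only a `CERTIFICATE` block and no private key. This is a minimal sketch, assuming the file is PEM-encoded text; `pem_block_types` and `usable_as_client_cert` are hypothetical helpers, not part of requests:

```python
import re
from typing import List

# Matches the opening line of a PEM block, e.g. "-----BEGIN CERTIFICATE-----",
# and captures its label ("CERTIFICATE", "PRIVATE KEY", "RSA PRIVATE KEY", ...).
PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_types(pem_text: str) -> List[str]:
    """Return the label of every PEM block in the text, in order (hypothetical helper)."""
    return PEM_BEGIN.findall(pem_text)


def usable_as_client_cert(pem_text: str) -> bool:
    """True only if the text holds both a certificate and a private key (hypothetical helper)."""
    labels = pem_block_types(pem_text)
    has_cert = "CERTIFICATE" in labels
    # Key labels vary ("PRIVATE KEY", "RSA PRIVATE KEY", "EC PRIVATE KEY", ...).
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_cert and has_key

# Example (not run here):
#   with open('../static/crawlera-ca.crt') as f:
#       print(pem_block_types(f.read()))
```

If the list contains only `'CERTIFICATE'`, the file is suitable for `verify` but not for `cert`.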

Given that the request works without this parameter, the server evidently does not require a client certificate from you (which is uncommon anyway). You probably meant to use `../static/crawlera-ca.crt` as the trusted CA for certificate validation. In that case, don't use the `cert` parameter; pass the file to the `verify` parameter instead:

    r = requests.get(url, proxies=PROXIES, auth=PROXY_AUTH,
                     verify='../static/crawlera-ca.crt',
                     allow_redirects=False)
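If the same code sometimes needs a custom CA bundle and sometimes a client certificate, building the two parameters in one place helps keep them apart. This is a minimal sketch; `tls_options` is a hypothetical helper, and it relies only on requests' documented behaviour that `verify` accepts `True`/`False` or a path to a CA bundle, and `cert` accepts a single file path or a `(cert, key)` tuple:

```python
from typing import Any, Dict, Optional


def tls_options(ca_bundle: Optional[str] = None,
                client_cert: Optional[str] = None,
                client_key: Optional[str] = None) -> Dict[str, Any]:
    """Build TLS-related keyword arguments for requests.get (hypothetical helper)."""
    options: Dict[str, Any] = {}
    # verify: which CAs to trust when checking the *server's* certificate.
    # True means "use the default bundle"; a path means "trust this CA file".
    options["verify"] = ca_bundle if ca_bundle is not None else True

    if client_key is not None and client_cert is None:
        raise ValueError("client_key was given without client_cert")
    if client_cert is not None:
        # cert: *your* certificate, presented to the server. It needs a private key,
        # either inside the same file or as a separate file in a (cert, key) tuple.
        options["cert"] = (client_cert, client_key) if client_key else client_cert
    return options
```

For the case in the question it would be used as `requests.get(url, proxies=PROXIES, auth=PROXY_AUTH, allow_redirects=False, **tls_options(ca_bundle='../static/crawlera-ca.crt'))`, which produces the same call as the snippet above.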

For more information, see the requests documentation on client-side certificates (the `cert` parameter) and on SSL certificate verification (the `verify` parameter).
