
I am trying to use Python 2.7 for various tasks that require pulling data from the internet, from behind an authenticated proxy. I have not had much success, and I am looking for help diagnosing what I am doing wrong.

Firstly, I managed to get pip to work by defining the proxy like so: pip install --proxy=http://username:[email protected]:8080 numpy. Hence Python must be capable of getting through it!
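For what it's worth, my understanding is that the same proxy string pip accepts can also be exported as environment variables, which urllib2 and requests consult when no explicit proxy handler is installed (the credentials and server below are placeholders):

```python
import os

# Hypothetical credentials and proxy address -- substitute your own.
proxy = "http://username:[email protected]:8080"

# urllib2 (via urllib.getproxies) and requests both read these
# variables when no explicit proxy is configured in code.
os.environ["http_proxy"] = proxy
os.environ["https_proxy"] = proxy

print(os.environ["http_proxy"])
```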

However, when it came to actually writing a .py script that could do the same, I had no success. I tried the following code with urllib2 first:

import urllib2

uri = "http://www.python.org"
http_proxy_server = "someproxyserver.com"
http_proxy_port = "8080"
http_proxy_realm = http_proxy_server
http_proxy_user = "username"
http_proxy_passwd = "password"

# Next line = "http://username:[email protected]:8080"
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)

def open_url_no_proxy():
    urllib2.urlopen(uri)

    print "Apparent success without proxy server!"    

def open_url_installed_opener():
    proxy_handler = urllib2.ProxyHandler({"http": http_proxy_full_auth_string})

    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)
    urllib2.urlopen(uri)

    print "Apparent success through proxy server!"

if __name__ == "__main__":
    open_url_no_proxy()
    open_url_installed_opener()

However I just get this error:

URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

Then I tried urllib3, as this is the module pip uses to handle proxies:

from urllib3 import ProxyManager, make_headers

# Establish the Authentication Settings
default_headers = make_headers(basic_auth='username:password')
http = ProxyManager("https://www.proxy.com:8080/", headers=default_headers)

# Now you can use `http` as you would a normal PoolManager
r = http.request('GET', 'https://www.python.org/')

# Check data is from destination
print(r.data)

I got this error:

raise MaxRetryError(_pool, url, error or ResponseError(cause)) MaxRetryError: HTTPSConnectionPool(host='www.python.org', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', error('Tunnel connection failed: 407 Proxy Authorization Required',)))
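Reading the urllib3 docs, I suspect the 407 is because basic_auth in make_headers targets the destination server; authenticating to the proxy itself appears to need proxy_basic_auth, passed via the separate proxy_headers argument of ProxyManager. A sketch under that assumption (server and credentials are placeholders):

```python
from urllib3 import ProxyManager, make_headers

# proxy_basic_auth (not basic_auth) produces a Proxy-Authorization
# header, which is what the proxy checks; basic_auth authenticates
# against the destination server instead, so the proxy answers 407.
proxy_headers = make_headers(proxy_basic_auth="username:password")
http = ProxyManager("http://someproxyserver.com:8080/",
                    proxy_headers=proxy_headers)

print(proxy_headers)
```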

I would really appreciate any help diagnosing this issue.

  • Is your proxy on https:// or http://? In the pip example it's http://, but urllib3 example it's https://. Commented Jul 2, 2015 at 8:43
  • If that doesn't work, you could try using Requests (built on urllib3, also used by pip): docs.python-requests.org/en/latest/user/advanced/… Commented Jul 2, 2015 at 8:47
  • Yes, I have played around with http vs https; when I set it to http with urllib3 there are no errors, but it returns a page telling me the proxy requires authentication. Commented Jul 2, 2015 at 8:50
  • I tried a script with requests and got similar errors. I am starting to think it's got something to do with the authentication details I'm giving it. Commented Jul 2, 2015 at 8:51
  • Could be. It's strange that pip works. Are you certain that pip is actually hitting the proxy and not ignoring it somehow? You could use something like tcpdump/ngrep to monitor traffic and see what it's actually doing. E.g. stackoverflow.com/questions/9241391/… Commented Jul 2, 2015 at 15:17

1 Answer

The solution to my problem was to use the requests module; see this thread: Proxies with Python 'Requests' module

mtt2p listed this code, which worked for me.

import requests
import time

class BaseCheck():
    def __init__(self, url):
        # Proxy address and credentials are placeholders.
        self.http_proxy  = "http://user:pw@proxy:8080"
        self.https_proxy = "http://user:pw@proxy:8080"
        self.ftp_proxy   = "http://user:pw@proxy:8080"
        self.proxyDict = {
                      "http"  : self.http_proxy,
                      "https" : self.https_proxy,
                      "ftp"   : self.ftp_proxy
                    }
        self.url = url
        def makearr(tsteps):
            # Build a global timing table with a start/end slot per step.
            global stemps
            global steps
            stemps = {}
            for step in tsteps:
                stemps[step] = { 'start': 0, 'end': 0 }
            steps = tsteps
        makearr(['init','check'])
        def starttime(typ = ""):
            # Record the current time for every step.
            for stemp in stemps:
                if typ == "":
                    stemps[stemp]['start'] = time.time()
                else:
                    stemps[stemp][typ] = time.time()
        starttime()
    def __str__(self):
        return str(self.url)
    def getrequests(self):
        # The crucial line: pass the proxy dict to requests.get.
        g = requests.get(self.url, proxies=self.proxyDict)
        print g.status_code
        print g.content
        print self.url
        stemps['init']['end'] = time.time()
        # Elapsed time for the request.
        x = stemps['init']['end'] - stemps['init']['start']
        print x

test = BaseCheck(url='http://google.com')
test.getrequests()
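Stripped of the timing scaffolding, the essential part of the above is passing a proxies dict to requests.get. A minimal sketch, with placeholder proxy details (the function is not invoked here because the proxy address is fictional):

```python
import requests

# Placeholder proxy address and credentials -- substitute your own.
proxy_dict = {
    "http":  "http://user:pw@proxy:8080",
    "https": "http://user:pw@proxy:8080",
}

def fetch_via_proxy(url):
    """Fetch url through the authenticated proxy."""
    return requests.get(url, proxies=proxy_dict)

print(sorted(proxy_dict))
```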
