python using requests with valid hostname

Question

I'm trying to use requests to download a list of URLs and catch the exception when a URL is bad. Here's my test code:

import requests
from requests.exceptions import ConnectionError

# good url
url = "http://www.google.com"

# bad url with good host
#url = "http://www.google.com/thereisnothing.jpg"

# url with bad host
#url = "http://somethingpotato.com"

print(url)
try:
    r = requests.get(url, allow_redirects=True)
    print("the url is good")
except ConnectionError as e:
    print(e)
    print("the url is bad")

If I pass in url = "http://www.google.com", everything works as expected, since it is a good URL:

http://www.google.com
the url is good

But if I pass in url = "http://www.google.com/thereisnothing.jpg", I still get:

http://www.google.com/thereisnothing.jpg
the url is good

So it's almost as if it's not even looking at anything after the "/".

Just to see whether the error checking works at all, I passed a bad hostname: url = "http://somethingpotato.com"

That kicked back the error message I expected:

http://somethingpotato.com
HTTPConnectionPool(host='somethingpotato.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1b6cd15b90>: Failed to establish a new connection: [Errno -2] Name or service not known',))
the url is bad

What am I missing to make requests catch a bad URL, not just a bad hostname?

Thanks

cbolles · Accepted Answer · 2017-07-06 13:29:47Z

3

requests does not raise an exception for a 404 response. Instead, you need to filter such responses out by checking whether the status is 'ok' (HTTP 200):

import requests
from requests.exceptions import ConnectionError

# bad url with good host (responds with 404)
url = "http://www.google.com/nothing"

# another bad url with good host
#url = "http://www.google.com/thereisnothing.jpg"

# url with bad host
#url = "http://somethingpotato.com"

print(url)
try:
    r = requests.get(url, allow_redirects=True)
    # a 404 does not raise, so check the status code explicitly
    if r.status_code == requests.codes.ok:
        print("the url is good")
    else:
        print("the url is bad")
except ConnectionError as e:
    print(e)
    print("the url is bad")
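If the goal is to have a 404 end up in the same except path as a bad hostname, one option is Response.raise_for_status(), which raises requests.exceptions.HTTPError for 4xx and 5xx responses. A minimal Python 3 sketch, assuming a hypothetical helper named check_url and an arbitrary 10-second timeout (neither comes from the original post):

```python
import requests
from requests.exceptions import RequestException


def check_url(url, timeout=10):
    """Return (True, None) if the url answers without an error status,
    otherwise (False, exception). check_url is a hypothetical helper."""
    try:
        r = requests.get(url, allow_redirects=True, timeout=timeout)
        # Raises requests.exceptions.HTTPError for 4xx/5xx status codes.
        r.raise_for_status()
    except RequestException as e:
        # Covers ConnectionError (bad host) and HTTPError (e.g. 404) alike.
        return False, e
    return True, None
```

With this, check_url("http://www.google.com/thereisnothing.jpg") would be expected to return False together with an HTTPError, and the exception object can be printed or logged like any other.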

EDIT:

import requests
from requests.exceptions import ConnectionError

def printFailedUrl(url, response):
    # response is either the caught ConnectionError or a requests Response
    if isinstance(response, ConnectionError):
        print("The url " + url + " failed to connect with the exception " + str(response))
    else:
        print("The url " + url + " produced the failed response code " + str(response.status_code))

def testUrl(url):
    try:
        r = requests.get(url, allow_redirects=True)
        if r.status_code == requests.codes.ok:
            print("the url is good")
        else:
            printFailedUrl(url, r)
    except ConnectionError as e:
        printFailedUrl(url, e)

def main():
    testUrl("http://www.google.com")  # 'Good' url
    testUrl("http://www.google.com/doesnotexist.jpg")  # 'Bad' url with 404 response
    testUrl("http://sdjgb")  # 'Bad' url with inaccessible host

main()

In this version, one function handles both cases: it accepts either the caught exception or the response object. That lets you report a non-'good' (non-200) response differently from an unreachable URL, which raises an exception. Hope this has the information you need.

edited Jul 6, 2017 at 13:29

answered Jul 6, 2017 at 1:03


6 Comments

chowpay Over a year ago

I was hoping to pass in the actual error (e), since I want to write the error to a file. Perhaps instead of printing "url is bad" in the else statement I can print r.status_code (which should be bad if it made it to the else statement). Would you suggest that?

cbolles Over a year ago

@chowpay The issue is that it is not actually an error, so you won't be able to catch an error to print. I would suggest making a custom function to print out an "error" url, which could include the url and the status code. If you would like an example, I could add one to my answer.

cbolles Over a year ago

@chowpay see if the code I added at the bottom does what you need

cbolles Over a year ago

@chowpay No problem!

chowpay Over a year ago

Question: in "isinstance(response, ConnectionError)", you're feeding the function a url and a response. How does "isinstance" know whether your "response" is a status code error or a ConnectionError?
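On the isinstance question: the second argument passed to printFailedUrl is either the exception object caught by except or the Response object returned by requests.get, and isinstance only checks which class that object belongs to. A small Python 3 sketch that needs no network access; the describe helper and the hand-built Response are illustrative assumptions, not part of the answer's code:

```python
import requests
from requests.exceptions import ConnectionError


def describe(response):
    """Report which kind of object was passed in (hypothetical helper)."""
    if isinstance(response, ConnectionError):
        # An exception instance: it has no status_code to inspect.
        return "exception: " + str(response)
    # Otherwise assume a requests.Response, which carries status_code.
    return "status code: " + str(response.status_code)


# Build the two kinds of objects by hand instead of hitting the network.
fake_error = ConnectionError("Failed to establish a new connection")
fake_response = requests.Response()
fake_response.status_code = 404

print(describe(fake_error))
print(describe(fake_response))
```

The first call takes the exception branch and the second takes the status code branch, purely because of the type of the object that was passed.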


Salmaan P · Accepted Answer · 2017-07-06 00:58:06Z

0

What you want is to check r.status_code. For "http://www.google.com/thereisnothing.jpg", r.status_code will be 404. You can add a condition so that only URLs returning status code 200 count as "good".
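Applied to a whole list of URLs, as in the question, the status check might look like the following Python 3 sketch. The split_urls name, the injectable fetch parameter (there so the function can be exercised without network access), and the timeout value are all assumptions made for illustration:

```python
import requests
from requests.exceptions import RequestException


def split_urls(urls, fetch=requests.get, timeout=10):
    """Sort urls into good (HTTP 200) and bad (anything else).

    Hypothetical helper; fetch defaults to requests.get.
    """
    good, bad = [], []
    for url in urls:
        try:
            r = fetch(url, allow_redirects=True, timeout=timeout)
        except RequestException as e:
            # Bad host, timeout, etc.: keep the exception text as the reason.
            bad.append((url, str(e)))
            continue
        if r.status_code == requests.codes.ok:
            good.append(url)
        else:
            # Reachable host, but the resource is missing or failing.
            bad.append((url, "HTTP " + str(r.status_code)))
    return good, bad
```

Each entry in the bad list pairs a URL with its reason, so the list could be printed or written to a file afterwards.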

answered Jul 6, 2017 at 0:58
