Continue python script even if urllib.request encounters HTTP error codes

Question

I have a python script that accepts a CSV file containing a list of URLs. The CSV file looks like this.

name, url
google, https://httpstat.us/200
yahoo, https://httpstat.us/401
bcs, https://httpstat.us/521

The python script starts from here.

import time
import urllib.request
import urllib.error

url_list = []
url_data = []
total_duration = 0.5
wait_time = 5
http200_count = 0
http300_count = 0
http400_count = 0
http500_count = 0
counter = 1


with open("url_list.csv", "r") as url_file:
    headers = next(url_file).strip().replace(" ","").split(",")

    for row in url_file:
        url_list = row.strip().replace(" ","").split(",")
        url_dict = dict(zip(headers, url_list), http200 = http200_count, http300 = http300_count, http400 = http400_count, http500 = http500_count)
        url_data.append(url_dict)

url_file.close()


interval = time.time() + 60 * total_duration
print(interval)

while 1:

    while 1:
        print(f"Counter: {counter}")

        for url_entry in url_data:
            print(f"Website:", url_entry["name"])
            print(f"URL:", url_entry["url"])
            print(urllib.request.urlopen(url_entry["url"]).status)
            
            try:
                with urllib.request.urlopen(url_entry["url"]) as response:
                    print(response.status)
                    request_status = response.status
                    # return response.read(), response
            except urllib.error.HTTPError as error:
                print(error.status, error.reason)
                continue
            except urllib.error.URLError as error:
                print(error.reason)
                continue
            else:
                print("All is good!")


            if request_status >= 200 and request_status <=299:
                url_entry["http200"] = url_entry["http200"] + 1
            elif request_status >= 300 and request_status <=399:
                url_entry["http300"] = url_entry["http300"] + 1
            elif request_status >= 400 and request_status <=499:
                url_entry["http400"] = url_entry["http400"] + 1
            elif request_status >= 500 and request_status <=599:
                url_entry["http500"] = url_entry["http500"] + 1
            print(f"", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
            print("")
        
        time.sleep(wait_time)
        counter = counter + 1
    
        print(time.time(),"-",interval)
        if time.time() >= interval:
            print(f"There are {counter-1} probes in the past", 60 * total_duration, "secs.")
            print(f"HTTP Status code:", url_entry["name"], "-", url_entry["http200"], "-", url_entry["http300"], "-", url_entry["http400"], "-", url_entry["http500"])
            interval = time.time() + 60 * total_duration
            counter = 1

The problem is whenever it hits a URL with HTTP error codes it will exit and the while loop stops processing. Is there any way to continue the script even after hitting HTTP errors?

➜  DevOps_Practice python urlProbe.py
1687386788.8879528
Counter: 1
Website: google
URL: https://httpstat.us/200
200
200
All is good!
 google - 1 - 0 - 0 - 0

Website: yahoo
URL: https://httpstat.us/401
Traceback (most recent call last):
  File "/Users/desmondlim/Documents/DevOps/Projects/DevOps_Challenge_SPH/urlProbe.py", line 38, in <module>
    print(urllib.request.urlopen(url_entry["url"]).status)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/Users/desmondlim/.pyenv/versions/3.10.4/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized

Thanks.

This line: print(urllib.request.urlopen(url_entry["url"]).status) is outside the try. Why do you even need that, since you're opening the same URL in the with statement? — Barmar
– Barmar, Commented Jun 21, 2023 at 23:00
Do you really need to use urllib? The requests module is much easier to use. — Barmar
– Barmar, Commented Jun 21, 2023 at 23:01
Barmar, I am just trying to view the HTTP status returned. And removing this line fix the issue. Thanks. — Ldd
– Ldd, Commented Jun 21, 2023 at 23:04

Sash Sinha · Accepted Answer · 2023-06-21 23:00:56Z

0

The problem is with this line:

print(urllib.request.urlopen(url_entry["url"]).status)

You're attempting to open the URL and print its status before entering the try-except block, and this will fail without being caught by the exceptions if the URL leads to an HTTP error. This is why your script stops.

To fix this, you should remove this line. Your try-except block already attempts to open the URL and print its status correctly.

answered Jun 21, 2023 at 23:00

Sash Sinha

22.8k4 gold badges27 silver badges48 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Ldd Over a year ago

Thanks Sash for the explanation. Yes removing this line helps. Thanks.

Collectives™ on Stack Overflow

Continue python script even if urllib.request encounters HTTP error codes

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related