
So I have some code that I use to scrape through my mailbox looking for certain URLs. Once this is completed, it creates a file called links.txt.

I want to run a script against that file to get an output of all the URLs in that list that are currently live. The script I have only allows me to check one URL at a time:

import urllib2

for url in ["www.google.com"]:

    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()
  • So, you just need to know how to read lines from a text file? Commented Aug 13, 2012 at 21:29

2 Answers


Use requests:

import requests

with open(filename) as f:
    good_links = []
    for link in f:
        try:
            r = requests.get(link.strip())
        except Exception:
            continue
        good_links.append(r.url)  # resolves redirects

You can also consider extracting the call to requests.get into a helper function:

def make_request(method, url, **kwargs):
    for i in range(10):  # retry up to 10 times before giving up
        try:
            r = requests.request(method, url, **kwargs)
            return r
        except requests.ConnectionError as e:
            print e.message
        except requests.HTTPError as e:
            print e.message
        except requests.RequestException as e:
            print e.message
    raise Exception("requests did not succeed")



It is trivial to make this change, given that you're already iterating over a list of URLs:

import urllib2

for url in open("urllist.txt"):   # change 1

    try:
        connection = urllib2.urlopen(url.rstrip())   # change 2
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

Iterating over a file returns the lines of the file (complete with line endings). We use rstrip() on the URL to strip off the line endings.

There are other improvements you can make. For example, some will suggest you use with to make sure your file is closed. This is good practice but probably not necessary in this script.
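
If you do want to use with, a sketch of the same loop written that way (same urllist.txt filename and the same Python 2 style as above) could look like this:

import urllib2

with open("urllist.txt") as f:    # the file is closed automatically when the block ends
    for url in f:
        try:
            connection = urllib2.urlopen(url.rstrip())
            print connection.getcode()
            connection.close()
        except urllib2.HTTPError, e:
            print e.getcode()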

4 Comments

#!/usr/bin/python
import urllib2

for url in open("ZeuS_links"):   # change 1
    try:
        connection = urllib2.urlopen(url.rstrip())   # change 2
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()
Kindall, thanks for the assistance, the script is now working. The only thing I would like to find out now is how to push out the actual URLs instead of 200 or 401. I will continue to work with it and update as I go. Thanks again for your help.
You already have url as the URL you're trying to retrieve; just print that.
That did it; sometimes the small things are right in front of you. Thanks a lot, kindall. Awesome, awesome, awesome.
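
For reference, a small sketch of printing only the live URLs instead of the status codes, following the suggestion in the comments (the links.txt filename and treating any successful open as "live" are assumptions):

import urllib2

for url in open("links.txt"):
    url = url.rstrip()
    try:
        connection = urllib2.urlopen(url)
        connection.close()
        print url                      # the request succeeded, so report the URL as live
    except urllib2.HTTPError:
        pass                           # skip URLs that return an HTTP error status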
