Run python script and output to txt for a list of URLs

Question

I have a python script for scraping some URLs. The URLs are in a list in a txt file.

The relevant parts of the Python script are as follows:

import urllib2
from bs4 import BeautifulSoup
quote_page = 'https://www.example.com/post/1245'

# rest of the code is here

print quote_page
print url
print title
print description
print actors
print director

I would like to run this script for multiple URLs in a txt file and output to a single txt file.

Any ideas how I can run this for the URLs in my txt file?

Anaksunaman · Accepted Answer · 2019-03-29 01:36:06Z

You will likely want to use the Python with statement (introduced in PEP 343) and the built-in open() function:

# Python 2
import urllib2
from bs4 import BeautifulSoup

# Python 3 (urllib2 was split into urllib.request and urllib.error)
# import urllib.request
# from bs4 import BeautifulSoup

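# Python 2.7+ and Python 3.1+ (several files in a single with statement)
with open('urls.txt', 'r') as url_file, open('output.txt', 'w') as output_file:

    # Read every line of the input file into a list
    url_list = url_file.readlines()

    for url_item in url_list:

        # quote_page = 'https://www.example.com/post/1245'
        # readlines() keeps the trailing newline on each line, so strip it
        quote_page = url_item.strip()

        # rest of the code is here

        # Python 2 and 3
        # write() adds no line breaks, so end each field with '\n'
        output_file.write(quote_page + '\n')
        output_file.write(url + '\n')
        output_file.write(title + '\n')
        output_file.write(description + '\n')
        output_file.write(actors + '\n')
        output_file.write(director + '\n')
        # Blank line between one URL's results and the next
        output_file.write('\n')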

In this instance, we:

1. Call open() to get file handles (url_file, output_file) for our input and output text files ('urls.txt', 'output.txt') at the same time (using 'r' for reading and 'w' for writing, respectively).
2. Use the with statement to close these files automatically once we have finished processing our URLs. Without it, we would need to call url_file.close() and output_file.close() ourselves (e.g. at Step 5).
3. Put our URLs into a list (url_list = url_file.readlines()).
4. Loop through our URL list and write() the data we want to our output_file.
5. Close both of our files automatically when the with block ends (see Step 2).
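
To make the steps above concrete, here is a minimal Python 3 sketch that runs without any network access. The function scrape_page() is a hypothetical stand-in for the "rest of the code" in the question (it just returns dummy fields), and process_urls() is likewise only an illustrative name, not something from the original script.

```python
import os
import tempfile


def scrape_page(url):
    # Hypothetical placeholder for the real scraping code: a real version
    # would download and parse the page. Here it only returns dummy fields.
    return {"title": "title for " + url, "description": "description for " + url}


def process_urls(url_path, output_path, scrape=scrape_page):
    """Read one URL per line from url_path and write results to output_path."""
    count = 0
    with open(url_path, "r") as url_file, open(output_path, "w") as output_file:
        for line in url_file:
            url = line.strip()      # drop the trailing newline
            if not url:             # skip blank lines in the input file
                continue
            fields = scrape(url)
            output_file.write(url + "\n")
            output_file.write(fields["title"] + "\n")
            output_file.write(fields["description"] + "\n")
            output_file.write("\n")  # blank line between records
            count += 1
    return count


# Small demonstration using throwaway files in a temporary directory.
sketch_dir = tempfile.mkdtemp()
sketch_urls = os.path.join(sketch_dir, "urls.txt")
sketch_output = os.path.join(sketch_dir, "output.txt")

with open(sketch_urls, "w") as f:
    f.write("https://www.example.com/post/1245\n")
    f.write("https://www.example.com/post/1246\n")

processed = process_urls(sketch_urls, sketch_output)
print("URLs processed:", processed)
```

Iterating over the file object directly, instead of calling readlines(), gives the same lines without building the whole list first, which can matter for a long URL file.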

Note that if you simply want to add data to an existing output file, you will probably want 'a' (append mode) rather than 'w' (write mode): open('output.txt', 'w') as output_file would become open('output.txt', 'a') as output_file. This matters because 'w' truncates the file if it already exists (i.e. you will lose your original data).
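
The difference between the two modes can be checked with a short experiment. This is only a sketch: it writes to a throwaway file in a temporary directory, and the names demo_path, after_append and after_write are made up for the illustration.

```python
import os
import tempfile

# Throwaway file so the experiment does not touch a real output.txt
demo_path = os.path.join(tempfile.mkdtemp(), "output.txt")

# 'w' creates the file (or empties it if it already exists)
with open(demo_path, "w") as f:
    f.write("first run\n")

# 'a' keeps the existing content and adds to the end
with open(demo_path, "a") as f:
    f.write("second run\n")

with open(demo_path, "r") as f:
    after_append = f.read()

# Opening with 'w' again discards everything written so far
with open(demo_path, "w") as f:
    f.write("third run\n")

with open(demo_path, "r") as f:
    after_write = f.read()

print(repr(after_append))
print(repr(after_write))
```

After the append, the file holds both earlier lines. After the final 'w', only the last line remains.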
