I am downloading multiple CSV files from a website using Python. I would like to be able to check the response code on each request.

I know how to download the file using wget, but not how to check the response code:

os.system('wget http://example.com/test.csv')
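
For what it's worth, wget's exit status does signal failure: per the wget manual, exit code 8 means the server issued an error response. A sketch along these lines (assuming Python 3.5+ for subprocess.run and GNU wget on PATH) can at least detect a bad download, though it doesn't expose the exact HTTP status:

import subprocess

# wget exits non-zero on failure; exit status 8 means the server
# issued an error response (e.g. a 404).
result = subprocess.run(['wget', 'http://example.com/test.csv'])
if result.returncode != 0:
    print('wget failed with exit code', result.returncode)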

I've seen a lot of people suggesting using requests, but I'm not sure that's quite right for my use case of saving CSV files.

r = requests.get('http://example.com/test.csv')
r.status_code # 200
# Pipe response into a CSV file... hm, seems messy?

What's the neatest way to do this?

5 Comments
  • Check stackoverflow.com/questions/2467609/using-wget-via-python and perhaps docs.python.org/2/library/urllib.html#urllib.FancyURLopener Commented Mar 9, 2015 at 23:30
  • I don't see anything particularly wrong with the requests approach: you could alternatively use urllib.urlretrieve and check the headers returned afterwards (see the first sketch after these comments). Commented Mar 9, 2015 at 23:31
  • The natural question that arises from your posting: what do you want to do if the status_code is not 200? Do you want to throw the (partial/corrupt) data away? Move the suspect files into a different directory, write the URLs for those into some sort of error log? What you do with the status is a policy decision but guides the structure of the code around it. Commented Mar 9, 2015 at 23:44
  • @JimDennis thanks for this. I'm writing a script that will let people download a lot of data, and I need it to warn them if any of the data is in any way corrupt or incomplete. So I guess the answer is "print a warning and move the file". Commented Mar 17, 2015 at 10:04
  • I would recommend that you open the file via a temporary name (use the tempfile module's NamedTemporaryFile() function), then rename it only if the transfer is successful (see the second sketch below). If there's an older version of the file present I'd use a "link dance" to hard link it to a ".old" or ".$(date ...)" name, then hard link the old name to the temporary file (then unlinking the temp file, leaving only the good file). Using this process will provide the best data integrity guarantees. Commented Mar 17, 2015 at 19:20
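
A minimal sketch of the urlretrieve route suggested in the comments, assuming Python 3, where urlretrieve lives in urllib.request and raises urllib.error.HTTPError for error responses (the Python 2 urllib.urlretrieve quietly saves error pages instead, which is why FancyURLopener comes up above):

import urllib.request

# urlretrieve returns the local filename and the response headers;
# an error status raises HTTPError before the file is written.
filename, headers = urllib.request.urlretrieve('http://example.com/test.csv', 'test.csv')
print(headers.get('Content-Type'))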
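
And a sketch of the temp-file-then-rename approach from the last comment, assuming requests and Python 3.3+ for os.replace; it skips the hard-link dance and simply overwrites any existing file, and applies the warn-and-skip policy mentioned above for bad status codes:

import os
import requests
import tempfile

url = 'http://example.com/test.csv'
with requests.get(url, stream=True) as r:
    if r.status_code == 200:
        # Write to a temporary file in the target directory, then
        # rename it into place only after the transfer completes.
        with tempfile.NamedTemporaryFile(dir='.', delete=False) as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
        os.replace(f.name, 'test.csv')
    else:
        print('warning: %s returned %d, skipping' % (url, r.status_code))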

1 Answer


You can use the stream argument; together with iter_content() it's possible to stream the response content straight into a file (docs):

import requests

r = None
try:
    r = requests.get('http://example.com/test.csv', stream=True)
    # the status code is available here as r.status_code, as in the question
    with open('test.csv', 'wb') as f:  # binary mode: iter_content() yields bytes
        for chunk in r.iter_content(chunk_size=8192):  # read in 8 KB chunks
            f.write(chunk)
finally:
    if r is not None:
        r.close()
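
Since the question is specifically about the response code, a short variant that fails fast before writing anything (a sketch, assuming a requests version recent enough that Response works as a context manager):

import requests

with requests.get('http://example.com/test.csv', stream=True) as r:
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    with open('test.csv', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)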

7 Comments

I think this is basically what OP means by "Pipe response into a CSV file"
@TimCastelijns, yeah, the status code part has already been covered by the OP
I mean I'm pretty sure he's looking for a way that doesn't involve manually storing the result in a CSV with python code
@TimCastelijns, if by manually you mean there is no simple one-liner for this, just create a utils function that does exactly that. Other than that, I think it's perfectly fine to download the file from inside Python.
I also think that's fine - don't get me wrong. I just think that OP knows he can do it like this, but doesn't want to because it seems messy to him
