I am downloading multiple CSV files from a website using Python. I would like to be able to check the response code on each request.

I know how to download the file using wget, but not how to check the response code:

os.system('wget http://example.com/test.csv')
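
For what it's worth, wget's exit status does signal failure: per the wget manual, exit code 8 means the server issued an error response. A sketch along these lines (assuming Python 3.5+ for subprocess.run and GNU wget on PATH) can at least detect a bad download, though it doesn't expose the exact HTTP status:

import subprocess

# wget exits non-zero on failure; exit status 8 means the server
# issued an error response (e.g. a 404).
result = subprocess.run(['wget', 'http://example.com/test.csv'])
if result.returncode != 0:
    print('wget failed with exit code', result.returncode)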

I've seen a lot of people suggesting using requests, but I'm not sure that's quite right for my use case of saving CSV files.

r = requests.get('http://example.com/test.csv')
r.status_code # 200
# Pipe response into a CSV file... hm, seems messy?

What's the neatest way to do this?

5 Comments
  • Check stackoverflow.com/questions/2467609/using-wget-via-python and perhaps docs.python.org/2/library/urllib.html#urllib.FancyURLopener Commented Mar 9, 2015 at 23:30
  • I don't see anything particularly wrong with the requests approach: you could alternatively use urllib.urlretrieve and check the headers returned afterwards (see the first sketch after these comments). Commented Mar 9, 2015 at 23:31
  • The natural question that arises from your posting: what do you want to do if the status_code is not 200? Do you want to throw the (partial/corrupt) data away? Move the suspect files into a different directory, write the URLs for those into some sort of error log? What you do with the status is a policy decision but guides the structure of the code around it. Commented Mar 9, 2015 at 23:44
  • @JimDennis thanks for this. I'm writing a script that will let people download a lot of data, and I need it to warn them if any of the data is in any way corrupt or incomplete. So I guess the answer is "print a warning and move the file". Commented Mar 17, 2015 at 10:04
  • I would recommend that you open the file via a temporary name (use the tempfile module's NamedTemporaryFile() function), then rename it only if the transfer is successful (see the second sketch below). If there's an older version of the file present I'd use a "link dance" to hard link it to a ".old" or ".$(date ...)" name, then hard link the old name to the temporary file (then unlinking the temp file, leaving only the good file). Using this process will provide the best data integrity guarantees. Commented Mar 17, 2015 at 19:20
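
A minimal sketch of the urlretrieve route suggested in the comments, assuming Python 3, where urlretrieve lives in urllib.request and raises urllib.error.HTTPError for error responses (the Python 2 urllib.urlretrieve quietly saves error pages instead, which is why FancyURLopener comes up above):

import urllib.request

# urlretrieve returns the local filename and the response headers;
# an error status raises HTTPError before the file is written.
filename, headers = urllib.request.urlretrieve('http://example.com/test.csv', 'test.csv')
print(headers.get('Content-Type'))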
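
And a sketch of the temp-file-then-rename approach from the last comment, assuming requests and Python 3.3+ for os.replace; it skips the hard-link dance and simply overwrites any existing file, and applies the warn-and-skip policy mentioned above for bad status codes:

import os
import requests
import tempfile

url = 'http://example.com/test.csv'
with requests.get(url, stream=True) as r:
    if r.status_code == 200:
        # Write to a temporary file in the target directory, then
        # rename it into place only after the transfer completes.
        with tempfile.NamedTemporaryFile(dir='.', delete=False) as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
        os.replace(f.name, 'test.csv')
    else:
        print('warning: %s returned %d, skipping' % (url, r.status_code))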

1 Answer


You can use the stream argument; together with iter_content() it's possible to stream the response content straight into a file (docs):

import requests

r = None
try:
    r = requests.get('http://example.com/test.csv', stream=True)
    # the status code is available here as r.status_code, as in the question
    with open('test.csv', 'wb') as f:  # binary mode: iter_content() yields bytes
        for chunk in r.iter_content(chunk_size=8192):  # read in 8 KB chunks
            f.write(chunk)
finally:
    if r is not None:
        r.close()
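
Since the question is specifically about the response code, a short variant that fails fast before writing anything (a sketch, assuming a requests version recent enough that Response works as a context manager):

import requests

with requests.get('http://example.com/test.csv', stream=True) as r:
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    with open('test.csv', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)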

7 Comments

I think this is basically what OP means by "Pipe response into a CSV file"
@TimCastelijns, yeah, the status code part has already been covered by the OP
I mean I'm pretty sure he's looking for a way that doesn't involve manually storing the result in a CSV with python code
@TimCastelijns, if by manually you mean there is no simple one-liner for this, just create a utils function that does exactly that. Other than that, I think it's perfectly fine to download the file from inside Python.
I also think that's fine - don't get me wrong. I just think that OP knows he can do it like this, but doesn't want to because it seems messy to him
