0

I am using Python 3.3 on Windows and trying to figure out how to download a .csv file of historical prices from Yahoo Finance.

This is the page source containing the link I'm trying to access.

<p>  
 <a href="http://ichart.finance.yahoo.com/table.csv?s=AAPL&amp;d=1&amp;e=1&amp;f=2014&amp;g=d&amp;a=8&amp;b=7&amp;c=1984&amp;ignore=.csv">
<img src="http://l.yimg.com/a/i/us/fi/02rd/spread.gif" width="16" height="16" alt="" border="0">
<strong>Download to Spreadsheet</strong>
 </a>
</p> 

And here is the code I wrote to do it.

from urllib.request import urlopen
from bs4 import BeautifulSoup

website = "http://ichart.finance.yahoo.com/table.csv?s=AAPL&amp;d=1&amp;e=1&amp;f=2014&amp;g=d&amp;a=8&amp;b=7&amp;c=1984&amp;ignore=.csv"
html = urlopen(website)
soup = BeautifulSoup(html)

When I ran the code, I was expecting it to start the download and put it into my downloads folder, but it doesn't do anything. It runs and then stops. No csv file shows up in my downloads. So I think I'm missing something else in this code.

3
  • The only thing you do is read the URL, parse it with BeautifulSoup and then end without doing anything else. How should Python know that you want to save the url? If you want to have the file in your downloads folder, you need to tell Python to do that. Commented Feb 1, 2014 at 17:09
  • I figured that was going on. What line(s) of code would accomplish that? Commented Feb 1, 2014 at 18:01
  • For example: How to download a file using Python? Commented Feb 1, 2014 at 19:06

2 Answers

2

You can do this with just urllib. The following code downloads the .csv file and puts the contents into a string named 'csv'. Then it saves the string to a file:

from urllib import request

# Retrieve the file contents (read() returns bytes, not str)
# Note: the query string uses plain "&"; "&amp;" is only the HTML-escaped form
response = request.urlopen("http://ichart.finance.yahoo.com/table.csv?s=AAPL&d=1&e=1&f=2014&g=d&a=8&b=7&c=1984&ignore=.csv")
csv = response.read()

# Save the string to a file
csvstr = str(csv).strip("b'")

lines = csvstr.split("\\n")
f = open("historical.csv", "w")
for line in lines:
   f.write(line + "\n")
f.close()
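
One thing to watch for: a URL copied straight from the page source, like the href in the question, has every & written as the HTML entity &amp;. Here is a minimal sketch of turning it back into a plain URL before requesting it. It assumes Python 3.4 or later, where html.unescape is available; the names escaped_url and clean_url are just illustrative.

```python
import html

# href value exactly as it appears in the HTML source (entities still escaped)
escaped_url = (
    "http://ichart.finance.yahoo.com/table.csv?s=AAPL&amp;d=1&amp;e=1"
    "&amp;f=2014&amp;g=d&amp;a=8&amp;b=7&amp;c=1984&amp;ignore=.csv"
)

# Convert HTML entities such as &amp; back to the characters they stand for
clean_url = html.unescape(escaped_url)
```

If the link is pulled out with an HTML parser such as BeautifulSoup instead of being copied by hand, the attribute value already comes back unescaped, so this step is not needed.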

7 Comments

It made the .csv file, but didn't write the lines in properly.
I updated the save code. The output file should be in complete csv format now.
Thanks! This worked. What does the .strip("b'") mean?
The response.read() call returns an object of type bytes rather than a string. str(csv) converts it to a string, but leaves the letter b and some quotes as artifacts of the conversion, i.e. b'XXXXXX'. strip("b'") removes them to clean up the data. There is probably a cleaner way to do that conversion without the artifacts.
There certainly is a cleaner way to decode bytes to a unicode string: you'd use the bytes.decode() method. But since you are saving the whole thing to a file anyway, you can just open the file in binary mode and write the response to it directly: open('historical.csv', 'wb').write(response.read()).
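
To make the last two comments concrete, here is a rough sketch of both options: writing the bytes straight to a file in binary mode, and decoding them when the text itself is needed. The helper name save_response is made up for illustration, an io.BytesIO object with invented sample rows stands in for the real urlopen() response so the snippet runs without a network connection, and UTF-8 is an assumed encoding.

```python
import io
import os
import tempfile


def save_response(response, path):
    """Write the raw bytes of a file-like response to path and return them."""
    data = response.read()       # urlopen() responses return bytes here, not str
    with open(path, "wb") as f:  # "wb": no decoding and no newline translation
        f.write(data)
    return data


# Stand-in for urlopen(...): any object whose read() returns bytes.
# The rows below are invented sample data, not real prices.
fake_response = io.BytesIO(b"Date,Open,Close\n2014-01-31,1.0,2.0\n")
target = os.path.join(tempfile.mkdtemp(), "historical.csv")
raw = save_response(fake_response, target)

# If the text is needed, decode the bytes instead of calling str() on them.
text = raw.decode("utf-8")  # assumed encoding
first_line = text.splitlines()[0]
```

With a real download, the same helper would be called as save_response(request.urlopen(url), "historical.csv"), and no strip("b'") or split("\\n") cleanup is needed.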
0

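Since you already use BeautifulSoup and urllib (here html should be the page that contains the download link, not the CSV itself):

from urllib.request import urlretrieve  # in Python 3, urlretrieve lives in urllib.request
url = BeautifulSoup(html).find('a')['href']
urlretrieve(url, '/path/to/downloads/file.csv')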

1 Comment

Could you elaborate on this? I added these two lines with the path set to 'C:\Users\David\Downloads'. The name would change on every run unless I clear the download folder first, because the file gets saved as table, then table(1), then table(2), and so on if I run it multiple times.
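
On the path question: the second argument to urlretrieve is the full path of the file to write, including the file name, not just a folder. A possible sketch of building such a path is below. The helper csv_destination and the "<symbol>.csv" naming scheme are hypothetical, and the folder is the one mentioned in the comment above.

```python
import os


def csv_destination(folder, symbol):
    """Return the full target path folder/<symbol>.csv (hypothetical naming scheme)."""
    return os.path.join(folder, symbol + ".csv")


# Raw string (r"...") so the backslashes in the Windows path are not read as escapes.
dest = csv_destination(r"C:\Users\David\Downloads", "AAPL")
```

Passing dest as the second argument, as in urlretrieve(url, dest), writes to the same file each time, so a repeated run replaces AAPL.csv instead of piling up numbered copies.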
