I know very little about Python, but I was trying to do an Extract, Transform and Load (ETL) task with a small Python script. I get the desired result, but I still want to understand the script.
from bs4 import BeautifulSoup
import urllib
import re
import string
import csv

# Download the page (Python 2: urlopen lives directly in urllib)
urlHandle = urllib.urlopen("http://finance.yahoo.com/q/cp?s=^DJI")
html = urlHandle.read()

# Parse the HTML and locate the table by its id
soup = BeautifulSoup(html)
table = soup.find('table', attrs={'id': 'yfncsumtab'})
rows = table.findAll('tr')

a = ''
csvfile = open("F:/data/yahoofinance.csv", 'w')
for tr in rows[5:]:
    # Build one quoted, comma-separated line per table row
    for td in tr.find_all('td', attrs={'class': 'yfnc_tabledata1'}):
        a += '"' + td.get_text() + '",'
    a += '\n'
    csvfile.write(a)
    a = ''
csvfile.close()
My questions are about this code. soup is an object returned by the BeautifulSoup(html) call, am I right? In the next statement I guess table is also an object, so does that mean we are searching the soup object with the find function, and that find returns another object?
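The "parse, then find" pattern can be tried in isolation. The sketch below is only an analogy: it assumes Python 3 and swaps in the standard library's xml.etree.ElementTree for BeautifulSoup so that it runs without installing anything, and the markup string is a made-up stand-in for the downloaded page. ElementTree's find is a different API from BeautifulSoup's, but it shows the same idea of one object handing back another object (or None when nothing matches).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the downloaded page.
markup = '<html><table id="yfncsumtab"><tr><td>MMM</td></tr></table></html>'

root = ET.fromstring(markup)   # parsing returns an object (an Element)
table = root.find("table")     # find() is a method; it returns another object
missing = root.find("form")    # nothing matches, so find() returns None

print(type(root).__name__)     # name of the class that root is an instance of
print(type(table).__name__)    # table is an instance of the same class
print(table.get("id"))         # the object carries the tag's attributes
print(missing)
```

The point to take away is that table is not a copy of some text: it is an object with its own methods, which is why the script above can go on to call a method on it to collect the rows.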
Please correct what I have worked out for myself about the code above:
urlHandle is a class, urllib is what? And urlopen is a static method. html is an object, urlHandle is a class, read is a method. soup is an object, BeautifulSoup(html) is a function.
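Guesses like these can be checked by asking Python itself. This is a minimal sketch, assuming Python 3 (where urlopen lives in urllib.request rather than directly in urllib as in the script above). FakeHandle and describe are hypothetical names introduced here for illustration; FakeHandle stands in for the real handle so that nothing is downloaded.

```python
import inspect
import urllib.request  # Python 3 location of urlopen (an assumption of this sketch)


class FakeHandle:
    """Hypothetical stand-in for the handle that urlopen would return."""

    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data


def describe(value):
    """Return a rough label for what kind of thing `value` is."""
    if inspect.ismodule(value):
        return "module"
    if inspect.isclass(value):
        return "class"
    if inspect.ismethod(value):
        return "method"
    if inspect.isfunction(value):
        return "function"
    return "instance of " + type(value).__name__


handle = FakeHandle("<html></html>")

print(describe(urllib.request))          # the thing that gets imported
print(describe(urllib.request.urlopen))  # what is called to open a URL
print(describe(FakeHandle))              # the class itself
print(describe(handle))                  # an object created from the class
print(describe(handle.read))             # a method looked up on that object
print(describe(handle.read()))           # the value the method returns
```

The same describe helper could be pointed at urlHandle, html, soup, and table in the real script to see which of the guesses above hold.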
Please give me feedback on my understanding, and correct me where I am wrong.