I am currently trying to screen scrape a website and put the info into a dictionary. I am using urllib2 and BeautifulSoup, but I cannot figure out how to parse the web page's source to get what I want and read it into a dictionary.

The info I want appears in the source code as <title>Nov 24 | 8:00AM | Sole In. Peace Out. </title>. I am thinking of using a regular expression to read in the line, convert the date and time to a datetime, and then parse the rest of the line into a dictionary. The dictionary output should be something along the lines of:
[
{
"date": datetime(2010, 11, 24, 23, 59),
"title": "Sole In. Peace Out.",
}
]
Current Code:
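from BeautifulSoup import BeautifulSoup
import re
import urllib2

# Fetch the raw source of the events feed.
url = 'http://events.cmich.edu/RssStudentEvents.aspx'
response = urllib2.urlopen(url)
html = response.read()

# Parse the source so the <title> elements can be searched.
soup = BeautifulSoup(html)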
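
The regular expression idea described above might look roughly like the following. This is only a sketch under assumptions: the names TITLE_RE and parse_events are made up for illustration, the year is passed in as a parameter because the title itself does not contain one, and it runs on a hard-coded sample string instead of the fetched html. It searches the raw source text directly and does not use the BeautifulSoup object.

```python
import re
from datetime import datetime

# Hypothetical pattern for titles shaped like
# "<title>Nov 24 | 8:00AM | Sole In. Peace Out. </title>".
TITLE_RE = re.compile(
    r"<title>\s*(\w{3} \d{1,2})\s*\|\s*(\d{1,2}:\d{2}[AP]M)\s*\|\s*(.*?)\s*</title>"
)


def parse_events(source, year=2010):
    """Return a list of {"date", "title"} dicts, one per matching <title>."""
    events = []
    for day, clock, title in TITLE_RE.findall(source):
        # Assumption: the year is not in the title, so the caller supplies it.
        stamp = "%s %d %s" % (day, year, clock)
        when = datetime.strptime(stamp, "%b %d %Y %I:%M%p")
        events.append({"date": when, "title": title})
    return events


sample = "<title>Nov 24 | 8:00AM | Sole In. Peace Out. </title>"
print(parse_events(sample))
```

Titles that do not follow the "date | time | name" shape are simply skipped, since the pattern does not match them. The month abbreviation and AM/PM parsing in strptime depend on the locale, so this assumes an English locale.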
Sorry for the wall of text, and thank you for your time and help!