I'm completely new to web scraping, but I really want to learn it in Python. I have a basic understanding of Python.

I'm having trouble understanding some code that scrapes a webpage because I can't find good documentation for the modules it uses.

The code scrapes movie data from this webpage.

I get stuck after the comment "selection in pattern follows the rules of CSS".

I would like to understand the logic behind that code, or find good documentation for those modules. Is there a prerequisite topic I need to learn first?

The code is the following:

import requests
from pattern import web
from BeautifulSoup import BeautifulSoup

url = 'http://www.imdb.com/search/title?sort=num_votes,desc&start=1&title_type=feature&year=1950,2012'
r = requests.get(url)
print r.url

url = 'http://www.imdb.com/search/title'
params = dict(sort='num_votes,desc', start=1, title_type='feature', year='1950,2012')
r = requests.get(url, params=params)
print r.url  # notice it constructs the full url for you

#selection in pattern follows the rules of CSS

dom = web.Element(r.text)
for movie in dom.by_tag('td.title'):    
    title = movie.by_tag('a')[0].content
    genres = movie.by_tag('span.genre')[0].by_tag('a')
    genres = [g.content for g in genres]
    runtime = movie.by_tag('span.runtime')[0].content
    rating = movie.by_tag('span.value')[0].content
    print title, genres, runtime, rating

1 Answer

Here's the documentation for BeautifulSoup, which is an HTML and XML parser.

The comment

selection in pattern follows the rules of CSS

means that strings such as 'td.title' and 'span.runtime' are CSS selectors that locate the data you are looking for. For example, td.title matches every <td> element whose class attribute includes title, as in class="title".

The code iterates over every td.title element in the page and, inside each one, uses further CSS selectors to extract the title, genres, runtime, and rating.
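To make the 'tag.class' idea concrete, here is a minimal sketch in Python 3 that uses only the standard library. The SAMPLE markup is a made-up stand-in for one row of the results page, and by_tag below is a hypothetical helper that only imitates the name of pattern's method; pattern's real implementation supports more of CSS and may behave differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one row of the results page.
SAMPLE = """
<table>
  <tr>
    <td class="title">
      <a href="/title/tt0000001/">Example Movie</a>
      <span class="genre"><a>Drama</a> | <a>Crime</a></span>
      <span class="runtime">142 mins.</span>
      <span class="value">9.2</span>
    </td>
  </tr>
</table>
""".strip()


def by_tag(element, selector):
    """Return descendants matching a 'tag' or 'tag.class' selector.

    Illustrative helper only (an assumption, not pattern's code).
    """
    tag, _, cls = selector.partition(".")
    return [
        el
        for el in element.iter(tag)  # all elements with this tag name
        if el is not element
        and (not cls or cls in el.get("class", "").split())
    ]


dom = ET.fromstring(SAMPLE)
movies = []
for movie in by_tag(dom, "td.title"):  # every <td class="title">
    title = by_tag(movie, "a")[0].text  # first link inside the cell
    genre_span = by_tag(movie, "span.genre")[0]
    genres = [g.text for g in by_tag(genre_span, "a")]
    runtime = by_tag(movie, "span.runtime")[0].text
    rating = by_tag(movie, "span.value")[0].text
    movies.append((title, genres, runtime, rating))
    print(title, genres, runtime, rating)
```

Each selector is split at the dot: the part before it is the tag name and the part after it is a class that must appear in the element's class attribute. The loop has the same shape as the one in the question, just run against a small hard-coded snippet instead of a live page.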
