
I want to extract the translation of a word from an online dictionary. For example, here is the HTML for 'car':

<ol class="sense_list level_1">
     <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>

How can I extract "any vehicle on wheels" in Python with beautifulsoup or any other modules?

  • Thanks for all the answers. But that HTML has other lines similar to the one above, differing only in value="1": the value changes on each line. How can I extract the line with value="1"? Commented Mar 24, 2015 at 17:49
  • Seems like a duplicate of stackoverflow.com/questions/328356/… Commented Mar 24, 2015 at 18:14
  • I have modified my answer to include start tag and attributes. Commented Mar 24, 2015 at 18:32
  • @SaraSantana updated the answer - the last option checks for the value attribute value. Commented Mar 24, 2015 at 20:03

3 Answers


I solved it with BeautifulSoup:

import bs4
soup = bs4.BeautifulSoup(html, 'html.parser')
# match the <li> by its exact class string and its value attribute
q1 = soup.find('li', class_="sense_list_item level_1", value='1').text
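To see the value filter at work, here is a minimal, self-contained run against a made-up fragment with two senses (the second definition, "a railway carriage", is invented for the example). find() turns unrecognized keyword arguments such as value=... into attribute filters, and a class_ string containing a space matches the exact value of the class attribute. Asking for value="2" shows that the filter really selects the item rather than just returning the first <li>:

```python
import bs4

# hypothetical two-sense fragment modeled on the question's markup
html = """
<ol class="sense_list level_1">
  <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
  <li class="sense_list_item level_1" value="2"><span class="def">a railway carriage</span></li>
</ol>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
# value="2" is treated as an attribute filter, so the second <li> is returned
item = soup.find("li", class_="sense_list_item level_1", value="2")
print(item.text)  # a railway carriage
```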

There are multiple ways to reach the desired element.

Probably the simplest would be to find it by class:

soup.find('span', class_='def').text
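find() returns only the first match. When the page lists several senses, as the follow-up comment on the question describes, find_all() can collect them all, keyed by the value attribute of the enclosing <li>. A sketch, assuming each span.def sits directly inside its <li> (the second sense is invented):

```python
import bs4

# hypothetical two-sense fragment modeled on the question's markup
html = """
<ol class="sense_list level_1">
  <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
  <li class="sense_list_item level_1" value="2"><span class="def">a railway carriage</span></li>
</ol>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
senses = {}
for span in soup.find_all("span", class_="def"):
    # the sense number is stored on the parent <li>
    senses[span.parent["value"]] = span.get_text(strip=True)

print(senses)  # {'1': 'any vehicle on wheels', '2': 'a railway carriage'}
```

senses["1"] then gives the first definition directly.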

or, with a CSS selector:

soup.select('span.def')[0].text

or, additionally checking the parents:

soup.select('ol.level_1 > li.level_1 > span.def')[0].text
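Anchoring the selector on its parents matters when the page has other span.def elements outside the sense list. A sketch with an invented fragment in which a span.def (inside a hypothetical "sidebar" div) appears before the list:

```python
import bs4

# hypothetical page: an unrelated span.def precedes the sense list
html = """
<div class="sidebar"><span class="def">word of the day</span></div>
<ol class="sense_list level_1">
  <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
</ol>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# the bare class selector takes the first span.def in document order
loose = soup.select("span.def")[0].text
# the anchored selector only matches spans that are direct children of a list item
strict = soup.select("ol.level_1 > li.level_1 > span.def")[0].text

print(loose)   # word of the day
print(strict)  # any vehicle on wheels
```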

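or, selecting the list item by its value attribute (quote the value: an unquoted value starting with a digit is not valid CSS, and newer BeautifulSoup versions may reject it):

soup.select('ol.level_1 > li[value="1"] > span.def')[0].text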
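On BeautifulSoup 4.7 and later, where CSS selectors are handled by the soupsieve package, select_one() returns the first match or None, which avoids the IndexError that select(...)[0] raises when nothing matches. A sketch using a made-up two-sense fragment (the second definition and the value "9" are invented):

```python
import bs4

# hypothetical two-sense fragment modeled on the question's markup
html = """
<ol class="sense_list level_1">
  <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
  <li class="sense_list_item level_1" value="2"><span class="def">a railway carriage</span></li>
</ol>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# child combinators plus a quoted attribute selector on the <li>
span = soup.select_one('ol.level_1 > li[value="2"] > span.def')
print(span.text if span is not None else "no match")  # a railway carriage

# a sense that does not exist gives None instead of an IndexError
missing = soup.select_one('ol.level_1 > li[value="9"] > span.def')
print(missing)  # None
```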


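Assuming that is the only HTML given, you can strip the tags with NLTK. This only works with NLTK 2.x: nltk.clean_html was removed in NLTK 3.0, where calling it raises an error that points to BeautifulSoup's get_text() instead.

import nltk  # NLTK 2.x only; clean_html was removed in NLTK 3.0

# htmlstring holds the HTML chunk from the question
extract = nltk.clean_html(htmlstring)
print(extract)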
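Since nltk.clean_html no longer works in NLTK 3, the closest equivalent on current versions is BeautifulSoup's get_text(), which drops every tag and returns the remaining text. A minimal sketch, assuming htmlstring holds only the snippet from the question:

```python
import bs4

# assumed input: just the snippet from the question, closed properly
htmlstring = ('<ol class="sense_list level_1">'
              '<li class="sense_list_item level_1" value="1">'
              '<span class="def">any vehicle on wheels</span></li></ol>')

# strip every tag and keep only the (whitespace-trimmed) text
extract = bs4.BeautifulSoup(htmlstring, "html.parser").get_text(strip=True)
print(extract)  # any vehicle on wheels
```

Like clean_html, this returns all of the text in the input, so it only isolates the definition when the snippet contains nothing else.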
