
I have an HTML list like this:

lista = """
<ul>
<li>Arts &amp; Entertainment
    <ul>
      <li>Celebrities &amp; Entertainment News</li>
      <li>Comics &amp; Animation
        <ul>
        <li>Anime &amp; Manga</li>
        <li>Cartoons</li>
        <li>Comics</li>
        </ul>
      </li>
    </ul>
</li>
</ul>

"""

I would like to convert it into a useful Python structure for further processing.

What structure do you suggest, and how would you build it?

  • Which HTML parser are you using? Commented Feb 12, 2012 at 13:34
  • As @jcollado suggested, I'm using Beautiful Soup. Commented Feb 12, 2012 at 17:35

2 Answers


With BeautifulSoup, I'd do something like this:

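# BeautifulSoup 3 import (Python 2); with BeautifulSoup 4 this would be
# `from bs4 import BeautifulSoup`.
from BeautifulSoup import BeautifulSoup
from pprint import pprint

def parseList(tag):
    # A <ul> becomes a list with one entry per direct <li> child.
    if tag.name == 'ul':
        return [parseList(item)
                for item in tag.findAll('li', recursive=False)]
    elif tag.name == 'li':
        if tag.ul is None:
            # Leaf item: just its text.
            return tag.text
        else:
            # Item with a nested list: (own text, parsed sub-list).
            return (tag.contents[0].string.strip(), parseList(tag.ul))

soup = BeautifulSoup(lista)
pprint(parseList(soup.ul))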

Example output:

[(u'Arts &amp; Entertainment',
  [u'Celebrities &amp; Entertainment News',
   (u'Comics &amp; Animation',
    [u'Anime &amp; Manga', u'Cartoons', u'Comics'])])]

Note that a list item containing a nested unordered list is returned as a tuple: the first element is the item's own text and the second element is a list with the contents of the nested list. Items without a nested list are returned as plain strings.
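
If installing BeautifulSoup is not an option, roughly the same structure can be built with Python 3's standard-library html.parser module. This is a different technique from the code above, offered only as a minimal sketch: the names ListParser, finalize and parse_html_list are made up for illustration, and it assumes well-formed markup with a single top-level <ul>. Unlike the BeautifulSoup 3 output shown above, html.parser decodes entities, so &amp; comes back as &.

```python
from html.parser import HTMLParser

lista = """
<ul>
<li>Arts &amp; Entertainment
    <ul>
      <li>Celebrities &amp; Entertainment News</li>
      <li>Comics &amp; Animation
        <ul>
        <li>Anime &amp; Manga</li>
        <li>Cartoons</li>
        <li>Comics</li>
        </ul>
      </li>
    </ul>
</li>
</ul>
"""


class ListParser(HTMLParser):
    """Hypothetical helper: collects nested <ul>/<li> into [text, children] pairs."""

    def __init__(self):
        super().__init__()
        self.root = None   # the first top-level <ul>
        self.lists = []    # stack of currently open <ul> lists
        self.items = []    # stack of currently open <li> items

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            new_list = []
            if self.items:
                # Nested list: attach it to the enclosing <li>.
                self.items[-1][1] = new_list
            elif self.root is None:
                self.root = new_list
            self.lists.append(new_list)
        elif tag == "li" and self.lists:
            item = ["", None]  # [own text, children or None]
            self.lists[-1].append(item)
            self.items.append(item)

    def handle_data(self, data):
        # Only keep text that belongs to an <li> before its nested <ul> starts.
        if self.items and self.items[-1][1] is None:
            self.items[-1][0] += data

    def handle_endtag(self, tag):
        if tag == "ul" and self.lists:
            self.lists.pop()
        elif tag == "li" and self.items:
            self.items.pop()


def finalize(items):
    """Turn [text, children] pairs into str (leaf) or (str, list) (branch)."""
    return [text.strip() if children is None
            else (text.strip(), finalize(children))
            for text, children in items]


def parse_html_list(html):
    parser = ListParser()
    parser.feed(html)
    parser.close()
    return finalize(parser.root or [])


result = parse_html_list(lista)
print(result)
```

The result has the same shape as the BeautifulSoup version: plain strings for leaf items and (text, sub-list) tuples for items with a nested list.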


1 Comment

I have to confess I'm having some trouble processing it. I'd like to print the whole list while keeping the hierarchy, for example by printing each level with a different indent.
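
One possible way to print the structure with an indent per nesting level is a small recursive walk. This is only a sketch: format_tree is a hypothetical helper name, the two-space indent is an arbitrary choice, and the tree literal is copied by hand from the example output in the answer (strings for leaves, (text, sub-list) tuples for items with children).

```python
def format_tree(items, depth=0, indent="  "):
    """Return one line per item, indented according to its nesting depth."""
    lines = []
    for item in items:
        if isinstance(item, tuple):
            # Branch: (own text, list of children).
            label, children = item
            lines.append(indent * depth + label)
            lines.extend(format_tree(children, depth + 1, indent))
        else:
            # Leaf: a plain string.
            lines.append(indent * depth + item)
    return lines


# Structure as produced by parseList in the answer above.
tree = [
    ("Arts &amp; Entertainment", [
        "Celebrities &amp; Entertainment News",
        ("Comics &amp; Animation", ["Anime &amp; Manga", "Cartoons", "Comics"]),
    ])
]

print("\n".join(format_tree(tree)))
```

Returning the lines instead of printing them directly keeps the function easy to reuse, for instance to write the hierarchy to a file.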

You can use Python's mapping type: a dictionary (dict).
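
As a rough illustration of the dictionary idea, each item's text could become a key whose value is a dictionary of its children, with an empty dictionary for leaf items. This is one assumed layout among several; to_dict is a hypothetical helper, and the input is the list/tuple structure from the other answer, copied by hand.

```python
def to_dict(items):
    """Convert the list/tuple structure into nested dictionaries."""
    result = {}
    for item in items:
        if isinstance(item, tuple):
            # Branch: (own text, list of children).
            label, children = item
            result[label] = to_dict(children)
        else:
            # Leaf: no children, so map it to an empty dict.
            result[item] = {}
    return result


tree = [
    ("Arts &amp; Entertainment", [
        "Celebrities &amp; Entertainment News",
        ("Comics &amp; Animation", ["Anime &amp; Manga", "Cartoons", "Comics"]),
    ])
]

categories = to_dict(tree)
print(categories["Arts &amp; Entertainment"]["Comics &amp; Animation"])
```

Note that sibling items with identical text would collapse into a single key, so this layout only fits lists whose items are unique at each level.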
