My HTML code contains nested lists like this:

<ul>
  <li>Apple</li>
  <li>Pear</li>
  <ul>
     <li>Cherry</li>
     <li>Orange</li>
     <ul>
        <li>Pineapple</li>
     </ul>
  </ul>
  <li>Banana</li>
</ul>

I need to parse them so they look like this:

+ Apple
+ Pear
++ Cherry
++ Orange
+++ Pineapple
+ Banana

I tried using BeautifulSoup, but I am stuck on how to account for the nesting depth in my code.

Here is my attempt, where x contains the HTML code listed above:

import bs4

soup = bs4.BeautifulSoup(x, "html.parser")
for ul in soup.find_all("ul"):
    for li in ul.find_all("li"):
        li.replace_with("+ {}\n".format(li.text))

3 Answers

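It's somewhat of a hack, but you can do it using lxml instead: get the XPath of each li and count how many times 'ul' occurs in it, which gives the nesting depth: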

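import lxml.html as lh
from lxml import etree  # needed for etree.ElementTree below

uls = """[your html above]"""
doc = lh.fromstring(uls)
tree = etree.ElementTree(doc)
for e in doc.iter('li'):
    # getpath() returns the element's XPath; it has one 'ul' step per enclosing list
    path = tree.getpath(e)
    print('+' * path.count('ul'), e.text)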

Output:

+ Apple
+ Pear
++ Cherry
++ Orange
+++ Pineapple
+ Banana
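
The same idea (depth equals the number of ul elements currently open) can also be sketched with only the standard library, without building an XPath string. This is not the lxml approach above but a swapped-in technique based on html.parser; the names ListFlattener and flatten_lists are made up for illustration, and the sketch assumes markup shaped like the question's, where each li holds a single run of plain text:

```python
from html.parser import HTMLParser


class ListFlattener(HTMLParser):
    """Collect one '+'-prefixed line per <li>, using the current <ul> depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of <ul> elements currently open
        self.in_li = False  # True while the parser is inside an <li>
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.depth += 1
        elif tag == "li":
            self.in_li = True

    def handle_endtag(self, tag):
        if tag == "ul":
            self.depth -= 1
        elif tag == "li":
            self.in_li = False

    def handle_data(self, data):
        text = data.strip()
        # Ignore the whitespace between tags; keep only text inside an <li>
        if self.in_li and text:
            self.lines.append("+" * self.depth + " " + text)


def flatten_lists(html):
    """Hypothetical helper: return the flattened list as one string."""
    parser = ListFlattener()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


x = """<ul>
  <li>Apple</li>
  <li>Pear</li>
  <ul>
     <li>Cherry</li>
     <li>Orange</li>
     <ul>
        <li>Pineapple</li>
     </ul>
  </ul>
  <li>Banana</li>
</ul>"""

print(flatten_lists(x))
```

If an li contains inline tags such as b or a, each text chunk would become its own line, so the sketch would need to buffer text until the closing li in that case.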

You can use recursion:

import bs4
from bs4 import BeautifulSoup as soup
s = """
<ul>
  <li>Apple</li>
  <li>Pear</li>
  <ul>
     <li>Cherry</li>
     <li>Orange</li>
     <ul>
        <li>Pineapple</li>
     </ul>
  </ul>
  <li>Banana</li>
</ul>
"""
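def indent(d, c=0):
    # Text sitting directly inside this tag (whitespace-only strings are skipped)
    if (text := ''.join(i for i in d.contents if isinstance(i, bs4.NavigableString) and i.strip())):
        yield f'{"+" * c} {text}'
    # Recurse into child tags; each level of tag nesting adds one more "+"
    for i in d.contents:
        if not isinstance(i, bs4.NavigableString):
            yield from indent(i, c + 1)

print('\n'.join(indent(soup(s, 'html.parser').ul)))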

Output:

+ Apple
+ Pear
++ Cherry
++ Orange
+++ Pineapple
+ Banana
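
For comparison, here is a more explicit recursive walk that counts only ul levels rather than every tag level, so it would also cope with a ul placed inside an li. It is a sketch using the standard library's xml.etree.ElementTree instead of BeautifulSoup, so it assumes the markup is well-formed XML (true for the snippet in the question, not for HTML in general); the function name walk is made up for illustration:

```python
import xml.etree.ElementTree as ET


def walk(ul, depth=1):
    """Yield one '+'-prefixed line per <li>; depth is the nesting level of this <ul>."""
    for child in ul:
        if child.tag == "li":
            text = (child.text or "").strip()
            if text:
                yield "+" * depth + " " + text
            # A <ul> nested inside the <li> itself is one level deeper
            for nested in child.iterfind("ul"):
                yield from walk(nested, depth + 1)
        elif child.tag == "ul":
            # A <ul> placed directly inside another <ul>, as in the question
            yield from walk(child, depth + 1)


# Assumption: the input parses as XML, which holds for the question's snippet
x = """<ul>
  <li>Apple</li>
  <li>Pear</li>
  <ul>
     <li>Cherry</li>
     <li>Orange</li>
     <ul>
        <li>Pineapple</li>
     </ul>
  </ul>
  <li>Banana</li>
</ul>"""

print("\n".join(walk(ET.fromstring(x))))
```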

I think it would be easier to convert the HTML string to Markdown with custom bullets, one bullet style per nesting level. This can be done with markdownify:

import markdownify

formatted_html = markdownify.markdownify(x, bullets=['+', '++', '+++'], strip=["ul"])

Result:

+ Apple
+ Pear
++ Cherry
++ Orange
+++ Pineapple
+ Banana
