Copying nested HTML lists in Python?

Question

I'm a beginner programmer, so this is probably a trivial question. I have an .html file with a deeply nested unordered list. How can I copy, for example, the first 4 nesting levels into a new, empty .html file in Python? Do I need BeautifulSoup for this? To illustrate, here is JavaScript code that produces the effect I want when the page is displayed:

function nestless(root, selector, level) {
    var use = root;
    for (var i = 0; i <= level; i++) {
        use += ' ' + selector;
    }
    $(use).remove();
}

Here I would use:

nestless('#root', 'ul', 4);

It seems that my original question is badly written and difficult to parse; I'm sorry for that. The .html files are not really websites, but rather text documents written manually in an HTML editor and saved as .html. They contain nothing that couldn't be written with a LaTeX editor.

For example, I might want to reduce this list to its first 2 levels:

A
B
- C
- D
  - E
  - F
G

to

A
B
- C
- D
G

From my own research, BeautifulSoup with soupselect, PyQuery, and lxml can all parse HTML and query it with CSS selectors, but I'm not sure which is the easiest way to proceed or where to start reading.

Sorry, I can't quite follow your question. BeautifulSoup does the parsing for XML code.
– Paritosh Singh, Commented Jul 20, 2012 at 15:55
(1) Can we see some of the page structure, especially how the lists are nested? Do non-leaf nodes contain anything in addition to the sub-list? (2) What is it you want back: a nested list of limited depth, or a flat list?
– Hugh Bothwell, Commented Jul 20, 2012 at 16:09
The lists are standard nested <ul> lists, in the form of <ul> <li>A</li> <li>B</li> <ul> <li>C</li> <li>D<br> </li> </ul> </ul>
– Elip, Commented Jul 20, 2012 at 16:33

sean · Accepted Answer · 2012-07-20 15:54:11Z

1

I would look at Mechanize (http://wwwsearch.sourceforge.net/mechanize/) to parse the HTML and get to the list itself. Try not to use regular expressions for this; they quickly become messy and make things more difficult.
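
To illustrate the parser-over-regex advice, here is a minimal sketch that uses the standard library's html.parser rather than Mechanize (a swapped-in technique, chosen only because it needs no installation). The names ListDepthFilter and limit_list_depth are hypothetical. It assumes reasonably well-formed markup in which every <ul>/<ol> has a closing tag, and it simply skips everything inside lists nested deeper than max_depth.

```python
from html.parser import HTMLParser

LIST_TAGS = ("ul", "ol")


class ListDepthFilter(HTMLParser):
    """Re-emit the parsed HTML, skipping lists nested deeper than max_depth."""

    def __init__(self, max_depth):
        # convert_charrefs=False so entities reach the handlers unchanged.
        super().__init__(convert_charrefs=False)
        self.max_depth = max_depth
        self.depth = 0   # number of lists currently open
        self.out = []    # pieces of the output document

    def _keep(self):
        return self.depth <= self.max_depth

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TAGS:
            self.depth += 1
        if self._keep():
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim.
        if self._keep():
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._keep():
            self.out.append(f"</{tag}>")
        if tag in LIST_TAGS:
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        if self._keep():
            self.out.append(data)

    def handle_entityref(self, name):
        if self._keep():
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self._keep():
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if self._keep():
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def limit_list_depth(html, max_depth):
    """Return html with only the first max_depth list levels kept."""
    parser = ListDepthFilter(max_depth)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling limit_list_depth(text, 4) on the contents of the source file and writing the returned string to a new file would give the 4-level copy asked about. A regex would have to track nesting by hand, whereas the parser's start/end callbacks make the depth counter straightforward.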

Bite code · Accepted Answer · 2012-07-21 20:42:15Z

0

You don't need BeautifulSoup, but doing it without it would be a pain.

Use it to:

find your first-level list tag;
iterate over the first level;
for each element, iterate over the second level;
do the same for the third and fourth levels.
At the fourth level, iterate again, deleting any child list nodes.

Keep the resulting object in memory, and insert it as a child of the new HTML object when you generate the new .html file.
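
A minimal sketch of this idea with BeautifulSoup 4 (the bs4 package) could look like the following. Instead of iterating level by level, it counts how many lists enclose each list and removes those that sit one level too deep, which also covers markup like the question's where a <ul> is placed directly inside another <ul>. The function names prune_lists and copy_pruned and the file paths passed to them are hypothetical, and the sketch assumes the files are UTF-8.

```python
from bs4 import BeautifulSoup

LIST_TAGS = ["ul", "ol"]


def prune_lists(html, max_depth):
    """Return html with every list nested deeper than max_depth removed."""
    soup = BeautifulSoup(html, "html.parser")
    # A list with exactly max_depth enclosing lists sits at level
    # max_depth + 1, the first level to drop. Removing it also removes
    # everything nested inside it.
    too_deep = [
        lst for lst in soup.find_all(LIST_TAGS)
        if len(lst.find_parents(LIST_TAGS)) == max_depth
    ]
    for lst in too_deep:
        lst.decompose()
    return str(soup)


def copy_pruned(src_path, dst_path, max_depth):
    """Read src_path, prune its lists, and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        pruned = prune_lists(src.read(), max_depth)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(pruned)
```

For the 4-level case in the question, a call such as copy_pruned("notes.html", "notes_short.html", 4) would write the shortened copy, where both file names are placeholders.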
