I'm having trouble finding a library that converts simple HTML (with <b>, <i>, <p>, <li>, ...) to a simple plain-text representation. Obviously this can't follow the HTML spec very closely, but I don't need anything fancy. For instance, lynx does the job well (except that bold and italic are ignored; they could probably be translated into ANSI attributes):
$ echo "<b>hello</b> <p>this is a <i>list</i> <ul><li>foo</li><li>bar</li></ul></p>" |
lynx -stdin -dump
hello
this is a list
* foo
* bar
The ideal solution would be a Python library. Otherwise I will stick with lynx, so any command that does better than the one above would also be a welcome answer.
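
To make the desired behaviour concrete, here is a rough sketch of the kind of conversion I mean, using only html.parser from the Python standard library. The names TextRenderer and html_to_text are made up for illustration, the two-space bullet indent is my own choice, and the escape codes assume a terminal that honours the standard SGR sequences for bold and italic. It only handles the few tags listed above and is not a replacement for a real library:

```python
from html.parser import HTMLParser

# Standard SGR escape sequences: 1 = bold on, 22 = bold off,
# 3 = italic on, 23 = italic off. Terminal support for italic varies.
STYLE_ON = {"b": "\033[1m", "strong": "\033[1m", "i": "\033[3m", "em": "\033[3m"}
STYLE_OFF = {"b": "\033[22m", "strong": "\033[22m", "i": "\033[23m", "em": "\033[23m"}
BLOCK_TAGS = ("p", "ul", "ol")


class TextRenderer(HTMLParser):
    """Hypothetical helper: collects text pieces while the HTML is parsed."""

    def __init__(self, ansi=True):
        super().__init__()
        self.ansi = ansi
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_ON:
            if self.ansi:
                self.out.append(STYLE_ON[tag])
        elif tag in BLOCK_TAGS or tag == "br":
            self.out.append("\n")          # block elements start a new line
        elif tag == "li":
            self.out.append("\n  * ")      # every list item becomes a bullet

    def handle_endtag(self, tag):
        if tag in STYLE_OFF:
            if self.ansi:
                self.out.append(STYLE_OFF[tag])
        elif tag in BLOCK_TAGS:
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)


def html_to_text(html, ansi=True):
    """Render simple HTML as text, dropping blank lines and trailing spaces."""
    renderer = TextRenderer(ansi=ansi)
    renderer.feed(html)
    renderer.close()
    lines = (line.rstrip() for line in "".join(renderer.out).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = ("<b>hello</b> <p>this is a <i>list</i> "
              "<ul><li>foo</li><li>bar</li></ul></p>")
    print(html_to_text(sample))
```

Calling html_to_text(sample, ansi=False) gives the same text without escape codes. Nested lists, ordered-list numbering, links and whitespace collapsing are all ignored here, which is exactly why I would prefer an existing library.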