I'm having trouble finding a library that converts simple HTML (with <b>, <i>, <p>, <li> ...) to a plain-text representation. Obviously this can't follow the HTML spec very closely, but I don't need anything fancy. For instance, lynx does the job well, except that bold and italic are ignored, whereas they could probably be translated into ANSI attributes:

$ echo "<b>hello</b> <p>this is a <i>list</i> <ul><li>foo</li><li>bar</li></ul></p>" |
    lynx -stdin  -dump
hello

this is a list
  * foo
  * bar

The ideal solution would be a Python library. Otherwise I will stick with lynx, so any command better than the one shown here would also be welcome.

1 Answer

There is html2text, which is not quite what you asked for, but it may fit the constraints of other readers.

It produces text from HTML, and that text follows the Markdown format, so it does not use ANSI attributes, for instance. However, since Markdown is meant to be readable as plain text, it can probably satisfy some needs.
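If ANSI attributes are really what is wanted, a small renderer can also be sketched with the standard library alone. The following is only a minimal illustration, not html2text's approach: it assumes the input uses just <b>, <i>, <p>, <ul> and <li>, and the names AnsiTextRenderer and html_to_ansi are hypothetical. It also does not reproduce lynx's blank line between paragraphs.

```python
import re
from html.parser import HTMLParser

# Standard ANSI SGR escape sequences (italic support varies by terminal).
BOLD, ITALIC, RESET = "\033[1m", "\033[3m", "\033[0m"


class AnsiTextRenderer(HTMLParser):
    """Hypothetical minimal renderer; handles only b, i, p, ul and li."""

    def __init__(self):
        super().__init__()
        self.out = []     # output fragments, joined at the end
        self.styles = []  # stack of ANSI codes currently in effect

    def _newline(self):
        # Break the line unless we are already at the start of one.
        if self.out and not self.out[-1].endswith("\n"):
            self.out.append("\n")

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "i"):
            code = BOLD if tag == "b" else ITALIC
            self.styles.append(code)
            self.out.append(code)
        elif tag in ("p", "ul"):
            self._newline()
        elif tag == "li":
            self._newline()
            self.out.append("  * ")

    def handle_endtag(self, tag):
        if tag in ("b", "i") and self.styles:
            self.styles.pop()
            # Reset all attributes, then re-apply the ones still open.
            self.out.append(RESET + "".join(self.styles))
        elif tag in ("p", "ul", "li"):
            self._newline()

    def handle_data(self, data):
        # Collapse whitespace runs, as an HTML renderer would.
        text = re.sub(r"\s+", " ", data)
        if not self.out or self.out[-1].endswith("\n"):
            text = text.lstrip()
        if text:
            self.out.append(text)


def html_to_ansi(html):
    parser = AnsiTextRenderer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    rendered = "".join(parser.out)
    # Drop trailing spaces left before line breaks.
    return "\n".join(line.rstrip() for line in rendered.splitlines())


sample = ("<b>hello</b> <p>this is a <i>list</i> "
          "<ul><li>foo</li><li>bar</li></ul></p>")
print(html_to_ansi(sample))
```

On a terminal that understands these escape codes, "hello" should appear in bold and "list" in italic, with the list items indented the same way as in the lynx output.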
