7

I'm trying to pull apart a word document that looks like this:

1.0 List item
1.1 List item
1.2 List item
2.0 List item

It is stored in docx, and I'm using python-docx to try to parse it. Unfortunately, it loses all the numbering at the start. I'm trying to identify the start of each ordered list item.

The python-docx library also allows me to access styles, but I cannot figure out how to determine whether the style is a list style or not.

So far I've been messing around with a function and checking output, but the standard format is something like:

    for p in doc.paragraphs:
        s = p.style
        while s.base_style is not None:
            print s.name
            s = s.base_style
        print s.name

Which I've been using to try to search up through the custom styles, but the all end at "Normal," as opposed to the "ListNumber."

I've tried searching styles under the document, the paragraphs, and the runs without luck. I've also tried searching p.text, but as previously mentioned the numbering does not persist.

2
  • Can you please post some of your code? It might help us understand the problem better. Commented Jun 1, 2015 at 21:28
  • I wasn't sure what to add- I've been modifying a function so I can try to perceive the underlying structure. Commented Jun 1, 2015 at 21:36

1 Answer 1

6

List items can be implemented in the XML in a variety of ways. Unfortunately the most common way, adding list items using the toolbar (as opposed to using styles) is also probably the most complex.

Best bet is to start using opc-diag to have a look at the XML that's being used inside the document.xml and then formulating a strategy from there.

The list-handling API for python-docx hasn't really been implemented yet, so you'll need to operate at the lxml level if you want to get this done with today's version.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for this informative answer. Since this was posted a year ago, I wonder... has there been any improvement in the ability of python-docx to handle lists? TIA

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.