I'm using Python-Docx to export all the data from a 500-page Docx file into a spreadsheet using pandas. So far so good except that the process is removing all character styles. I have written the following to preserve superscript, but I can't seem to get it working.
for para in document.paragraphs:
content = para.text
for run in para.runs:
if run.font.superscript:
r.font.superscript = True
r = para.add_run(run.text)
scripture += r.text
My Input text might me, for example:
Genesis 1:1 1 In the beginning God created the heavens and the earth.
But my output into the Xlsx file is:
Genesis 1:1 1 In the beginning God created the heavens and the earth. (Still losing the superscript formatting).
How do I preserve the font.style of each run for export? Perhaps more specifically, how do I get the text formatting from each run to be encoded into the "scripture" string?
Any help is greatly appreciated!