My Java application is using Apache POI XWPF to parse an MS Word docx file.
The program iterates over every XWPFRun of every XWPFParagraph in an XWPFDocument in the following way:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

String fileName = "C:\\<yourFile>.docx";
try (XWPFDocument doc = new XWPFDocument(Files.newInputStream(Paths.get(fileName)))) {
    List<XWPFParagraph> paragraphs = doc.getParagraphs();
    for (XWPFParagraph paragraph : paragraphs) {
        List<XWPFRun> runs = paragraph.getRuns();
        for (XWPFRun run : runs) {
            // Print the run's text followed by its directly applied font properties
            System.out.print(run.getText(0));
            System.out.print("| name  : " + run.getFontName());
            System.out.print("| size  : " + run.getFontSize());
            System.out.print("| sizeD  : " + run.getFontSizeAsDouble());
            System.out.print("| sizeC  : " + run.getComplexScriptFontSizeAsDouble());
        }
    }
} catch (IOException e) {
    throw new RuntimeException(e);
}
Intermittently, an XWPFRun returns:
getFontSize() = -1;
getFontSizeAsDouble() = null;
getComplexScriptFontSizeAsDouble() = null;
Nevertheless, getText(0), getFontName(), isBold() etc. each return what can be seen when the document is opened in a docx client. The client also displays the fragment with a font size of 12.
I'm using POI version 5.2.5.
The relevant docx tags are <sz> and <szCs>, which are children of <rPr>, which in turn is a child of <r>.
What confuses me is that in the cases where the font size is returned as -1, the <sz> and <szCs> tags are omitted from the run, yet the client still displays the text at a particular size.
Not only is the text displayed, it is displayed at the size I would expect given the structure of the document. The document is not structured by variation in heading type, but rather by variation in display characteristics (font, bold, italic etc.). The client APPEARS to find the last text fragment with identical display characteristics other than size, and then inherit the size from that previous fragment. But I'm guessing!
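To compare with that guess, here is a minimal sketch of a different reading: a run with no <sz> of its own falls back through the style hierarchy rather than through earlier runs. The class EffectiveFontSize and its method are hypothetical and are not part of POI. The lookup order (direct run formatting, then character style, then paragraph style, then document defaults) is my understanding of WordprocessingML and should be treated as an assumption, as should the document-default value of 20 used in main.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.OptionalDouble;

/** Hypothetical helper, not part of Apache POI. */
final class EffectiveFontSize {

    /**
     * Returns the first non-null size, converted from half-points to points.
     * Candidates must be ordered from most specific to least specific.
     * Pass at least two arguments, or an explicit Integer[] array.
     */
    static OptionalDouble resolvePoints(Integer... halfPointCandidates) {
        return Arrays.stream(halfPointCandidates)
                .filter(Objects::nonNull)                      // skip levels that declare no size
                .mapToDouble(halfPoints -> halfPoints / 2.0)   // <sz> values are in half-points
                .findFirst();
    }

    public static void main(String[] args) {
        Integer directSz = null;          // the run's own <rPr> has no <sz>
        Integer characterStyleSz = null;  // assumed: no character style on the run
        Integer paragraphStyleSz = 24;    // e.g. a paragraph style declaring <sz val="24">
        Integer docDefaultsSz = 20;       // assumed value, for illustration only

        OptionalDouble points = resolvePoints(
                directSz, characterStyleSz, paragraphStyleSz, docDefaultsSz);
        System.out.println(points);
    }
}
```

If this reading is right, a -1 from getFontSize() would only mean "no size set directly on this run", and the displayed size would have to be looked up at one of the outer levels.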
Please note that, contrary to what was suggested in another post, document.getStyles().getDefaultRunStyle().getFontSize() does not return the displayed font size.
Also note that the associated styles.xml contains the following fragment:
<w:style w:type="paragraph" w:default="1" w:styleId="Normal">
<w:name w:val="Normal"/>
<w:next w:val="Normal"/>
<w:pPr/>
<w:rPr>
<w:sz w:val="24"/>
This DOES contain the font size I'm after, but the only useful thing that
doc.getStyles().getDefaultParagraphStyle()
returns is getSpacingAfter(). The API refers to overriding classes that I would hope will return 12 (i.e. 24/2), but I have no indication of what they are.
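As a stopgap, the value can be read from the styles part directly with the JDK's XML parser, bypassing POI. This is only a sketch: the class DefaultParagraphStyleSize is hypothetical, the sample XML in main is a closed-off version of the fragment above, and I am assuming the <w:sz> value is a plain integer number of half-points.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper, not part of Apache POI. */
final class DefaultParagraphStyleSize {

    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the default paragraph style's size in points, or null if it declares none. */
    static Double readPoints(InputStream stylesXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the w: prefixed names
            Document dom = factory.newDocumentBuilder().parse(stylesXml);

            NodeList styles = dom.getElementsByTagNameNS(W_NS, "style");
            for (int i = 0; i < styles.getLength(); i++) {
                Element style = (Element) styles.item(i);
                String isDefault = style.getAttributeNS(W_NS, "default");
                boolean defaultParagraph =
                        "paragraph".equals(style.getAttributeNS(W_NS, "type"))
                        && ("1".equals(isDefault) || "true".equals(isDefault));
                if (!defaultParagraph) {
                    continue;
                }
                NodeList sizes = style.getElementsByTagNameNS(W_NS, "sz");
                if (sizes.getLength() == 0) {
                    return null; // the style exists but sets no size
                }
                String halfPoints = ((Element) sizes.item(0)).getAttributeNS(W_NS, "val");
                return Integer.parseInt(halfPoints) / 2.0; // half-points to points
            }
            return null; // no default paragraph style found
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse styles XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input for illustration; closing tags added to make it well-formed.
        String sample =
                "<w:styles xmlns:w=\"" + W_NS + "\">"
              + "<w:style w:type=\"paragraph\" w:default=\"1\" w:styleId=\"Normal\">"
              + "<w:name w:val=\"Normal\"/><w:next w:val=\"Normal\"/><w:pPr/>"
              + "<w:rPr><w:sz w:val=\"24\"/></w:rPr>"
              + "</w:style></w:styles>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(readPoints(in));
    }
}
```

For a real file, the input stream would come from the styles part inside the .docx zip (usually stored as word/styles.xml), for example via java.util.zip.ZipFile. This does not follow any basedOn chain, so it only helps when the size is declared on the default paragraph style itself.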