Is it possible to convert MS Office file formats to PDF using Apache PDFBox? The documentation isn't clear about this, and the Javadoc seems to indicate that no such capability exists. Or would I need to do some tedious conversions with Apache POI?

The reason I'm asking is the answer to this StackOverflow question:

https://stackoverflow.com/questions/10861227/convert-ms-office-to-pdf-in-java

I imagine I'll need to use Apache POI, but I wanted to clarify.

2 Answers

In order to do this conversion, you will need MS Office, or perhaps Google Drive. PDFBox does not convert from anything to PDF or vice versa -- it simply reads and writes PDF files. Apache POI will not do that type of conversion either -- it simply reads and writes MS Office files. Specifically, it does not render them. You could implement a rendering engine for each type of Office file yourself, but that would be a gargantuan task to say the least.

1 Comment

You might find LibreOffice/OpenOffice under JODConverter is good enough for your purposes. You might even find docx4j (a pure Java solution) is good enough. Although it handles pptx and xlsx as well, it only does PDF output out of the box for docx.
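
As a rough illustration of the LibreOffice route, here is a minimal sketch that shells out to LibreOffice's headless converter directly instead of going through JODConverter. It assumes LibreOffice is installed and that its `soffice` executable is on the PATH; the class name `OfficeToPdf`, its method names, and the two-minute timeout are hypothetical choices made for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class OfficeToPdf {

    // Builds the LibreOffice command line.
    // Assumption: the "soffice" executable is installed and on the PATH.
    public static List<String> buildCommand(Path input, Path outputDir) {
        return List.of(
                "soffice",
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    // Path where the PDF is expected to appear: same base name as the
    // input, with a .pdf extension, inside the output directory.
    public static Path expectedOutput(Path input, Path outputDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".pdf");
    }

    // Runs the conversion and returns the expected PDF path.
    // The two-minute timeout is an arbitrary choice for this sketch.
    public static Path convert(Path input, Path outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputDir))
                .inheritIO()
                .start();
        if (!process.waitFor(2, TimeUnit.MINUTES)) {
            process.destroyForcibly();
            throw new IOException("LibreOffice timed out converting " + input);
        }
        if (process.exitValue() != 0) {
            throw new IOException(
                    "LibreOffice exited with code " + process.exitValue());
        }
        return expectedOutput(input, outputDir);
    }
}
```

Calling `OfficeToPdf.convert(Path.of("report.docx"), Path.of("out"))` would then be expected to leave `out/report.pdf` behind. Rendering fidelity depends entirely on LibreOffice, so it is worth checking the output against a few representative documents.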
Take a look at https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/.

One of the possible options it mentions is XWPFConverterPDFViaIText:

org.apache.poi.xwpf.converter.pdf provides the DOCX 2 Pdf converter based on Apache POI XWPF and iText.

You can test this converter with the REST Converter service at http://xdocreport-converter.opensagres.cloudbees.net/
