0

I need to convert the content of a .docx file to HTML so that I can display it in a web UI.

I've tried Apache POI's XWPFDocument class, but I haven't gotten any results yet: the output is an empty string. My code is based on this sample.

Here is my code:

public JSONObject uploadDocxFile(MultipartFile multipartFile) throws Exception {
        InputStream inputStream = multipartFile.getInputStream();
        XWPFDocument wordDocument = new XWPFDocument(inputStream);

        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StringWriter stringWriter = new StringWriter();

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, new StreamResult(stringWriter));
        out.close();

        String result = new String(out.toByteArray());
        String htmlText = result;

        JSONObject jsonObject = new JSONObject();
        jsonObject.put("content", htmlText);
        jsonObject.put("success", true);
        return jsonObject;
    }
2
  • Possible duplicate of Converting a .docx to html using Apache POI and getting no text Commented Jan 22, 2013 at 14:49
  • There's no proper answer there. The owner of that question asked it for the same reason I did, but he later commented that he has no problem getting the text. Commented Jan 22, 2013 at 19:23

3 Answers

1

Even though it's late, I think the previous code can be modified in this way (it works with a Word 97 document):

    // Imports needed: java.io.*, java.nio.charset.StandardCharsets,
    // javax.xml.parsers.*, javax.xml.transform.*, javax.xml.transform.dom.DOMSource,
    // javax.xml.transform.stream.StreamResult, org.apache.poi.hwpf.HWPFDocument,
    // org.apache.poi.hwpf.converter.WordToHtmlConverter,
    // org.apache.poi.poifs.filesystem.POIFSFileSystem
    private static void convertWordDoc2HTML(File file)
            throws ParserConfigurationException, TransformerException, IOException {
        // Change the type from XWPFDocument to HWPFDocument (binary .doc format).
        // An IOException now propagates instead of leaving hwpfDocument null.
        HWPFDocument hwpfDocument;
        try (FileInputStream fis = new FileInputStream(file)) {
            hwpfDocument = new HWPFDocument(new POIFSFileSystem(fis));
        }

        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        // Add the processDocument call; without it the HTML document stays empty
        wordToHtmlConverter.processDocument(hwpfDocument);
        org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();

        // Decode with the same charset the serializer was told to write
        String htmlText = new String(out.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(htmlText);
    }

I hope it's useful.
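Since the code above only handles the binary Word 97 format while the question is about .docx, it can help to check which container an upload actually is before choosing a converter. Below is a minimal JDK-only sketch, assuming the well-known file signatures (OLE2 compound file for .doc, ZIP for .docx). The class name `WordFormatSniffer` and the `Container` enum are hypothetical names for illustration, not part of POI. Note that the signature only identifies the container: any ZIP file starts with the same bytes, so this narrows the choice rather than proving the file is a Word document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WordFormatSniffer {
    // Hypothetical result type: which container the stream looks like
    enum Container { OLE2, OOXML_ZIP, UNKNOWN }

    // OLE2 compound file signature (binary .doc, Word 97-2003)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    // ZIP local file header signature (.docx is a ZIP package)
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // Reads up to 8 bytes from the stream and compares them to the signatures.
    // The stream is consumed; wrap it in a BufferedInputStream and use
    // mark/reset if the same stream must be parsed afterwards.
    static Container detect(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int filled = 0;
        while (filled < header.length) {
            int n = in.read(header, filled, header.length - filled);
            if (n < 0) {
                break; // stream ended before 8 bytes were available
            }
            filled += n;
        }
        if (startsWith(header, filled, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(header, filled, ZIP_MAGIC)) {
            return Container.OOXML_ZIP;
        }
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the first bytes of an uploaded .docx
        byte[] fakeDocx = { 'P', 'K', 0x03, 0x04, 0, 0, 0, 0 };
        System.out.println(detect(new ByteArrayInputStream(fakeDocx)));
    }
}
```

With a check like this, an OLE2 result would go to the HWPFDocument path shown above, and a ZIP result to an XWPFDocument-based converter.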

0

I'm using docx4j for this and it seems to work. If you're using Maven, you can just add the dependency (but use version 3.0.0) and then use one of the docx4j sample programs, ConvertOutHtml.java. Just change the file path in ConvertOutHtml.java to point to your file and you should be fine.

0

Your code generates empty HTML output because you never pass a document to the converter for processing.

Anyway, for a .docx you should use XHTMLConverter instead of WordToHtmlConverter, which only accepts HWPF (.doc) documents. See this answer.
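There is also a second problem in the question's code that is independent of the converter: the transformer writes into `stringWriter`, but the result is then read from `out`, a ByteArrayOutputStream that nothing was ever written to, so the string would be empty even with a populated document. Here is a minimal JDK-only sketch of the serialization step that reads from the same sink it writes to. The class name `DomToHtmlString` and the hand-built sample DOM are stand-ins for illustration; in real code the `Document` would come from the converter.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToHtmlString {
    // Serializes a DOM document to an HTML string using the same
    // output properties as the code in the question.
    static String serialize(Document doc) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");

        StringWriter writer = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(writer));
        // Read from the writer that received the output, not from another buffer
        return writer.toString();
    }

    // Hand-built stand-in for the document a converter would produce
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element p = doc.createElement("p");
        p.setTextContent("Hello from a converted document");
        body.appendChild(p);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDocument()));
    }
}
```

Writing to a StringWriter also avoids the byte-to-String decoding step, so there is no charset to get wrong when building the JSON response.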
