I converted .doc to HTML using WordToHtmlConverter, and it worked perfectly.

But when I tried to convert .docx to HTML, I got stuck.

What I tried:

I used the code below, taken from "How to use Tika's XWPFWordExtractorDecorator class?", to convert .docx to HTML:

        InputStream input = TikaInputStream.get(new File("C:\\Users\\Downloads\\filename.docx"));
        Parser parser = new AutoDetectParser();

        // Serialize the SAX events emitted by the parser as indented HTML.
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory = (SAXTransformerFactory)
                 SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(new StreamResult(sw));

        try {
            Metadata metadata = new Metadata();
            parser.parse(input, handler, metadata, new ParseContext());
            String xml = sw.toString();
            System.out.print("tika : " + xml);
        } finally {
            input.close();
        }

The output I got is:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title/>
</head>
<body/>
</html>
  • Please explain where I went wrong.
  • Is there a better way to convert .docx to an HTML string?

I appreciate your help. Thanks.

  • According to the documentation poi.apache.org/apidocs/org/apache/poi/hwpf/converter/… this API is meant for Word versions before 2007, when there was only .doc. So it won't work for .docx. Try to save your document as .doc. Commented Jul 9, 2014 at 11:57
  • @singe31 you didn't get my point. I have already converted .doc to HTML using the HWPF converter. I'm trying to do the same for .docx. Is there a way? Commented Jul 9, 2014 at 12:02
  • code.google.com/p/xdocreport/wiki/XWPFConverterXHTML Commented Jul 9, 2014 at 12:05
  • At their simplest, .docx files are an archive (you can open them with something like 7zip and view the contents) containing a bunch of XML files. With that in mind, you'd want to use something that can transform the XML into HTML. Commented Jul 9, 2014 at 12:08
  • You could also take a look at Pandoc, or call any other command-line tool from Java. These tasks are not that trivial, and I'm not sure there's a working API out there for this other than POI at the moment. Commented Jul 9, 2014 at 12:24
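
To illustrate the comment above about .docx files being archives of XML files, here is a minimal sketch that lists the entries of a .docx using only the JDK's java.util.zip classes. The class name DocxEntries and the method listEntries are hypothetical names chosen for this example. It only shows the container structure and does not convert anything to HTML.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of POI, Tika or xdocreport.
public class DocxEntries {

    /** Returns the names of all entries in a .docx (ZIP) stream, in archive order. */
    public static List<String> listEntries(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            // getNextEntry() returns null once the end of the archive is reached.
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocxEntries path/to/file.docx
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

For a typical Word document the listing includes entries such as word/document.xml, which holds the main body text that a converter has to transform.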

2 Answers

This code worked for me to convert .docx to HTML:

You can also look at the link: Link to code

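        // Convert a .docx file to an HTML string (path is the location of the .docx file).
        // XHTMLOptions, XHTMLConverter and FileURIResolver come from the xdocreport
        // XWPF converter jars; XWPFDocument comes from Apache POI.
        String html;
        try (InputStream in = new FileInputStream(new File(path))) {
            XWPFDocument document = new XWPFDocument(in);

            // Resolve image URIs in the generated HTML against the "word/media" folder.
            XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File("word/media")));

            OutputStream out = new ByteArrayOutputStream();
            XHTMLConverter.getInstance().convert(document, out, options);
            html = out.toString();
        }
        System.out.println(html);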

5 Comments

Could anyone provide an updated example, please? The reference does not work anymore. Thanks.
I had a problem using this code because I could not find the jars for XHTMLOptions, XHTMLConverter and FileURIResolver. After searching, I found them in "org.apache.poi.xwpf.converter.core-1.0.6.jar", "org.apache.poi.xwpf.converter.xhtml-1.0.6.jar" and "ooxml-schemas-1.1.jar". If you use these jars, the code above works without errors.
@Vipul, here is the dependency for it: mvnrepository.com/artifact/fr.opensagres.xdocreport/…
I followed the code above and it converts .docx to HTML, but the border styles applied in my .docx are missing from the output. Any idea?
Images are not working in my case. Is there a fix for it?
You may want to use Mammoth, a .docx to HTML library. It displays .docx documents by converting them to HTML, and it can run in the browser as well as on the backend.
