
I'm trying to retrieve a .docx file from a database and process it by checking its content. I think my code retrieves the file I want, but it seems that I haven't fully understood Apache POI. The stack trace says that I'm calling the wrong part of POI. Any ideas?

Here's how I load the file:

public void loadFile(String FileName)
{
    InputStream is = null;
    try
    {
        //Connecting to MYSQL Database
        Class.forName(driver).newInstance();
        con = DriverManager.getConnection(url+dbName,userName,password);

        Statement stmt = (Statement) con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT FILE FROM doccompfiles WHERE FileName = '"+ FileName +"'");

        while(rs.next())
        {
            is = rs.getBinaryStream("FILE");
        }

        HWPFDocument doc = new HWPFDocument(is);
        WordExtractor we = new WordExtractor(doc);

        String[] paragraphs = we.getParagraphText();
        JOptionPane.showMessageDialog(null, "Number of Paragraphs" + paragraphs.length);
        con.close();
    }
    catch(Exception ex)
    {
        ex.printStackTrace();
    }
}

Stacktrace:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
at documentComparisor.Database.loadFile(Database.java:156)
at documentComparisor.Home$5.actionPerformed(Home.java:195)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
at java.awt.Container.processEvent(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.awt.EventQueue.access$000(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)
That is the most useful exception I have ever seen. (Commented Oct 11, 2012 at 4:51)

1 Answer


MS Office documents currently exist in two different formats: the old binary (OLE2) format used by versions of MS Office before 2007 (e.g. ".doc" or ".xls"), and the XML-based format used by newer versions (e.g. ".docx" or ".xlsx").

Different parts of Apache POI handle the different formats. Names of the key classes for files in the old MS Office format generally start with "H" (e.g. HWPFDocument, HSSF), while names of the classes for files in the XML-based format start with "X" (e.g. XWPFDocument, XSSF).

So in your example, to handle the new format, you should use XWPFDocument instead of HWPFDocument:

XWPFDocument doc = new XWPFDocument(is);
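If the table might hold a mix of ".doc" and ".docx" files, the choice between the two classes can be made from the first bytes of the stream rather than from the file name. Below is a minimal sketch of that idea using only the JDK. The class name WordFormatSniffer and its methods are made up for illustration and are not part of POI. It assumes that an OLE2 file starts with the signature D0 CF 11 E0 A1 B1 1A E1 and that an OOXML file, being a ZIP archive, starts with "PK" 03 04. A ZIP signature only shows that the data is a ZIP container, so treat the OOXML result as a hint, not proof that the file is a ".docx".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (not a POI class): guesses the container format
// of an Office file from its leading bytes.
final class WordFormatSniffer {
    enum Format { OLE2, OOXML, UNKNOWN }

    static final int HEADER_LEN = 8;

    // Signature of the old binary (OLE2) container, e.g. ".doc"
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header; ".docx" files are ZIP archives
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Pure check on the first 'len' bytes of 'head'.
    static Format classify(byte[] head, int len) {
        int usable = Math.min(len, head.length);
        if (startsWith(head, usable, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(head, usable, ZIP_MAGIC)) return Format.OOXML;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] head, int len, byte[] magic) {
        if (len < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) return false;
        }
        return true;
    }

    // Reads up to HEADER_LEN bytes, pushes them back, and classifies them.
    // The stream must have a pushback buffer of at least HEADER_LEN bytes.
    static Format detect(PushbackInputStream in) throws IOException {
        byte[] head = new byte[HEADER_LEN];
        int n = 0;
        while (n < HEADER_LEN) {
            int r = in.read(head, n, HEADER_LEN - n);
            if (r < 0) break; // end of stream before a full header
            n += r;
        }
        if (n > 0) in.unread(head, 0, n); // leave the stream untouched for POI
        return classify(head, n);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for rs.getBinaryStream("FILE"): bytes that begin like a ZIP archive.
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0};
        InputStream raw = new ByteArrayInputStream(zipLike);
        PushbackInputStream in = new PushbackInputStream(raw, HEADER_LEN);
        // OOXML  -> pass 'in' to new XWPFDocument(in)
        // OLE2   -> pass 'in' to new HWPFDocument(in)
        System.out.println(detect(in));
    }
}
```

Because detect pushes the bytes back, the same stream can be handed to whichever POI constructor matches. Note that WordExtractor in the question is also HWPF-specific; for the XML-based format the counterpart is XWPFWordExtractor, or you can iterate over doc.getParagraphs() directly. Newer POI releases also ship their own format detection, so it is worth checking what your version offers before rolling your own.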

3 Comments

Thank you for the detailed comparison of the two. I finally understood their differences.
I'm glad that it was helpful.
Is there a way in Apache POI to convert between HWPF and XWPF?
