1

As the title states, I have a large Excel file (>200 sheets) that I need to add data to. I do not want to create new cells, I only want to modify existing ones.

I tried using Apache Poi but my application runs out of memory even with Xms and Xmx set to 8g. The only option for low-memory writing is seemingly with SXSSF. The problem is that it only works for creating new cells and does not allow modifying existing ones. I also tried using the event API in order to process the sheet's XML, but it only seems to work for read operations. I've been trying to use an XMLEventWriter but I can't find a way to access the sheets' XML data which works for writing. Is there a way to access an Excel file's XML data other than with XSSFReader?

5
  • 3
    If the problem is only parsing all 200+ sheets to XSSFSheets together, then you could overwriting XSSFWorkbook.parseSheet as shown in HybridStreaming.java. Using this you could have only one or some of the sheets parsed to XSSFSheet. But of course sometimes even this is not possible because all of the sheets depend on each other. Commented Jun 24, 2019 at 15:29
  • 2
    And if even only single sheets are too big for XSSFSheet, then using the high level apache poi classes is not possible. Then only using pure XML methods will be possible. I have shown how to do that for reading and writing using StAX. Commented Jun 24, 2019 at 15:29
  • The problem mainly comes from writing. With POI I'm able to open the workbook and modify the cells, but the application crashes on the final call to workbook.write(). I wrote some code with StAX but I can't figure out how to access the sheets for writing. The sheets are easily accessible for reading by using XSSFReader to generate an InputStream which is then used to create the XML reader. However there are no classes to generate an Outputstream for writing. Writing to a regular .xml file would be simple but I can't figure out to pass my Excel document to an XML writer. Commented Jun 25, 2019 at 6:52
  • 1
    I have linked my examples using StAX. They contain code for reading as well as writing. But of course none of those is general usable for reading and writing all kinds of Excel files. Each Excel workbook needs it's own code dependent on it's structure and on what content shall be changed. Commented Jun 25, 2019 at 8:57
  • The examples that feature writing only do so at the end of the document from what I can tell. I am trying to modify existing elements/cells. Is this possible with StAX? Commented Jun 25, 2019 at 10:26

1 Answer 1

2

As told in comments above there is no one fits all solution using pure XML reading and writing the Office Open XML spreadsheets. Each Excel workbook needs it's own code dependent on it's structure and on what content shall be changed.

This is because apache poi's high level classes provides a meta level to avoid this. But this needs memory to work. And for very big workbooks it needs much memory. To avoid memory consumption through manipulating the XML directly this meta level is not usable. So one must know the XML structure of a worksheet and the meaning of the XML elements used.

So if we have a Excel workbook having a first sheet having strings in column A and numbers in column B, then we could changing every fifth row using StAX for manipulating the XML directly using following code:

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;

import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

import javax.xml.namespace.QName;

import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;

import java.util.regex.Pattern;

class StaxReadAndChangeTest {

 public static void main(String[] args) throws Exception {
  File file = new File("ReadAndWriteTest.xlsx");
  OPCPackage opcpackage = OPCPackage.open(file);

  //since there are strings in the sheet data, we need the SharedStringsTable
  PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
  SharedStringsTable sharedstringstable = new SharedStringsTable();
  sharedstringstable.readFrom(sharedstringstablepart.getInputStream());

  //get first worksheet
  PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);

  //get XML reader and writer            
  XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
  XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());

  XMLEventFactory eventFactory = XMLEventFactory.newInstance();

  int rowsCount = 0;
  int colsCount = 0;
  boolean cellAfound = false;
  boolean cellBfound = false;

  while(reader.hasNext()){ //loop over all XML in sheet1.xml
   XMLEvent event = (XMLEvent)reader.next();
   if(event.isStartElement()) {
    StartElement startElement = (StartElement)event;
    QName startElementName = startElement.getName();
    if(startElementName.getLocalPart().equalsIgnoreCase("row")) { //start element of row
     rowsCount++;
     colsCount = 0;
    } else if (startElementName.getLocalPart().equalsIgnoreCase("c")) { //start element of cell
     colsCount++;
     cellAfound = false;
     cellBfound = false;
     if (rowsCount % 5 == 0) { // every 5th row
      if (colsCount == 1) { // cell A
       cellAfound = true;
      } else if (colsCount == 2) { // cell B
       cellBfound = true;
      } 
     }
    } else if (startElementName.getLocalPart().equalsIgnoreCase("v")) { //start element of value
     if (cellAfound) {
      // create new rich text content for cell A
      CTRst ctstr = CTRst.Factory.newInstance();
      ctstr.setT("changed String Value A" + (rowsCount));
      //int sRef = sharedstringstable.addEntry(ctstr);
      int sRef = sharedstringstable.addSharedStringItem(new XSSFRichTextString(ctstr));
      // set the new characters for A's value in the XML
      if (reader.hasNext()) {
       writer.add(event); // write the old event
       event = (XMLEvent)reader.next(); // get next event - should be characters
       if (event.isCharacters()) {
        Characters value = eventFactory.createCharacters(Integer.toString(sRef));
        event = value;
       } 
      }        
     } else if (cellBfound) {
      // set the new characters for B's value in the XML
      if (reader.hasNext()) {
       writer.add(event); // write the old event
       event = (XMLEvent)reader.next(); // get next event - should be characters
       if(event.isCharacters()) { 
        double oldValue = Double.valueOf(((Characters)event).getData()); // old double value
        Characters value = eventFactory.createCharacters(Double.toString(oldValue * rowsCount));
        event = value;         
       }         
      }
     }
    }
   }
   writer.add(event); //by default write each read event
  }
  writer.flush();

  //write the SharedStringsTable
  OutputStream out = sharedstringstablepart.getOutputStream();
  sharedstringstable.writeTo(out);
  out.close();
  opcpackage.close();

 }
}

This will be much less memory consuming than apache poi's XSSF classes. But, as said, it only works exactly for this kind of Excel workbook having a first sheet having strings in column A and numbers in column B.

Sign up to request clarification or add additional context in comments.

2 Comments

I managed to make something that works using you example as inspiration, thanks for the help. The only thing I have left to do is figure out how to force the recalculation of the formulas in the sheet. Is there a flag in workbook.xml which does that?
Found the answer to my previous question (forceFullCalc) here: c-rex.net/projects/samples/ooxml/e1/Part4/… . Hope this can help someone else

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.