I need to read a large Excel file (50,000 rows and 20 columns) using the Apache POI library. There is another question that asks exactly the same thing. My attempted approach is as follows:
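import java.io.IOException;
import java.util.ArrayList;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public static ArrayList<Double> readColumn(String excelFile, String sheetName, int columnNumber)
{
    ArrayList<Double> excelData = new ArrayList<>();
    XSSFWorkbook workbook = null;
    try
    {
        // Opens the whole workbook, not just the requested column
        workbook = new XSSFWorkbook(excelFile);
    } catch (IOException e)
    {
        e.printStackTrace();
        // Without a workbook there is nothing to read
        return excelData;
    }
    Sheet sheet = workbook.getSheet(sheetName);
    // getLastRowNum() is the zero-based index of the last row, hence <=
    for (int i = 0; i <= sheet.getLastRowNum(); i++)
    {
        Row row = sheet.getRow(i);
        if (row != null)
        {
            Cell cell = row.getCell(columnNumber);
            if (cell != null)
            {
                // Skip cells that are not numeric
                if (cell.getCellTypeEnum() == CellType.NUMERIC)
                {
                    excelData.add(cell.getNumericCellValue());
                    System.out.println(cell.getNumericCellValue());
                }
            }
        }
    }
    return excelData;
}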
Unfortunately, while this method seems to work for a low column index (e.g. columnNumber = 1), I get an OutOfMemoryError for a large columnNumber. The file itself is not large enough to exhaust my computer's memory, and I can achieve the same outcome in Python with very little memory. Is there a better way to solve this? Or is there another Java library that would allow me to do that?
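For context, an OutOfMemoryError is raised when the JVM heap is exhausted, and that limit can be much smaller than the machine's physical memory. Below is a minimal sketch for checking the limit this code runs under. It uses only the standard java.lang.Runtime API; the class name HeapReport and its helper methods are hypothetical names chosen for illustration and are not part of Apache POI.

```java
// Hypothetical helper (not part of Apache POI): reports the heap limits of the running JVM.
class HeapReport {

    // Converts a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Builds a one-line summary of the heap limit, the heap currently reserved, and the part in use.
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();            // upper bound the heap is allowed to grow to
        long reserved = rt.totalMemory();     // heap currently reserved by the JVM
        long used = reserved - rt.freeMemory();
        return String.format("max=%d MiB, reserved=%d MiB, used=%d MiB",
                toMiB(max), toMiB(reserved), toMiB(used));
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Calling HeapReport.summary() right before constructing the workbook would show how much headroom there is. If the reported maximum is small, it can be raised with the standard -Xmx option (for example java -Xmx2g), although that only postpones the problem if the whole workbook is still loaded into memory.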