I'm extracting data from a PDF to Excel. The PDF also contains a table. I used iText to convert the PDF to text and Apache POI to convert the text to Excel, but I'm not able to retrieve the data in a form I can store in the database. I tried PDFBox and Aspose as well and got the same result. If anyone knows how to solve this, please help.
Here is my code:
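import java.io.FileOutputStream;
import java.io.StringReader;
import java.util.List;

import org.apache.commons.io.IOUtils;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.CreationHelper;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public class PdfToExcel {
    public static void main(String[] args) throws Exception {
        // pdf to text using itext
        PdfReader reader = new PdfReader(
                "C:\\Users\\mohmeds\\Desktop\\BOI_SCFS banking.pdf_page_1.pdf");
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // append the text of every page; assigning inside the loop kept only the last page
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy = parser.processContent(i,
                    new SimpleTextExtractionStrategy());
            text.append(strategy.getResultantText()).append('\n');
        }
        reader.close();
        String line = text.toString();

        // using apache poi text to excel converter
        Workbook wb = new HSSFWorkbook();
        CreationHelper helper = wb.getCreationHelper();
        Sheet sheet = wb.createSheet("new sheet");
        System.out.println("link------->" + line);
        List<String> lines = IOUtils.readLines(new StringReader(line));
        for (int i = 0; i < lines.size(); i++) {
            // one row per line of text, one cell per comma-separated value
            String[] str = lines.get(i).split(",");
            Row row = sheet.createRow(i);
            for (int j = 0; j < str.length; j++) {
                row.createCell(j).setCellValue(
                        helper.createRichTextString(str[j]));
            }
        }
        // try-with-resources closes the stream even if write() throws
        try (FileOutputStream fileOut = new FileOutputStream(
                "C:\\Users\\mohmeds\\Desktop\\someName1.xls")) {
            wb.write(fileOut);
        }
    }
}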
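One possible reason the table does not come through: the code above splits each line on ",", but text extracted from a PDF usually contains no comma delimiters between columns, so a whole line may end up in a single cell. A minimal sketch of splitting on runs of whitespace instead, using only the Java standard library. The class name `ColumnSplitter`, the method `splitColumns`, and the sample line are hypothetical, and the sketch assumes the extracted text separates columns by two or more whitespace characters, which may not hold for this PDF or for every extraction strategy.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnSplitter {

    // Splits one line of extracted text into cells wherever two or more
    // whitespace characters occur in a row.
    // ASSUMPTION: columns are separated by at least two spaces (or tabs)
    // and single spaces only appear inside a cell value.
    static List<String> splitColumns(String line) {
        List<String> cells = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return cells; // blank line: no cells
        }
        for (String cell : trimmed.split("\\s{2,}")) {
            cells.add(cell);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the real PDF.
        String sample = "01/02/2015  NEFT transfer   1,500.00";
        System.out.println(splitColumns(sample));
    }
}
```

Each returned list could then be written to one row in place of the array produced by split(","). Commas inside a value, such as an amount, stay within their cell because the split no longer looks at commas.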