I need a bit of advice. I'm working on a program in Python that reads data from a PDF, and I'm supposed to populate the same information in an Excel sheet. Right now I'm using PyPDF2 to extract the data, and I plan to use pandas to store the data in a DataFrame and then write that DataFrame to the Excel sheet. Is this approach efficient? If there's a better way or a flaw in my plan, please let me know.
Welcome to Stack Overflow! Please edit your question to show the code you have so far. You should include at least an outline (but preferably a Minimal, Complete, and Verifiable example) of the code that you are having problems with, then we can try to help. You should also read How to Ask. – import random, Mar 7, 2018 at 23:05
1 Answer
I think it should be something like this.
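import PyPDF2
import openpyxl

# PyPDF2 3.x names; older releases used PdfFileReader, getPage and extractText.
with open('C:/Users/Excel/Desktop/TABLES.pdf', 'rb') as pdfFileObj:
    pdfReader = PyPDF2.PdfReader(pdfFileObj)
    print(len(pdfReader.pages))  # number of pages in the PDF
    pageObj = pdfReader.pages[0]  # first page only
    mytext = pageObj.extract_text()

# The workbook must already exist; load_workbook does not create it.
wb = openpyxl.load_workbook('C:/Users/Excel/Desktop/excel.xlsx')
sheet = wb.active
sheet.title = 'MyPDF'
sheet['A1'] = mytext  # all of the page text goes into a single cell
wb.save('C:/Users/Excel/Desktop/excel.xlsx')
print('DONE!!')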
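Note that the snippet above writes the whole page text to cell A1. To get one value per cell, the text has to be split into rows and columns first. Here is a rough sketch of that step, going through pandas as the question suggests. The helper names text_to_rows and rows_to_excel are made up for illustration, and the splitting rule (one table row per line, columns separated by a tab or by two or more spaces) is only an assumption; the text PyPDF2 returns for your PDF may need a different rule.

```python
import re


def text_to_rows(text):
    """Split extracted page text into rows of cells.

    Assumption: one table row per line, with columns separated by a tab
    or by two or more spaces. Adjust the pattern for your own PDF.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def rows_to_excel(rows, path, sheet_name="MyPDF"):
    """Write rows to an Excel sheet, treating the first row as the header.

    Every row must have as many cells as the header, otherwise pandas
    raises an error when building the DataFrame.
    """
    # Imported here so text_to_rows can be tried without pandas installed.
    import pandas as pd  # writing .xlsx also needs openpyxl

    df = pd.DataFrame(rows[1:], columns=rows[0])
    df.to_excel(path, sheet_name=sheet_name, index=False)


# Hypothetical sample standing in for the text extracted from a page.
sample = "Name  Qty  Price\nApple  3  1.20\nPear  10  0.85\n"
print(text_to_rows(sample))
# [['Name', 'Qty', 'Price'], ['Apple', '3', '1.20'], ['Pear', '10', '0.85']]
```

You would then call something like rows_to_excel(text_to_rows(mytext), 'out.xlsx'), where the output path is just a placeholder. If the extracted text does not keep the column boundaries at all, a table-aware extractor such as pdfplumber or Camelot may be a better fit than plain text extraction.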
See the link below for more details.
1 Comment
Korzak
Does not work. It just puts the whole dataset into the first cell of the Excel file instead of rendering it as a table (or tables).