I need a bit of advice. I'm working on a Python program that reads data from a PDF and populates the same information in an Excel sheet. Right now I'm using PyPDF2 to extract the data, and I plan to use pandas to store it in a DataFrame and then write that DataFrame to the Excel sheet. Is this approach efficient? If there is a better way, or a flaw in my plan, please let me know.

1 Answer

I think it should be something like this.

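import PyPDF2
import openpyxl

# PyPDF2 3.x API: PdfReader, reader.pages and extract_text() replace the
# removed PdfFileReader, getPage() and extractText().
with open('C:/Users/Excel/Desktop/TABLES.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    print(len(pdf_reader.pages))  # number of pages in the PDF

    # Extract the text of the first page only
    page = pdf_reader.pages[0]
    mytext = page.extract_text()

# The workbook must already exist at this path
wb = openpyxl.load_workbook('C:/Users/Excel/Desktop/excel.xlsx')
sheet = wb.active
sheet.title = 'MyPDF'
sheet['A1'] = mytext  # the whole page text goes into a single cell

wb.save('C:/Users/Excel/Desktop/excel.xlsx')
print('DONE!!')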

See the link below for more details.

http://automatetheboringstuff.com/chapter12/
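
Note that the snippet above writes the entire extracted page text into cell A1. If the goal is a table, the text has to be split into rows and columns first. Below is a minimal sketch of that step, not a definitive solution. It assumes the extracted text has one table row per line, with columns separated by a tab or by two or more spaces, and that the first row is the header; real PDFs often do not extract this cleanly. The helper names `text_to_rows` and `save_rows_to_excel` and the sample text are made up for illustration.

```python
import re


def text_to_rows(text):
    """Split extracted PDF text into a list of rows (lists of cell strings).

    Assumption: one table row per line, columns separated by a tab
    or by two or more whitespace characters.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def save_rows_to_excel(rows, path, sheet_name="MyPDF"):
    """Write rows to an Excel sheet, treating the first row as the header.

    Assumption: every data row has the same number of cells as the header.
    """
    # Imported here so text_to_rows can be used without pandas installed.
    # Writing .xlsx files with pandas also requires openpyxl.
    import pandas as pd

    df = pd.DataFrame(rows[1:], columns=rows[0])
    df.to_excel(path, sheet_name=sheet_name, index=False)


# Made-up sample standing in for the text returned by extract_text()
sample = "Name   Qty   Price\nApple   3   1.20\nPear   10   0.80\n"
print(text_to_rows(sample))
```

With the answer's variables, this could be used as `save_rows_to_excel(text_to_rows(mytext), 'C:/Users/Excel/Desktop/excel.xlsx')`. Keep in mind that `to_excel` creates or overwrites the file at that path, and that all cells are written as strings unless the DataFrame columns are converted first.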

1 Comment

This does not work. It just puts the whole dataset into the first cell of the Excel file instead of rendering it as a table (or tables).
