How to convert a PDF file to an Excel file using Python

Question

I want to convert a PDF file to Excel and save it locally using Python. I have converted the PDF to Excel format, but how do I save it locally?

My code:

import tabula

df = ("./Downloads/folder/myfile.pdf")
tabula.convert_into(df, "test.csv", output_format="csv", stream=True)

skaul05 · Accepted Answer · 2019-11-04 09:41:23Z

11

You can specify the full output path instead of only the file name:

import tabula

pdf_path = "./Downloads/folder/myfile.pdf"
output = "./Downloads/folder/test.csv"  # full path, so the CSV lands in this folder
tabula.convert_into(pdf_path, output, output_format="csv", stream=True)

Hope this answers your question!!!
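As a rough sketch of the same idea, the output path can be derived from the PDF path so the CSV always lands next to the source file. The helper names csv_path_next_to and convert_pdf_to_csv are made up for this illustration and are not part of tabula-py; only tabula.convert_into comes from the library.

```python
from pathlib import Path


def csv_path_next_to(pdf_path):
    """Return an absolute .csv path in the same folder as the PDF."""
    pdf = Path(pdf_path).expanduser().absolute()
    return pdf.with_suffix(".csv")


def convert_pdf_to_csv(pdf_path):
    """Convert the PDF with tabula-py and return where the CSV was written."""
    # Imported here so the path helper above works even without tabula-py installed.
    import tabula

    out = csv_path_next_to(pdf_path)
    tabula.convert_into(str(pdf_path), str(out), output_format="csv", stream=True)
    return out


# Example call (assumes tabula-py and Java are installed and the PDF exists):
# print(convert_pdf_to_csv("./Downloads/folder/myfile.pdf"))
```

Returning the path makes it easy to print or log exactly where the file was saved.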

answered Nov 4, 2019 at 9:41

skaul05


David Buck · Accepted Answer · 2020-08-08 13:48:40Z

2

In my case, the script below worked:

import tabula

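# Optional: load the tables into pandas to inspect them
df = tabula.read_pdf(r'C:\Users\user\Downloads\folder\3.pdf', pages='all')
# Write the tables from every page to a CSV at an explicit path
tabula.convert_into(r'C:\Users\user\Downloads\folder\3.pdf', r'C:\Users\user\Downloads\folder\test.csv', output_format="csv", pages='all', stream=True)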

edited Aug 8, 2020 at 13:48

David Buck


answered Aug 8, 2020 at 12:48

Darshil Lakhani


Khoirul Anam · Accepted Answer · 2023-03-07 03:31:24Z

1

I use Google Colab.

Install the required packages

!pip install tabula-py
!pip install pandas

Import the required modules

import tabula
import pandas as pd

Read a PDF file

data = tabula.read_pdf("example.pdf", pages='1')[0]  # use "all" for every page; pages takes the page number(s)

Convert the PDF into CSV

tabula.convert_into("example.pdf", "example.csv", output_format="csv", pages='1')  # use "all" for every page; pages takes the page number(s)
print(data)

Convert to an Excel file

data1 = pd.read_csv("example.csv")
data1.dtypes

Now save to xlsx

data1.to_excel('example.xlsx')
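The CSV-to-xlsx step above could be wrapped up roughly as follows. This is a sketch, not the answerer's code: the function names xlsx_path_for and csv_to_xlsx are invented for illustration, and it assumes pandas plus an Excel writer engine such as openpyxl are installed.

```python
from pathlib import Path


def xlsx_path_for(csv_path):
    """Return the same path with the extension swapped to .xlsx."""
    return Path(csv_path).with_suffix(".xlsx")


def csv_to_xlsx(csv_path):
    """Read the CSV written by tabula and save it as an Excel workbook."""
    # Imported here so the path helper above works without pandas installed.
    import pandas as pd

    out = xlsx_path_for(csv_path)
    frame = pd.read_csv(csv_path)
    # index=False keeps the pandas row index out of the spreadsheet.
    frame.to_excel(out, index=False)
    return out


# Example call (assumes example.csv exists in the working directory):
# print(csv_to_xlsx("example.csv"))
```

Passing index=False is a choice made here, not something the answer specifies; drop it if the row numbers are wanted in the sheet.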

edited Mar 7, 2023 at 3:31

answered Feb 23, 2023 at 2:02

Khoirul Anam


1 Comment

Blue Robin Over a year ago

Please fix the broken code blocks at the end of your post. Thanks!

QtRoS · Accepted Answer · 2019-11-04 09:43:25Z

0

The documentation says:

Output file will be saved into output_path

output_path is your second parameter, "test.csv". I guess it works fine, but you are looking for it in the wrong folder. Since you didn't specify a full path, the file is written next to your script (strictly speaking, in the current working directory).
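To see where a relative path such as "test.csv" actually ends up, it can be resolved against the current working directory. This is a small standard-library sketch; the variable names are chosen only for illustration.

```python
import os
from pathlib import Path

# The relative output path passed to tabula.convert_into in the question.
relative_output = Path("test.csv")

# A relative path is interpreted against the current working directory.
resolved_output = relative_output.absolute()

print("Current working directory:", os.getcwd())
print("test.csv will be written to:", resolved_output)
```

Running this from the same place the script is launched shows the folder to look in.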

answered Nov 4, 2019 at 9:43

QtRoS


smoquet · Accepted Answer · 2022-12-07 11:31:50Z

0

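You can also use camelot in combination with pandas:

import camelot
import pandas

# path_to_pdf and path_to_csv are placeholders for your own file paths
tables = camelot.read_pdf(path_to_pdf, flavor='stream', pages='all')
# Each camelot table exposes its data as a DataFrame via .df
df = pandas.concat([table.df for table in tables])
df.to_csv(path_to_csv)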

answered Dec 7, 2022 at 11:31

smoquet


Hith · Accepted Answer · 2021-04-08 10:03:15Z

-1

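PDF to .xlsx file:

import pandas as pd
import tabula

# Recent tabula-py versions return a list of DataFrames, one per detected table
tables = tabula.read_pdf('inputfile.pdf', pages='all')  # 'inputfile.pdf' is a placeholder path
df = pd.concat(tables, ignore_index=True)
df.to_excel('outputfile.xlsx', sheet_name='Sheet1', index=True)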

answered Apr 8, 2021 at 10:03

Hith
