4

I want to convert a pdf file into excel and save it in local via python. I have converted the pdf to excel format but how should I save it local?

my code:

df = ("./Downloads/folder/myfile.pdf")
tabula.convert_into(df, "test.csv", output_format="csv", stream=True)

6 Answers 6

11

You can specify your whole output path instead of only output.csv

df = ("./Downloads/folder/myfile.pdf")
output = "./Downloads/folder/test.csv"
tabula.convert_into(df, output, output_format="csv", stream=True)

Hope this answers your question!!!

Sign up to request clarification or add additional context in comments.

Comments

2

In my case, the script below worked:

import tabula

df = tabula.read_pdf(r'C:\Users\user\Downloads\folder\3.pdf', pages='all')
tabula.convert_into(r'C:\Users\user\Downloads\folder\3.pdf', r'C:\Users\user\Downloads\folder\test.csv' , output_format="csv",pages='all', stream=True)

Comments

1

i use google collab

install the packege needed

!pip install tabula-py
!pip install pandas

Import the required Module

import tabula
import pandas as pd

Read a PDF File

data = tabula.read_pdf("example.pdf", pages='1')[0] # "all" untuk semua data, pages diisi nomor halaman

convert PDF into CSV

tabula.convert_into("example.pdf", "example.csv", output_format="csv", pages='1') #"all" untuk semua data, pages diisi no halaman
print(data)

to convert to excell file

data1 = pd.read_csv("example.csv")
data1.dtypes

now save to xlsx

data.to_excel('example.xlsx')

1 Comment

Please fix the broken code blocks at the end of your post. Thanks!
0

Documentation says that:

Output file will be saved into output_path

output_path is your second parameter, "test.csv". I guess it works fine, but you are loking it in the wrong folder. It will be located near to your script (to be strict - in current working directory) since you didn't specify full path.

Comments

0

you can also use camelot in combination with pandas

import camelot
import pandas
tables = camelot.read_pdf(path_to_pdf, flavor='stream',pages='all')
df = pandas.concat([table.df for table in tables])
df.to_csv(path_to_csv)

Comments

-1

PDF to .xlsx file:

for item in df:
   list1.append(item)
df = pd.DataFrame(list1)
df.to_excel('outputfile.xlsx', sheet_name='Sheet1', index=True)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.