I'm trying to extract data from multiple tables in multiple PDFs and save it in CSV format. I did my research and found that python-camelot is a good tool for extracting tables. I tried it and it works perfectly fine on a single PDF. However, I have over 50 PDFs in the same format, so I decided to iterate over all the files with a for loop, but it did not work: I get an error saying the files are not found in the directory. Can you please help? Here is the code:
import tkinter
import camelot
import os
directory = 'C:\\Users\\Alr\\Desktop\\test\\'
files = [ filename for filename in os.listdir(directory)]
for i in range (len(files)):
    tables = camelot.read_pdf(files[i], pages='5,6,7')
    tables.export(files[i], f='csv', compress=True) # json, excel, html, sqlite
    tables.to_csv(files[i]+'.csv')
files never gets set to any value. Did you forget to read the folder?
os.listdir returns the names of the files and that means that the path is not included. Just prepend directory to the file name in read_pdf and you're set.
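To make the suggestion above concrete, here is a minimal sketch of the loop with the directory prepended via os.path.join. The helper names pdf_paths and export_tables are made up for illustration, and the filter on the ".pdf" extension is an extra assumption so that non-PDF files in the folder get skipped. It only uses camelot.read_pdf and tables.export, the two calls the question already relies on; I have not run it against your PDFs.

```python
import os


def pdf_paths(directory):
    """Return the full paths of the PDF files in `directory`, sorted by name.

    os.listdir() returns bare file names, so each name is joined with the
    directory before it is handed to camelot.
    """
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".pdf")  # assumption: skip non-PDF files
    )


def export_tables(directory, pages="5,6,7"):
    """Read the given pages of every PDF in `directory` and export the tables as CSV."""
    # Imported here so that pdf_paths() can be used without camelot installed.
    import camelot

    for pdf_path in pdf_paths(directory):
        tables = camelot.read_pdf(pdf_path, pages=pages)
        # Name the output after the PDF, next to it: report.pdf -> report.csv.
        # camelot derives one CSV file name per extracted table from this path.
        csv_path = os.path.splitext(pdf_path)[0] + ".csv"
        tables.export(csv_path, f="csv")
```

You would then call export_tables('C:\\Users\\Alr\\Desktop\\test\\') once. The tables.to_csv(...) line from the question is left out on purpose: as far as I know to_csv belongs to a single table (for example tables[0].to_csv(...)), not to the list that read_pdf returns, and export already writes the CSV files.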