I'm trying to read a PDF file extracted from a zip file in memory to get the tables inside it. Camelot seems like a good way to do it, but I'm getting the following error:

AttributeError: '_io.StringIO' object has no attribute 'lower'

Is there some way to read the file and extract the tables with camelot, or should I use another library?

z = zipfile.ZipFile(self.zip_file)
for file in z.namelist():
    if file.endswith(".pdf"):
        pdf = z.read(file).decode(encoding="latin-1")
        pdf = StringIO(pdf)
        pdf = camelot.read_pdf(pdf, codec='utf-8')

1 Answer

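camelot.read_pdf(filepath, ...) accepts a file path as its first parameter, not a file-like object. That is presumably what triggers the error: the library treats the argument as a string and calls .lower() on it, which a StringIO does not have. As written, it is a poor match for reading a PDF straight from memory, so either look for another library or give it a real path on disk.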

In any case, StringIO(pdf) returns a file-like object, not the file contents. Printing it shows:

<_io.StringIO object at 0x000002592DD33E20>

For starters, to get the contents back out of a StringIO, call its read() method:

pdf = StringIO(pdf) 
pdf.read()

That returns the contents as a str, not as bytes, because StringIO holds text. A PDF is binary data, so decoding it to text can corrupt it. Next, think about what input and encoding the library will accept.
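If you want to stay with camelot, one possible workaround is to write each PDF from the archive to a temporary file and pass that path to read_pdf. The following is only a sketch under some assumptions: the helper names extract_pdfs_to_dir and read_tables_from_zip are made up for illustration, camelot is assumed to be installed, and the tables it returns are assumed to stay usable after the temporary directory is deleted. Note that the bytes are never decoded, because a PDF is binary.

```python
import io
import os
import tempfile
import zipfile


def extract_pdfs_to_dir(zip_source, out_dir):
    """Write every .pdf member of the archive into out_dir and return the paths.

    zip_source may be a path or a binary file-like object (e.g. io.BytesIO).
    Hypothetical helper, not part of camelot.
    """
    paths = []
    with zipfile.ZipFile(zip_source) as z:
        for name in z.namelist():
            if name.lower().endswith(".pdf"):
                # Keep the data as bytes: no decode(), no StringIO.
                data = z.read(name)
                # Members with the same base name would overwrite each other.
                path = os.path.join(out_dir, os.path.basename(name))
                with open(path, "wb") as f:
                    f.write(data)
                paths.append(path)
    return paths


def read_tables_from_zip(zip_source):
    """Return {pdf file name: tables found by camelot} for each PDF in the zip."""
    import camelot  # imported here so the extraction helper works without it

    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for path in extract_pdfs_to_dir(zip_source, tmp):
            # read_pdf receives a real file path, which is what it expects.
            results[os.path.basename(path)] = camelot.read_pdf(path)
    return results
```

In the question's code this would be called as read_tables_from_zip(self.zip_file). If the zip itself only exists in memory as bytes, wrap it in io.BytesIO first.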
