I have a PDF document stored as a BLOB in a Microsoft SQL Server database. I am trying to convert the BLOB back to a PDF so I can open it in memory for analysis and possibly also save it to a local drive. I tried reading one of the documents using ".read", but it gives me an error:

ValueError: embedded null byte

Here is my code/attempt:

connect = pyodbc.connect(
Driver = driver,
Server = server,
Database = database,
User = username,
Password = password)

test_query = "SELECT TOP 1 * FROM test.PDFs"

df_test = pd.read_sql(test_query, connect)

df_test_pdf = df_test['RawDocument'][0]

with open(df_test_pdf, "rb") as f:
   b = f.read

print(df_test_pdf)
  • open is meant to open files. df_test_pdf though isn't a file. At best, it's a buffer in memory. Save it to disk first to ensure you can read it. Commented Mar 26, 2018 at 12:32
  • Okay, the PDF blob is in the 'RawDocument' column. How do I avoid creating a buffer in memory, and how do I save the file? When I try writing it I get this error: df_test_pdf = df_test_pdf.write(df_test_pdf) "AttributeError: 'bytes' object has no attribute 'write'" Commented Mar 26, 2018 at 12:51

1 Answer

I solved it by fetching the blob with a cursor and writing the bytes to a file opened in binary mode:

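# Assumes `connect` is the pyodbc connection opened in the question.
cursor = connect.cursor()
cursor.execute("SELECT TOP 1 RawDocument FROM test.PDFs")
ablob = cursor.fetchone()  # one row; the blob is its first column

# Write the raw bytes in binary mode ("wb") so the PDF is not altered.
with open("Output.pdf", "wb") as output_file:
    output_file.write(ablob[0])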

Got the answer from a similar question here:

Writing blob from SQLite to file using Python
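
For the "open in memory" part of the question, a minimal sketch follows. It assumes the value read from the 'RawDocument' column is already a Python bytes object, which is what the AttributeError in the comments suggests. The helper names blob_to_stream and save_blob, and the fake_blob stand-in, are hypothetical and only there for illustration. The original error occurs because open() treats its first argument as a file name, not as file contents; wrapping the bytes in io.BytesIO gives a file-like object without touching the disk.

```python
import io
from pathlib import Path


def blob_to_stream(blob: bytes) -> io.BytesIO:
    """Wrap raw PDF bytes in a seekable, file-like object held in memory."""
    return io.BytesIO(blob)


def save_blob(blob: bytes, path: str) -> int:
    """Write raw PDF bytes to disk and return the number of bytes written."""
    return Path(path).write_bytes(blob)


# Stand-in for the value of df_test['RawDocument'][0]; not a real PDF.
fake_blob = b"%PDF-1.4\n\x00placeholder\n%%EOF"

stream = blob_to_stream(fake_blob)
header = stream.read(5)  # first five bytes of the blob
stream.seek(0)           # rewind before handing the stream to a PDF library
```

Most PDF libraries that accept an open file should also accept a stream like this one, but check the documentation of the library you use. To keep a copy on disk as well, call save_blob(df_test_pdf, "Output.pdf") with the real column value.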
