How to access data from PDF forms with Python?

Question

I need to access data from PDF form fields. I tried the PyPDF2 package with this code:

import PyPDF2

reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())

But this gives me only the regular text of the PDF, not the contents of the form fields.

Does anyone know how to read the text from the form fields?

tromar · Accepted Answer · 2018-01-25 22:05:47Z

4

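You can use the getFormTextFields() method, which returns a dictionary of the form's text fields (see https://pythonhosted.org/PyPDF2/PdfFileReader.html). Use the dictionary keys (the field names) to access the values (the field contents). PyPDF2 releases that provide the PdfReader class used in the question name this method get_form_text_fields(). The following example might help: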

from PyPDF2 import PdfFileReader

infile = "myInputPdf.pdf"

# Open in binary mode; the with-block closes the file when done.
with open(infile, "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    # Returns a Python dictionary mapping field names to field values.
    dictionary = pdf_reader.getFormTextFields()

# Use the field name (dictionary key) to look up the field value.
my_field_value = str(dictionary['my_field_name'])
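
For PyPDF2 releases that use the PdfReader class from the question, the same lookup might be sketched as follows. This is only a minimal sketch and not part of the original answer: it assumes such a release is installed, and read_text_fields and normalize_fields are hypothetical helper names. Text fields that were never filled in can come back as None, so the sketch maps them to an empty string instead of the string 'None'.

```python
def normalize_fields(raw_fields):
    """Convert raw field values to plain strings; unfilled fields (None) become ''."""
    return {
        name: "" if value is None else str(value)
        for name, value in raw_fields.items()
    }


def read_text_fields(path):
    """Return {field name: field value} for the text fields of the PDF at `path`."""
    # Imported here so normalize_fields() stays usable without PyPDF2 installed.
    # Assumption: the installed PyPDF2 release provides the PdfReader class.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # get_form_text_fields() returns a dict keyed by field name.
    return normalize_fields(reader.get_form_text_fields())
```

With the form described in the question, read_text_fields('formular.pdf') would be expected to return a dictionary containing the entered values, for example 'testname' under whatever name the first field actually has in the PDF.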

1 Comment

Antonio Kallai Over a year ago

Thank you very much. That is exactly what I was looking for.

Anurag Misra · Answer · 2017-09-08 07:52:54Z

0

There are Python libraries for accessing PDF data. A PDF is not a plain-text format like CSV, TXT, or TSV, so Python cannot read its contents directly.

One such library is slate (see the Slate documentation). Read through that documentation; I hope it answers your question.

1 Comment

Antonio Kallai Over a year ago

I already get the PDF text, but not the text of the form fields. For example, if I have a form like this (parentheses stand for a form field):

PDF FORM
Name: (testname)
First Name: (abcde)

then I get only the following information:

PDF FORM
Name:
First Name:

But what I want is the information "testname" and "abcde".
