I need to read data from PDF form fields. I tried the PyPDF2 package with this code:

import PyPDF2

reader = PyPDF2.PdfReader('formular.pdf')
print(reader.pages[0].extract_text())

But this only gives me the regular page text of the PDF, not the contents of the form fields.

Does anyone know how to read text from the form fields?

2 Answers

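You can use the getFormTextFields() method to get a dictionary of the form's text fields (see https://pythonhosted.org/PyPDF2/PdfFileReader.html). The dictionary keys are the field names and the dictionary values are the field contents. In newer PyPDF2 releases, which use the PdfReader class shown in the question, the same method is named get_form_text_fields(). The following example might help: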

from PyPDF2 import PdfFileReader

infile = "myInputPdf.pdf"

# Open in binary mode and close the file once the fields have been read.
with open(infile, "rb") as f:
    pdf_reader = PdfFileReader(f)
    dictionary = pdf_reader.getFormTextFields()  # dict: field name -> field value

# Look up a value by its field name; 'my_field_name' is a placeholder.
my_field_value = str(dictionary['my_field_name'])
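If your PyPDF2 version provides PdfReader (as in the question), the camelCase names used above may be deprecated or removed. The following is a minimal sketch for that case, not a definitive implementation. It assumes the snake_case method get_form_text_fields() is available on the reader; the helper names form_text_values and read_form are made up for illustration.

```python
def form_text_values(reader):
    """Return {field name: value} for the text fields of an open reader.

    `reader` is assumed to expose get_form_text_fields(), as PdfReader
    does in newer PyPDF2 releases.
    """
    fields = reader.get_form_text_fields() or {}
    # Fields that were never filled in have the value None;
    # normalise those to empty strings.
    return {
        name: "" if value is None else str(value)
        for name, value in fields.items()
    }


def read_form(path):
    """Open the PDF at `path` and return its text form fields as a dict."""
    # Imported here so form_text_values() works with any reader-like object.
    from PyPDF2 import PdfReader  # assumes a release that ships PdfReader
    return form_text_values(PdfReader(path))
```

For example, read_form("formular.pdf").get("my_field_name") would return the content of that field, or None if the form has no text field with that name. Fields that are not text fields (checkboxes, for instance) are not included in this dictionary.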

1 Comment

Thank you very much. That is exactly what I was looking for.

There are Python libraries that let you access PDF data. A PDF is not a plain-text format like CSV, TXT, or TSV, so Python cannot read the data inside a PDF file directly without such a library.

One such library is slate; see the Slate documentation. I hope it answers your question.

1 Comment

I already get the PDF text, but not the text of the form fields. For example, if I have a form like this (brackets stand for form fields): PDF FORM Name: (testname) First Name: (abcde), then I only get the following information: PDF FORM Name: First Name:. What I want is the information "testname" and "abcde".
