Your first mistake is that you never assign the function call's return value (the processed text) to a variable:
x=getPDFContent(path_to_sample)
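To illustrate why the assignment matters, here is a minimal sketch. `fake_extract` is a hypothetical stand-in I made up for this example; it is not part of PyPDF2 or PDFMiner. It only mimics a function that returns text instead of printing it.

```python
def fake_extract(path):
    # Hypothetical stand-in for getPDFContent: it returns the text,
    # it does not print or store it anywhere.
    return "text extracted from " + path

# Called on its own, the return value is simply thrown away.
fake_extract("sample.pdf")

# Assigned to a variable, the text is kept and can be used afterwards.
x = fake_extract("sample.pdf")
print(x)
```

The same applies to your real function: without `x=...` the extracted text is computed and then lost.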
If that still doesn't fix the problem:
Try the PDFMiner module (pdfminer.six for Python 3). PyPDF2 can be problematic depending on which version of Python you use. I ran into issues with PyPDF2 on certain PDF files, which gave me output similar to yours. PDFMiner, however, has worked consistently for me with the following code on Python 3.x.
Install it with the command pip install pdfminer.six (for Python 2+3 compatibility), then use the code below and you should be good to go.
import io

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

def getPDFContent(path, pages=None):
    # An empty set tells PDFPage.get_pages to process every page.
    pagenums = set(pages) if pages else set()

    output = io.StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)

    # Run each selected page through the interpreter; the text accumulates in output.
    with open(path, 'rb') as infile:
        for page in PDFPage.get_pages(infile, pagenums):
            interpreter.process_page(page)

    converter.close()
    pdf_str = output.getvalue()
    output.close()
    return pdf_str

x=getPDFContent(path_to_sample)
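If you only want some pages, the optional pages argument is passed straight to PDFPage.get_pages. As far as I know pdfminer.six counts pages from zero there, so treat that as an assumption and check it against your own file. A small helper for converting the page numbers you see in a PDF viewer could look like this (`to_zero_based` is a name I made up for the example, it is not part of PDFMiner):

```python
def to_zero_based(pages):
    """Convert 1-based page numbers (as shown in a viewer) to a zero-based set."""
    result = set()
    for p in pages:
        if p < 1:
            # Page numbers in a viewer start at 1, so anything lower is a mistake.
            raise ValueError("page numbers must be 1 or greater, got %r" % (p,))
        result.add(p - 1)
    return result

# Pages 1 and 3 of the document become indices 0 and 2.
print(sorted(to_zero_based([1, 3])))

# Intended use with the function from this answer (not run here):
# x = getPDFContent(path_to_sample, pages=to_zero_based([1, 3]))
```

Leaving pages out (or passing an empty list) keeps the default behaviour of extracting every page.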