
Sorry to post another repetitive question, but I have been struggling with this fundamental concept, and despite trying to learn from others' examples I still do not understand it.

What I am trying to do is get the contents of a PDF using PyPDF2 and write them to a CSV, and I am slowly building and testing my program step by step. I am at the point where I want my program to do two things:

1. Grab the text from the PDF file.
2. Output the grabbed text to a single entry in a CSV file.

Now here is where my lack of fundamental programming concepts starts to show. Here's the code:

 import csv
 import os
 import PyPDF2

 os.chdir('C:/Users/User/Desktop')

 def getText(happy_file):
     pdf_file_obj = open(happy_file, 'rb')
     pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)
     pdf_reader.numPages #optional
     page_obj = pdf_reader.getPage(0)
     return page_obj.extractText()

 def writeToCSV(happy_file):
     output_file = open('myfinalfile.csv', 'w', newline ='')
     output_writer = csv.writer(output_file)
     output_writer.writerow([str(getText())])
     output_file.close()

I have two functions to accomplish this task: getText and writeToCSV. My goal is to be able to call writeToCSV('anyfile.pdf') and have it use both functions to extract the data and put it into the CSV. happy_file is currently the parameter of both functions, but I know that needs to change. I am thinking that I need a third function, main(), that calls both functions so that the variables are contained inside main(). That might be the fundamental aspect that I am not seeing. Another hunch is that there must be a way to make the return value of getText usable as a variable in writeToCSV (that is actually the whole purpose of this post). I have used the 'global' keyword before to share variables between functions, but I have heard that it is a bad idea.
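A minimal sketch of the idea in the paragraph above, assuming a plain-text file stands in for the PDF so it runs without PyPDF2: the return value of one function is stored in a local variable and passed as an argument to the next, so no global is needed. The snake_case names get_text, write_to_csv and main are illustrative, not taken from the original code.

```python
import csv


def get_text(path):
    # Hypothetical stand-in for the PDF step: reads a plain-text file instead
    with open(path, encoding="utf-8") as f:
        return f.read()


def write_to_csv(text, csv_path):
    # Receives the text as an argument instead of reading a global variable
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # One row containing one cell; csv.writer quotes the cell if it spans lines
        csv.writer(f).writerow([text])


def main(source_path, csv_path):
    text = get_text(source_path)  # capture the return value in a local variable
    write_to_csv(text, csv_path)  # hand it to the next step as an argument
    return text
```

Because main owns both calls, write_to_csv never needs to know how the text was obtained, which is what makes it easy to swap the text-file stand-in for a PDF reader later.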

I get that I could just make it one function, but as things get more complex (namely, I want to loop through a bunch of PDFs), I would like to have my program in smaller chunks, each representing one step. Maybe I am just really bad at understanding functions. Maybe seeing my actual code reformatted the correct way will make it "click" for me.
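When moving on to many files, one hedged way to keep each step small is to pass the extraction function itself into the loop, which is the idea behind the "passing a function into another function" post in the list of researched posts. In this sketch, .txt files stand in for PDFs, and read_text_file, write_rows and process_folder are hypothetical names; the assumption is that swapping in a PDF-reading function and pattern="*.pdf" would be the only changes for the real task.

```python
import csv
from pathlib import Path


def read_text_file(path):
    # Hypothetical stand-in extractor; a PDF version would return the page text
    return Path(path).read_text(encoding="utf-8")


def write_rows(rows, csv_path):
    # Write every row in one go; newline="" is what the csv module expects
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def process_folder(folder, csv_path, extract=read_text_file, pattern="*.txt"):
    # extract is any function that takes a path and returns text
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        rows.append([path.name, extract(path)])  # one row per file: name, text
    write_rows(rows, csv_path)
    return len(rows)
```

Each function still does one step, and the loop only coordinates them, so adding a new step (for example, cleaning the text) means adding a function rather than growing one long block.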

Figuring this out would be a great step in the right direction of writing well structured programs rather than just one huge list of directions for the computer to carry out.

Here is a list of other posts I researched:

Python - Passing a function into another function

using the output of a function as the input in another function python new to coding

Python - output from functions?

Python: accessing returned values from a function, by another function

Thanks!

  • What exactly are you asking? Do you need a way to get what you have to work properly, or are you looking for a better/different way to solve your problem altogether? Commented May 5, 2017 at 16:15
  • @AndrewMcKernan I just need a way to make it work properly in order to help cement my understanding of how to use variables from one function in another. Commented May 7, 2017 at 5:47

1 Answer


You need to pass happy_file to getText when you call it inside writeToCSV, i.e. getText(happy_file) instead of getText().

You can then call writeToCSV as shown at the bottom of the code example.

 import csv
 import os
 import PyPDF2

 os.chdir('C:/Users/User/Desktop')

 def getText(happy_file):
     # PyPDF2 < 3.0 API; pypdf and PyPDF2 >= 3.0 use PdfReader, reader.pages[0] and extract_text()
     with open(happy_file, 'rb') as pdf_file_obj:  # closes the PDF when done
         pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)
         page_obj = pdf_reader.getPage(0)
         return page_obj.extractText()

 def writeToCSV(happy_file):
     with open('myfinalfile.csv', 'w', newline='') as output_file:
         output_writer = csv.writer(output_file)
         # Pass happy_file through so getText knows which PDF to read
         output_writer.writerow([getText(happy_file)])

 writeToCSV("anyfile.pdf")
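For comparison, the version in the question fails because getText() is called with no argument even though getText declares a happy_file parameter. A minimal reproduction, with a simplified stand-in body so it runs without PyPDF2, shows the error Python raises:

```python
def getText(happy_file):
    # Simplified stand-in: the real function opens and parses the PDF
    return f"text from {happy_file}"


def call_without_argument():
    try:
        getText()  # what the question's writeToCSV does
    except TypeError as exc:
        return str(exc)


print(call_without_argument())  # getText() missing 1 required positional argument: 'happy_file'
print(getText("anyfile.pdf"))   # text from anyfile.pdf
```

Python binds arguments to parameters by position at each call, so the value has to be supplied at the call site; a parameter with the same name in another function does not carry over.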

Alternatively, if for whatever reason you'd prefer a main() function, you could do it like this:

 import csv
 import os
 import PyPDF2

 os.chdir('C:/Users/User/Desktop')

 def getText(happy_file):
     with open(happy_file, 'rb') as pdf_file_obj:  # closes the PDF when done
         pdf_reader = PyPDF2.PdfFileReader(pdf_file_obj)
         page_obj = pdf_reader.getPage(0)
         return page_obj.extractText()

 def writeToCSV(happy_file):
     with open('myfinalfile.csv', 'w', newline='') as output_file:
         output_writer = csv.writer(output_file)
         output_writer.writerow([getText(happy_file)])

 def main():
     writeToCSV("anyfile.pdf")

 if __name__ == "__main__":
     main()
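The if __name__ == "__main__": check means main() runs when the script is executed directly but not when the file is imported from another module. As a hedged extension toward looping over several PDFs, main could take the file names from the command line; process below is a hypothetical stand-in for writeToCSV so the sketch runs without PyPDF2.

```python
import sys


def process(path):
    # Hypothetical stand-in for writeToCSV(path); here it only reports the name
    return f"processed {path}"


def main(argv=None):
    # When run as a script, argv is None and the file names come from the command line
    paths = sys.argv[1:] if argv is None else argv
    return [process(p) for p in paths]


if __name__ == "__main__":
    # Runs only when executed directly, e.g. python script.py first.pdf second.pdf
    for line in main():
        print(line)
```

Accepting an explicit argv also makes main easy to call from other code or tests with a fixed list of files.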

1 Comment

Sorry, I was away for a bit. Thanks for replying; I will try this out now!
