Problem statement:
I'm writing a program in Jupyter Notebook that dynamically writes another script (script.py). After writing script.py, the function that wrote the file runs it via an import statement, then calls a function from script.py.
I need to use pandas in script.py, and I import it at the top of script.py. I get NameError: name 'pd' is not defined just after executing import pandas as pd at the top of script.py. I initially tried leaving out the import statement since it was already executed in the calling program, but I got the same error. I tried putting the import statement within the function in script.py, but I got the same error.
Update2, resolved:
The code now works. I'm pretty sure the only things I did were walk away, come back, enter %debug, restart the kernel, and run all cells. %debug found no traceback to debug. I guess you could say it was magic, but maybe it was restarting the kernel. Magic makes more sense to me, haha.
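A possible explanation for why the kernel restart helped, offered as a sketch rather than a confirmed diagnosis: within one running kernel, an import statement executes a module's file only the first time. Later import statements reuse the module object cached in sys.modules, even if the file has since been rewritten, while tracebacks display source lines read from the current file on disk. That would be consistent with the traceback arrow pointing at a blank line. The example below uses a hypothetical module name (generated_demo) and a temporary folder, neither of which comes from the original code, to show the stale import and how importlib.reload picks up the rewritten file.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name, chosen only for this demonstration.
MODULE_NAME = "generated_demo"


def write_module(folder, body):
    """Write (or overwrite) the demo module's source file."""
    with open(os.path.join(folder, MODULE_NAME + ".py"), "w") as f:
        f.write(body)


# Avoid cached .pyc files interfering with this quick rewrite demo.
sys.dont_write_bytecode = True

folder = tempfile.mkdtemp()
sys.path.insert(0, folder)

# First version of the generated file.
write_module(folder, "def answer():\n    return 'old version'\n")
importlib.invalidate_caches()
module = importlib.import_module(MODULE_NAME)  # like `import generated_demo`
first = module.answer()

# Rewrite the file, then import again: the cached module object is reused.
write_module(folder, "def answer():\n    return 'new version, longer'\n")
module = importlib.import_module(MODULE_NAME)
stale = module.answer()

# reload() re-executes the file, so the new definition takes effect.
module = importlib.reload(module)
fresh = module.answer()

print(first, "|", stale, "|", fresh)

sys.path.remove(folder)
```

If this is what happened, calling importlib.reload(script) right after import script inside read_files_in_folder would likely make a kernel restart unnecessary.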
Update1: The original example code did not actually reproduce the error. If I had test-run it, I would have better isolated the problem in the real code. My bad. I'm still not able to fix the problem, but it seems that something in the loop that constructs the write statements is going wrong, because running similar code once without a loop works.
Here's my real code:
import os
import pandas as pd

def read_files_in_folder(fp_list, path=None, arg_list=None):
    '''Reads a folder of csv tables into a dictionary of dataframes.
    Does this dynamically by writing a script to a file, importing the script,
    and running a function from the script.
    Parameters:
    fp_list is [str]: list of filenames or filepaths of csv files.
    path is str: (optional) filepath str filenames. os.curdir if None.
    arg_list is [str]: (optional) list of pd.read_csv() arguments to pass.
    Returns:
    df_dict is {pd.DataFrame}: dict of dataframes created from csv files.'''
    df_dict = {}
    if path is None:
        path = os.curdir
    if arg_list is None:
        for fp in fp_list:
            fp_var_name = fp.split('/')[-1].split('.')[0]
            df_dict[fp_var_name] = pd.read_csv(path + fp)
    else:
        args = ''
        for arg in arg_list:
            args += ', ' + arg
        with open('script.py', 'w') as file:
            file.write("""
import pandas as pd
def csvs_to_df_dict():
\tdf_dict = {}
""")
            for fp in fp_list:
                fp_var_name = fp.split('/')[-1].split('.')[0]
                statement = "\tdf_dict['" + fp_var_name + "'] = pd.read_csv('" + path + fp + "'" + args + ")\n"
                file.write(statement)
            file.write('\treturn df_dict')
        import script
        df_dict = script.csvs_to_df_dict()

    return df_dict
I then execute:
csv_path = os.curdir + '/csv_tables/'
filename_list = os.listdir(path=csv_path)
df_dict = read_files_in_folder(fp_list=filename_list, path=csv_path,
                               arg_list=['index_col=0','skip_blank_lines=False'])
df_dict['abscorrup_idea']
This writes script.py:
import pandas as pd
def csvs_to_df_dict():
    df_dict = {}
    df_dict['abscorrup_idea'] = pd.read_csv('./csv_tables/abscorrup_idea.csv', index_col=0, skip_blank_lines=False)
    # ... ... ...
    df_dict['sorigeq_idea'] = pd.read_csv('./csv_tables/sorigeq_idea.csv', index_col=0, skip_blank_lines=False)
    return df_dict
But it raises NameError: name 'pd' is not defined once execution enters script.py through df_dict = script.csvs_to_df_dict(), even though script.py has import pandas as pd at the top. See below for the full error output.
It works if you don't pass arg_list and thus don't create a script.py file in the first place. So, it works for my immediate use, but I want to understand why it won't work the other way.
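Since the generated script exists only to splice extra arguments into pd.read_csv(), one alternative worth considering is to skip code generation and forward keyword arguments directly. The sketch below is an assumption about what would fit this use case, not the original function: read_files_with_kwargs, reader, and read_kwargs are hypothetical names, and reader falls back to pd.read_csv when omitted.

```python
import os


def read_files_with_kwargs(fp_list, path=None, reader=None, **read_kwargs):
    """Read each file into a dict keyed by the filename without its extension.

    reader defaults to pandas.read_csv; read_kwargs are forwarded to it unchanged.
    All names here are hypothetical and not part of the original code.
    """
    if reader is None:
        # Imported lazily so a different reader can be swapped in.
        import pandas as pd
        reader = pd.read_csv
    if path is None:
        path = os.curdir
    df_dict = {}
    for fp in fp_list:
        # 'folder/table.csv' -> 'table'
        name = os.path.splitext(os.path.basename(fp))[0]
        df_dict[name] = reader(os.path.join(path, fp), **read_kwargs)
    return df_dict
```

With pandas installed, a call such as read_files_with_kwargs(filename_list, path=csv_path, index_col=0, skip_blank_lines=False) would pass both options to pd.read_csv for every file, with no script.py involved.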
I initially tried writing script.py as a series of statements and not a function. I assumed it would just run as if I had inserted that block of code into the code that calls it, but I was unable to call on df_dict from one script to the other. Different namespace? So, I'm trying a function.
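On the namespace question: each imported module gets its own global namespace, so names defined in script.py are reachable from the caller only as attributes of the module, and names from the caller (such as pd) are not visible inside the module. A minimal sketch, building a throwaway module in memory with types.ModuleType instead of writing a file (script_demo and the variable names are hypothetical):

```python
import types

# A name that exists only in the calling code's namespace.
caller_only = "defined in the calling code"

# Source for the throwaway module: it defines df_dict and checks
# whether it can see the caller's variable.
source = """
df_dict = {'a': 1}
try:
    seen = caller_only
except NameError:
    seen = None
"""

module = types.ModuleType("script_demo")
exec(source, module.__dict__)  # run the source inside the module's own namespace

print(module.df_dict)  # the module's name, reached as an attribute
print(module.seen)     # None: the caller's name was not visible in the module
```

This is also why leaving import pandas as pd out of script.py cannot work: the import in the notebook binds pd only in the notebook's namespace.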
Here's the full error output:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-26-13999e7ca3af> in <module>
----> 1 df_dict = read_files_in_folder(fp_list=filename_list, path=csv_path,
      2                                arg_list=['index_col=0','skip_blank_lines=False'])

<ipython-input-25-4f1e04e89145> in read_files_in_folder(fp_list, path, arg_list)
     35             file.write('\treturn df_dict')
     36         import script
---> 37         df_dict = script.csvs_to_df_dict()
     38 
     39     return df_dict

~\OneDrive\Education\WGU\C749_intro_to_data_science\Module_3_Investigate_A_Dataset\Project\script.py in csvs_to_df_dict()
      1 
      2 import pandas as pd
----> 3 
      4 def csvs_to_df_dict():
      5     df_dict = {}

NameError: name 'pd' is not defined
Original example (from before the updates), cleaned up and running properly:
# script1.py #
import pandas as pd
# The following is actually part of a function
# that is called later in the same script1,
# but I'm keeping it simple for the example.
df_dict = {}
with open('script2.py', 'w') as file:
    file.write("""
# script2.py #
import pandas as pd
def run_it():
\tdf_dict = {}
""")
    path = './csv_tables/'
    fn = 'abscorrup_idea.csv'
    file.write("\tdf_dict['abscorrup_idea'] = pd.read_csv('" + path + fn + "', index_col=0, skip_blank_lines=False)\n")
    file.write('\treturn df_dict')
import script2
df_dict = script2.run_it()
df_dict
This writes the following file, runs it, and calls the function:
# script2.py #
import pandas as pd
def run_it():
    df_dict = {}
    df_dict['abscorrup_idea'] = pd.read_csv('./csv_tables/abscorrup_idea.csv', index_col=0, skip_blank_lines=False)
    return df_dict