
Problem statement:

I'm writing a program in Jupyter Notebook that dynamically writes another script (script.py). After writing script.py, the function that wrote the file runs it via an import statement, then calls a function from script.py.

I need to use pandas in script.py, so I import it at the top of script.py. I get NameError: name 'pd' is not defined right after import pandas as pd executes at the top of script.py. I first tried leaving out the import statement, since it had already been executed in the calling program, but I got the same error. I then tried putting the import statement inside the function in script.py, with the same result.

Update2, resolved: The code now works. I'm fairly sure the only things I did were walk away, come back, enter %debug (it found no traceback to debug), and restart the kernel and run all cells. I guess you could say it was magic, but maybe it was restarting the kernel. Magic makes more sense to me, haha.

Update1: The original example code did not actually reproduce the error. Had I test-run it, I would have isolated the problem in the real code sooner. My bad. I'm still not able to fix the problem, but it seems like something about the loop that constructs the write statements is causing it, because running similar code once without a loop works.

Here's my real code:

import os
import pandas as pd

def read_files_in_folder(fp_list, path=None, arg_list=None):
    '''Reads a folder of csv tables into a dictionary of dataframes.
    Does this dynamically by writing a script to a file, importing the script,
    and running a function from the script.
    Parameters:
        fp_list is [str]: list of filenames or filepaths of csv files.
        path is str: (optional) folder path prepended to each filename; defaults to the current directory.
        arg_list is [str]: (optional) list of pd.read_csv() arguments to pass.
    Returns:
        df_dict is {pd.DataFrame}: dict of dataframes created from csv files.'''
    
    df_dict = {}
    
    if path is None:
        path = os.curdir + '/'  # './'; os.curdir alone would give paths like '.file.csv'
        
    if arg_list is None:
        for fp in fp_list:
            fp_var_name = fp.split('/')[-1].split('.')[0]
            df_dict[fp_var_name] = pd.read_csv(path + fp)
    else:
        args = ''
        for arg in arg_list:
            args += ', ' + arg
        with open('script.py', 'w') as file:
            file.write("""
import pandas as pd

def csvs_to_df_dict():
\tdf_dict = {}
""")
            for fp in fp_list:
                fp_var_name = fp.split('/')[-1].split('.')[0]
                statement = "\tdf_dict['" + fp_var_name + "'] = pd.read_csv('" + path + fp + "'" + args + ")\n"
                file.write(statement)
            file.write('\treturn df_dict')
        import script
        df_dict = script.csvs_to_df_dict()
    
    return df_dict

I then execute:

csv_path = os.curdir + '/csv_tables/'
filename_list = os.listdir(path=csv_path)
df_dict = read_files_in_folder(fp_list=filename_list, path=csv_path,
                               arg_list=['index_col=0','skip_blank_lines=False'])
df_dict['abscorrup_idea']  # keys have the folder and the .csv extension stripped

This writes script.py:


import pandas as pd

def csvs_to_df_dict():
    df_dict = {}
    df_dict['abscorrup_idea'] = pd.read_csv('./csv_tables/abscorrup_idea.csv', index_col=0, skip_blank_lines=False)
# ... ... ...
    df_dict['sorigeq_idea'] = pd.read_csv('./csv_tables/sorigeq_idea.csv', index_col=0, skip_blank_lines=False)
    return df_dict

But it raises NameError: name 'pd' is not defined once execution enters script.py from df_dict = script.csvs_to_df_dict(), even though script.py runs import pandas as pd first. See below for the full error output.

It works if you don't pass arg_list and thus don't create a script.py file in the first place. So, it works for my immediate use, but I want to understand why it won't work the other way.
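For context, script.py only exists to splice strings like 'index_col=0' into pd.read_csv calls. A minimal sketch, assuming every value in arg_list is a plain Python literal, that turns those strings into a keyword-argument dict instead, so no file has to be generated (parse_arg_list is a hypothetical helper, not part of pandas):

```python
import ast


def parse_arg_list(arg_list):
    """Convert strings like 'index_col=0' into a keyword-argument dict.

    Hypothetical helper: each value must be a Python literal (number, string,
    True/False/None, list, ...). Anything else raises ValueError.
    """
    kwargs = {}
    for arg in arg_list:
        name, sep, value = arg.partition('=')
        if not sep:
            raise ValueError(f'expected name=value, got {arg!r}')
        # literal_eval only accepts literals, so no arbitrary code is executed
        kwargs[name.strip()] = ast.literal_eval(value.strip())
    return kwargs


kwargs = parse_arg_list(['index_col=0', 'skip_blank_lines=False'])
print(kwargs)  # {'index_col': 0, 'skip_blank_lines': False}
```

With the dict in hand, each file could be read as pd.read_csv(path + fp, **kwargs) in the same loop as the no-arg_list branch. Accepting a dict (or **kwargs) in read_files_in_folder directly would skip the string parsing altogether.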

I initially tried writing script.py as a series of top-level statements rather than a function. I assumed it would run as if I had inserted that block of code into the calling code, but I couldn't access df_dict from one script in the other. Different namespace? So now I'm trying a function.
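Importing a module runs its top-level statements once, inside that module's own namespace rather than the importer's. A small sketch, using a throwaway module with the hypothetical name statements_demo written to a temporary folder, showing where df_dict ends up:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module made only of top-level statements (no function).
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / 'statements_demo.py').write_text(
    "df_dict = {}\n"
    "df_dict['example'] = [1, 2, 3]\n"
)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()  # let the import system notice the new file

import statements_demo  # runs the statements once, inside statements_demo's namespace

# The result is an attribute of the module object...
print(statements_demo.df_dict)  # {'example': [1, 2, 3]}

# ...and can be bound to a name in the importing namespace explicitly.
from statements_demo import df_dict
print(df_dict is statements_demo.df_dict)  # True
```

So a statements-only script.py would most likely have been usable as script.df_dict (or via from script import df_dict); the function is not strictly required.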

Here's the full error output:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-26-13999e7ca3af> in <module>
----> 1 df_dict = read_files_in_folder(fp_list=filename_list, path=csv_path,
      2                                arg_list=['index_col=0','skip_blank_lines=False'])

<ipython-input-25-4f1e04e89145> in read_files_in_folder(fp_list, path, arg_list)
     35             file.write('\treturn df_dict')
     36         import script
---> 37         df_dict = script.csvs_to_df_dict()
     38 
     39     return df_dict

~\OneDrive\Education\WGU\C749_intro_to_data_science\Module_3_Investigate_A_Dataset\Project\script.py in csvs_to_df_dict()
      1 
      2 import pandas as pd
----> 3 
      4 def csvs_to_df_dict():
      5     df_dict = {}

NameError: name 'pd' is not defined
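The traceback points at line 3 of script.py, which is blank in the file as it now exists. Tracebacks read source text from disk when they are printed, while the line number comes from the code that actually ran. One plausible reading of the mismatch is that the running csvs_to_df_dict was compiled from an earlier version of script.py, perhaps the draft without import pandas as pd. Python caches each imported module in sys.modules, and a later import script returns that cached module without reading the file again; restarting the kernel clears the cache, which would fit Update2. A sketch of the effect with a throwaway module (cache_demo is a hypothetical name):

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
module_file = tmp_dir / 'cache_demo.py'

# First version of the generated file.
module_file.write_text("def get_version():\n    return 1\n")
importlib.invalidate_caches()
import cache_demo
print(cache_demo.get_version())  # 1

# Overwrite the file with new code, then import again.
module_file.write_text("def get_version():\n    return 'two'\n")
import cache_demo  # already in sys.modules, so the file is not read again
print(cache_demo.get_version())  # 1, not 'two'
print(sys.modules['cache_demo'] is cache_demo)  # True
```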

Original example before update, cleaned up and running properly:

For example:

# script1.py #
import pandas as pd

# The following is actually part of a function
# that is called later in the same script1,
# but I'm keeping it simple for the example.

df_dict = {}

with open('script2.py', 'w') as file:
    file.write("""
# script2.py #
import pandas as pd
def run_it():
\tdf_dict = {}
""")
    path = './csv_tables/'
    fn = 'abscorrup_idea.csv'
    file.write("\tdf_dict['abscorrup_idea'] = pd.read_csv('" + path + fn + "', index_col=0, skip_blank_lines=False)\n")
    file.write('\treturn df_dict')

import script2
df_dict = script2.run_it()
df_dict

This writes the following file, runs it, and calls the function:


# script2.py #
import pandas as pd
def run_it():
    df_dict = {}
    df_dict['abscorrup_idea'] = pd.read_csv('./csv_tables/abscorrup_idea.csv', index_col=0, skip_blank_lines=False)
    return df_dict

2 Answers


I have tried to reproduce your error but failed. When I copy-paste your code, I get a SyntaxError because something is wrong with your escaping. But this

with open('script2.py', 'w') as file:
    file.write("""
# script2.py #
import pandas as pd
def run_it():
    df_dict = {}
    df_dict["test"] = pd.DataFrame(data={"test":[1,2,3]})
    return df_dict
""")

import script2
df_dict = script2.run_it()
df_dict["test"]

works perfectly fine on my machine. Note that I had to use a different example DataFrame since I don't have your CSV file.
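Escaping trouble like this usually comes from wrapping paths in quotes by hand: a filename containing an apostrophe, or a Windows path with backslashes, produces broken or silently altered source. A minimal sketch, assuming the generated lines keep the question's tab indentation, that lets repr() produce the string literals instead (build_read_line and the inputs below are hypothetical):

```python
import ast


def build_read_line(key, filepath, args=''):
    """Return one tab-indented line of generated source for script.py.

    repr() always yields a valid Python string literal, escaping quotes and
    backslashes, so key and filepath can contain any characters.
    """
    return f"\tdf_dict[{key!r}] = pd.read_csv({filepath!r}{args})\n"


# Hypothetical inputs: an apostrophe in the key, backslashes in the path.
line = build_read_line("o'brien", r'C:\data\abscorrup_idea.csv', ', index_col=0')

# Parse the generated line to check that it is valid Python and that the
# path survives unchanged.
call = ast.parse(line.strip()).body[0].value
print(call.args[0].value == r'C:\data\abscorrup_idea.csv')  # True
```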


6 Comments

Haha, I should have tested my example code, sorry. I'll fix that. I like the way you created that write string. It worked once I implemented that in the test example. I will try it in the real script and report back. Thanks!
I updated the post to include the actual code I'm working on. I'm unable to construct the write string with triple quotes since I need to dynamically compose it in a loop. There's something about that loop that messes it up. Care to take a second look? Thanks again!
Never mind, the code now works. I'm pretty sure the only things I did were walk away, come back, enter %debug, and restart the kernel and run all cells. It found no traceback to debug. I guess you could say it was magic, but maybe it was restarting the kernel, which makes zero sense to me. Magic makes more sense to me.
Oh, that's due to how imports work. You can't really import a module twice: a second import statement doesn't re-run the file, so you're stuck with the version from the first import. Jupyter does provide magic to work around that, though. If you run into this more often, you can try %load_ext autoreload followed by %autoreload 2.
@KalebCoberly So everything's fine now? Or should the fact that you haven't accepted an answer tell people that there are issues left?
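For the re-import problem discussed in the comments, importlib.reload re-executes a module that is already imported, picking up changes to its file. A sketch with a throwaway module (reload_demo is a hypothetical name); in Jupyter, the autoreload extension mentioned in the comments automates this kind of reloading:

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
module_file = tmp_dir / 'reload_demo.py'

module_file.write_text("def get_version():\n    return 1\n")
importlib.invalidate_caches()
import reload_demo
print(reload_demo.get_version())  # 1

# Rewrite the file, then explicitly re-execute the module.
# The two versions differ in length, so Python will not reuse stale cached
# bytecode even if both writes land within the same second.
module_file.write_text("def get_version():\n    return 'two'\n")
reload_demo = importlib.reload(reload_demo)
print(reload_demo.get_version())  # two
```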

As seen in the update to the post, the following code works. Restarting the kernel seems to have done the trick. That or magic.

import os
import pandas as pd

def read_files_in_folder(fp_list, path=None, arg_list=None):
    '''Reads a folder of csv tables into a dictionary of dataframes.
    Does this dynamically by writing a script to a file, importing the script,
    and running a function from the script.
    Parameters:
        fp_list is [str]: list of filenames or filepaths of csv files.
        path is str: (optional) folder path prepended to each filename; defaults to the current directory.
        arg_list is [str]: (optional) list of pd.read_csv() arguments to pass.
    Returns:
        df_dict is {pd.DataFrame}: dict of dataframes created from csv files.'''
    
    df_dict = {}
    
    if path is None:
        path = os.curdir + '/'  # './'; os.curdir alone would give paths like '.file.csv'
        
    if arg_list is None:
        for fp in fp_list:
            fp_var_name = fp.split('/')[-1].split('.')[0]
            df_dict[fp_var_name] = pd.read_csv(path + fp)
    else:
        args = ''
        for arg in arg_list:
            args += ', ' + arg
        with open('script.py', 'w') as file:
            file.write("""
import pandas as pd

def csvs_to_df_dict():
\tdf_dict = {}
""")
            for fp in fp_list:
                fp_var_name = fp.split('/')[-1].split('.')[0]
                statement = "\tdf_dict['" + fp_var_name + "'] = pd.read_csv('" + path + fp + "'" + args + ")\n"
                file.write(statement)
            file.write('\treturn df_dict')
        import script
        df_dict = script.csvs_to_df_dict()
    
    return df_dict
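This version still relies on import script, so calling read_files_in_folder a second time in the same kernel with a different fp_list would rewrite script.py but get back the cached module, and with it the old csvs_to_df_dict. A minimal sketch of a hypothetical helper, load_module_from_path, that compiles and executes the file into a brand-new module object on every call and never touches sys.modules:

```python
import tempfile
import types
from pathlib import Path


def load_module_from_path(file_path, module_name='generated_script'):
    """Compile and run file_path as a brand-new module object.

    Hypothetical helper: nothing is read from or stored in sys.modules, and no
    bytecode cache is used, so the current file contents always run.
    """
    file_path = Path(file_path)
    module = types.ModuleType(module_name)
    module.__file__ = str(file_path)
    code = compile(file_path.read_text(), str(file_path), 'exec')
    exec(code, module.__dict__)
    return module


# Demonstration: rewrite the same file twice and load it each time.
generated = Path(tempfile.mkdtemp()) / 'script.py'

generated.write_text("def csvs_to_df_dict():\n    return {'first': 1}\n")
first = load_module_from_path(generated).csvs_to_df_dict()

generated.write_text("def csvs_to_df_dict():\n    return {'second': 2}\n")
second = load_module_from_path(generated).csvs_to_df_dict()

print(first, second)  # {'first': 1} {'second': 2}
```

In read_files_in_folder, the two lines import script and df_dict = script.csvs_to_df_dict() would become df_dict = load_module_from_path('script.py').csvs_to_df_dict(). Because the generated file runs its own import pandas as pd inside the new module's namespace, pd resolves inside csvs_to_df_dict.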
