Pandas - NameError: name 'df2' is not defined

Question

For a current project, I am looping over several columns of a Pandas DataFrame and want to print the variable df2.

When calling print(df2), however, I get the error NameError: name 'df2' is not defined. I have looked for solutions but have not found anything yet. Is there a tweak that makes this run?

The corresponding code section looks like this:

# Open the file to write to
with open('sp500-1.csv', 'w', newline='') as file:
    writer = csv.writer(file)

    # Write headers
    writer.writerow(["Section", "TFI"])

    # Loop over the JSON objects
    for i in ['txt_pro','txt_con','txt_adviceMgmt','txt_main']:

        # Loop over the common words inside the JSON object
        common_words = get_top_n_bigram_Group2(df[i], 500)
        for word in common_words:

            # Print and write row.
            print(df2)
            writer.writerow([df2])

And the code that defines df2 is as follows:

def get_top_n_bigram_Group2(corpus, n=None):
    # settings that you use for count vectorizer will go here
    tfidf_vectorizer=TfidfVectorizer(ngram_range=(2, 2), stop_words='english', use_idf=True).fit(corpus)

    # just send in all your docs here
    tfidf_vectorizer_vectors=tfidf_vectorizer.fit_transform(corpus)

    # get the first vector out (for the first document)
    first_vector_tfidfvectorizer=tfidf_vectorizer_vectors[0]

    # place tf-idf values in a pandas data frame
    df1 = pd.DataFrame(first_vector_tfidfvectorizer.T.todense(), index=tfidf_vectorizer.get_feature_names(), columns=["tfidf"])
    df2 = df1.sort_values(by=["tfidf"],ascending=False)

    return df2

You are not actually defining df2 in the global scope (i.e., outside of the function get_top_n_bigram_Group2). You name the output of get_top_n_bigram_Group2 common_words, so you should use print(common_words). The name df2 only exists within the function.
– jkr, Commented Jul 14, 2020 at 16:06
Thanks for the great input. If I understand things correctly, common_words will in this case hold the df2 value that is defined inside get_top_n_bigram_Group2.
– Malte Susen, Commented Jul 14, 2020 at 16:13
Also note that DataFrames have a .to_csv method that will probably make your life much easier.
– Paul H, Commented Jul 14, 2020 at 16:20
Thanks, let me try to integrate that. For now the code only prints the right output in the terminal but does not write the lines to the .csv file.
– Malte Susen, Commented Jul 14, 2020 at 16:33
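
To illustrate the .to_csv suggestion from the comments, here is a minimal sketch. The frame top_bigrams is a made-up stand-in for the value returned by get_top_n_bigram_Group2, and the column name "bigram" is an assumption chosen for illustration. It writes to an in-memory buffer so the result can be inspected; passing a file path instead should write a file.

```python
import io

import pandas as pd

# Made-up stand-in for the DataFrame returned by get_top_n_bigram_Group2.
top_bigrams = pd.DataFrame(
    {"tfidf": [0.6, 0.4]},
    index=["great pay", "nice team"],
)

# Write to an in-memory buffer; a path such as 'sp500-1.csv' also works here.
buffer = io.StringIO()
top_bigrams.to_csv(buffer, index_label="bigram")  # index_label names the index column

csv_text = buffer.getvalue()
print(csv_text)
```

Because the bigrams live in the index, they are written as the first column of each row.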

Victor Oliveira · Accepted Answer · 2020-07-14 16:10:52Z

1

This happens because df2 is defined inside your function and follows Python's scoping rules, so the name exists only inside the function body.
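
A small standalone sketch of that scoping rule, using hypothetical names (make_value, local_result) rather than the code from the question:

```python
def make_value():
    # `local_result` is a local name: it exists only while make_value() runs.
    local_result = [3, 1, 2]
    return sorted(local_result)


# The caller picks its own name for the returned object.
returned = make_value()
print(returned)  # [1, 2, 3]

try:
    print(local_result)  # never defined at module level
except NameError as err:
    message = str(err)
    print(message)  # name 'local_result' is not defined
```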

You return df2 from the function and bind the result to another name:

common_words = get_top_n_bigram_Group2(df[i], 500)

So the value of df2 is returned and becomes available as common_words.

And then you are iterating over it:

for word in common_words:

Therefore you should use word instead of df2 in your print and writerow calls.
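
One caveat, assuming common_words is the DataFrame the function returns: iterating over a DataFrame directly yields its column labels, not its rows. That would be consistent with the asker later seeing only "tfidf" in the output. A minimal sketch with invented bigrams and scores, showing the difference and how itertuples() gives one row per iteration:

```python
import pandas as pd

# Made-up stand-in for the frame returned by get_top_n_bigram_Group2.
common_words = pd.DataFrame(
    {"tfidf": [0.61, 0.42, 0.17]},
    index=["great benefits", "work life", "life balance"],
)

# Iterating a DataFrame directly yields its column labels.
labels = [word for word in common_words]
print(labels)  # ['tfidf']

# itertuples() yields one row at a time; row.Index holds the bigram.
rows = [(row.Index, row.tfidf) for row in common_words.itertuples()]
print(rows)
```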

answered Jul 14, 2020 at 16:10

Victor Oliveira


NerdioN · Accepted Answer · 2020-07-14 16:47:28Z

1

Hey, this is just a simple question. A function returns a value, and that value ends up in the variable you assign the call to.

# Open the file to write to
with open('sp500-1.csv', 'w', newline='') as file:
    writer = csv.writer(file)

    # Write headers
    writer.writerow(["Section", "TFI"])

    # Loop over the JSON objects
    for i in ['txt_pro','txt_con','txt_adviceMgmt','txt_main']:

        # Loop over the common words inside the JSON object
        common_words = get_top_n_bigram_Group2(df[i], 500)
        for word in common_words:

            # Print and write row.
            print(common_words)
            writer.writerow([word])

edited Jul 14, 2020 at 16:47

answered Jul 14, 2020 at 16:08

NerdioN


3 Comments

NerdioN Over a year ago

where common_words will give you the whole object and word will give you a record

Malte Susen Over a year ago

Thanks, that makes the code run through. It however does not write any of the variables into the .csv file (output I receive is 4 x tfidf).

NerdioN Over a year ago

Yeah, maybe you just have to tweak your function so that it returns the expected result. Thanks a lot for accepting
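
One possible way to get actual rows into the file, sketched under assumptions: the frames dictionary below is a made-up stand-in for calling get_top_n_bigram_Group2(df[i], 500) per section, and the extra "Bigram" header is an illustrative choice, not part of the original code. The sketch writes to an in-memory buffer so the output can be checked.

```python
import csv
import io

import pandas as pd

# Made-up stand-ins for get_top_n_bigram_Group2(df[i], 500) per section.
frames = {
    "txt_pro": pd.DataFrame({"tfidf": [0.6, 0.4]}, index=["great pay", "nice team"]),
    "txt_con": pd.DataFrame({"tfidf": [0.7]}, index=["long hours"]),
}

# Swap the buffer for open('sp500-1.csv', 'w', newline='') to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Section", "Bigram", "TFI"])

for section, common_words in frames.items():
    # itertuples() yields one row per bigram; row.Index is the bigram text.
    for row in common_words.itertuples():
        writer.writerow([section, row.Index, row.tfidf])

csv_text = buffer.getvalue()
print(csv_text)
```

Looping with itertuples() instead of iterating the DataFrame directly is what produces one CSV line per bigram rather than one per column label.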

user13914826 · Accepted Answer · 2020-07-14 16:18:26Z

0

Well, when you ran get_top_n_bigram_Group2, you stored df2 into common_words.

answered Jul 14, 2020 at 16:18

user13914826
