For a current project, I am running a number of iterations over a Pandas DataFrame and want to print the variable df2.
When the line print(df2) runs, however, I get the error NameError: name 'df2' is not defined. I have searched for solutions but have not found anything yet. Is there a way to make this run?
The corresponding code section looks like this:
# Open the file to write to
with open('sp500-1.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    # Write headers
    writer.writerow(["Section", "TFI"])
    # Loop over the JSON objects
    for i in ['txt_pro','txt_con','txt_adviceMgmt','txt_main']:
        # Loop over the common words inside the JSON object
        common_words = get_top_n_bigram_Group2(df[i], 500)
        for word in common_words:
            # Print and write row.
            print(df2)
            writer.writerow([df2])
And the code that defines df2 is as follows:
def get_top_n_bigram_Group2(corpus, n=None):
    # settings that you use for count vectorizer will go here
    tfidf_vectorizer=TfidfVectorizer(ngram_range=(2, 2), stop_words='english', use_idf=True).fit(corpus)
    # just send in all your docs here
    tfidf_vectorizer_vectors=tfidf_vectorizer.fit_transform(corpus)
    # get the first vector out (for the first document)
    first_vector_tfidfvectorizer=tfidf_vectorizer_vectors[0]
    # place tf-idf values in a pandas data frame
    df1 = pd.DataFrame(first_vector_tfidfvectorizer.T.todense(), index=tfidf_vectorizer.get_feature_names(), columns=["tfidf"])
    df2 = df1.sort_values(by=["tfidf"],ascending=False)
    return df2
You are referencing df2 in the global scope (i.e., outside of the function get_top_n_bigram_Group2). The name df2 only exists within the function. You name the output of get_top_n_bigram_Group2 common_words, so you should use print(common_words). In this case, common_words holds the DataFrame that was built as df2 inside get_top_n_bigram_Group2.

Also, pandas DataFrames have a to_csv method that will probably make your life much easier.
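
Both points can be sketched in a few lines. This is only an illustration under stated assumptions: make_sorted_frame is a hypothetical stand-in for get_top_n_bigram_Group2 (so the example runs without scikit-learn), the scores are made up, and an in-memory buffer takes the place of the real CSV file.

```python
import io

import pandas as pd


def make_sorted_frame(scores):
    """Hypothetical stand-in for get_top_n_bigram_Group2 (not the original function)."""
    df1 = pd.Series(scores, name="tfidf").to_frame()
    # df2 is a local name: it exists only while this function is running.
    df2 = df1.sort_values(by=["tfidf"], ascending=False)
    return df2


# Made-up scores, purely for illustration.
example_scores = {"good pay": 0.2, "great team": 0.7, "long hours": 0.5}

# The caller chooses the name for the returned object; here it is common_words.
common_words = make_sorted_frame(example_scores)
print(common_words)  # works, whereas print(df2) here would raise NameError

# to_csv writes the whole frame in one call; StringIO stands in for a file path.
buffer = io.StringIO()
common_words.to_csv(buffer, index_label="bigram")
csv_text = buffer.getvalue()
```

In the original loop, the same idea would probably amount to calling to_csv on common_words with a file path instead of looping over rows with csv.writer, though the exact layout of the output file is up to you.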