Output of for loop filling down in dataframe instead of returning corresponding values for each row

Question

I'm using spaCy to process a series of sentences and return the five most common words in each one. My goal is to store the output of that frequency analysis (using Counter) in a new column beside each corresponding sentence. I suspect this is just the lack of coffee and sleep talking, but I'm stuck on why this keeps producing a DataFrame where one value fills all the way down (repeated in every row) instead of a unique value that matches each sentence.

Code:

# test_data is a DataFrame with three columns: a unique identifier, a title, and a sentence (desc) for each title.

for value in test_data['desc']: # for each sentence in dataset
    desc = nlp(value) # run spacy natural language processing on the description
    words = [
        token.text # for each token, etc
        for token in desc
        if not token.is_stop and not token.is_punct # essentially, just keywords, no filler
    ]
    keys = list(Counter(words).most_common(5)) # store values from Counter 
    key_list = ", ".join(map(str, keys)) # convert list to string
    test_data['key'] = key_list # carry list over to dataframe
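To make the two Counter lines concrete, here is what they produce for a single hand-made token list. The words list is hypothetical and simply replaces the spaCy step. Note that most_common() already returns a list, so the list() wrapper is not needed, and words with equal counts keep the order in which they were first seen.

```python
from collections import Counter

# Hypothetical tokens standing in for the filtered spaCy output of one sentence
words = ["pandas", "column", "pandas", "loop", "row"]

keys = Counter(words).most_common(5)  # already a list of (word, count) tuples
key_list = ", ".join(map(str, keys))  # str() of each tuple, joined with ", "

print(keys)      # [('pandas', 2), ('column', 1), ('loop', 1), ('row', 1)]
print(key_list)  # ('pandas', 2), ('column', 1), ('loop', 1), ('row', 1)
```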

The output I'm getting is something like:

uniq	title	desc	key
1	Title one...	Sentence one...	('kword1', 12), ('kword2', 8), ('kword3', 7)
2	Title two...	Sentence two...	('kword1', 12), ('kword2', 8), ('kword3', 7)
3	Title three...	Sentence three...	('kword1', 12), ('kword2', 8), ('kword3', 7)
4	Title four...	Sentence four...	('kword1', 12), ('kword2', 8), ('kword3', 7)

Here kword1, kword2 and kword3 are exactly right for the first row (i.e., they are the correct output for Sentence one), but they are duplicated down every row, which is not the correct output for sentences two, three and four.

I'm not sure if this makes sense; I'm a bit of a Python novice without a comp sci background, so I'm all ears for help. Thank you in advance!!

Corralien · Accepted Answer · 2025-10-28 11:17:19Z


Your mistake is here:

test_data['key'] = key_list

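You overwrite the entire column on each iteration: assigning a single string to test_data['key'] broadcasts that string to every row, so every row ends up holding the same value.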
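The effect is easy to reproduce without spaCy. In this minimal sketch, df and its values are made up for illustration; each iteration assigns one string to the whole column, so only the last assignment survives.

```python
import pandas as pd

# Small made-up stand-in for test_data; the real column holds sentences
df = pd.DataFrame({"desc": ["first", "second", "third"]})

for value in df["desc"]:
    # Assigning a single string to a column broadcasts it to every row,
    # so each pass overwrites the whole column
    df["key"] = value.upper()

print(df)
# every row of "key" now holds "THIRD", the value from the last iteration
```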

You can use a function and let pandas build the column from its return values:

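from collections import Counter

# nlp is the spaCy pipeline already loaded in the question
def count5(text):
    doc = nlp(text)  # tokenize and annotate one description
    words = [token.text for token in doc if not token.is_stop and not token.is_punct]
    keys = Counter(words).most_common(5)  # five most frequent (word, count) pairs
    return ", ".join(map(str, keys))

# map() calls count5 once per description and returns a Series aligned with test_data
test_data["key"] = test_data["desc"].map(count5)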

Output:

>>> test_data
                                                desc                                                key
0  Recent years have brought a revolution in the ...  ('languages', 2), ('Recent', 1), ('years', 1),...
1  The latest AI models are unlocking these areas...  ('latest', 1), ('AI', 1), ('models', 1), ('unl...
2  The examples of NLP use cases in everyday live...  ('examples', 1), ('NLP', 1), ('use', 1), ('cas...
3  Natural language processing algorithms emphasi...  ('Natural', 1), ('language', 1), ('processing'...
4  The outline of NLP examples in real world for ...  ('translation', 3), ('outline', 1), ('NLP', 1)...
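If you would rather keep the explicit loop, a minimal sketch of the same idea is to collect one result per row in a list and assign that list to the column once, after the loop. The keywords() helper and the STOPWORDS set below are hypothetical stand-ins for the spaCy filtering, so the snippet runs without a language model.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the spaCy step: split on whitespace and drop a
# tiny hand-made stopword list. Replace with the nlp()-based filtering.
STOPWORDS = {"the", "a", "is", "and"}


def keywords(text, n=5):
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return ", ".join(map(str, Counter(words).most_common(n)))


test_data = pd.DataFrame({
    "desc": ["the cat and the cat", "a dog is a dog and a bird"],
})

# Build one result per row, then assign the whole list in a single step
results = []
for value in test_data["desc"]:
    results.append(keywords(value))
test_data["key"] = results

print(test_data)
```

pandas requires the list to have exactly one entry per row; a length mismatch raises a ValueError. The .map(count5) version does this bookkeeping for you.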

cmr Oct 28 at 14:49

Oh, duh duh duh duh duh duh. Thank you @Corralien!!
