0

I have a data set like this:

index sentence
1     bobby went to the gym
1     sally the bad
1     days are good
2     sunny side up
2     the weird

I want to count how many times 'the' appears in the columns 'sentence' by index:

index count_the 
1     2
2     1

how would I do this in pandas?

4 Answers 4

1

First groupby.Series.apply, then use series.str.count:

df = df.groupby('index').sentence.apply(' '.join).reset_index()

print(df)
   index                                           sentence
0      1  bobby went to the gym sally the bad days are good
1      2                            sunny side up the weird

df['count_the'] = df.sentence.str.count('the')

print(df.drop(['sentence'],axis=1))
   index  count_the
0      1          2
1      2          1
Sign up to request clarification or add additional context in comments.

Comments

1

one way from findall , notice I treat the index columns as index here

df.sentence.str.findall(r'\bthe\b').str.len().sum(level=0)
Out[363]: 
index
1    2
2    1
Name: sentence, dtype: int64

1 Comment

Upvote - though it is not the straight forward answer but it did solve my problem where I was searching a regex instead of a constant string and IMHO the .sum(level=0) is probably unnecessary.
0
df = pd.DataFrame({'index' :[1,1,1,2,2],'sentence':['bobby went to the gym','sally the bad','days are good','sunny side up','the weird']})
df['counts'] = df['sentence'].str.count('the')
print(df.groupby('index')['counts'].sum())

Comments

0

Also you can use groupby()+ apply():

df.groupby('index').apply(lambda x: x['sentence'].str.contains(r'.*the').sum()).reset_index(name = 'count_the')

or groupby()+ apply():

df.groupby('index').agg({'sentence': lambda x: x.str.contains(r'.*the').sum()}).reset_index(name = 'count_the')

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.