Count Specific Word Across Multiple Columns in Pandas Dataframes, Output Grouped by Column

Question

I am trying to count how often a single word appears in each of several columns (there are around 7). For example, across the columns Name, Fruit, Country, etc., I want to know how many times the word 'The' appears in each one.

I can use this df3['Name'].str.count('The').sum(), and that will give this result:

Out[121]: 3522

But then when I add in the next string field so that it is

df3['Name'].str.count('The').sum()
df3['Fruit'].str.count('The').sum()

it only shows the result of the last expression (as expected):

Out[122]: 27

What I obviously want is for it to say:

Name: 3522
Fruit: 27

But I don't seem to be able to use str.count or str.contains in a way that groups it like I need.

If the data is something like the following:

Name | Year | Score | 2nd Score | % of People | Country | Fruit | Export Countries | Language | Transit Duration | Quality | Taste | Freshness | Packaging
Andes, The | 2021 | 8 | 8.8 | 87% | The Netherlands | The Apple | United States,United Kingdom | English,Japanese,French | 148.0 | 1.0 | 0.0 | 0.0 | 0.0
Phil | 2021 | 8 | 8.4 | 87% | Spain | The Banana | United Kingdom, Germany | English,German,French,Italian | 165.0 | 1.0 | 0.0 | 0.0 | 0.0
Sarah | 2021 | 9 | 8.3 | 89% | Greece | The Plum | Germany,United States | English,German,French,Italian | 153.0 | 1.0 | 0.0 | 0.0 | 0.0

The expected output should be

Name: 1
Year: 0
Score: 0
2nd Score: 0
Country: 1
Fruit: 3
Transit Duration: 0
Quality: 0
Taste: 0
Freshness: 0
Packaging: 0

Have made an edit to the original post; is that what you're talking about?
– Aemonar, Commented Aug 15, 2021 at 10:45

sammywemmy · Accepted Answer · 2021-08-15 11:20:14Z

You could use applymap to get your output; it applies a function to every cell:

In [477]: df.applymap(lambda s: 'The' in s).sum()
Out[477]: 
Name       1
Fruit      2
Country    1
dtype: int64

The first part, the applymap call, returns a DataFrame of booleans, one per cell:

In [476]: df.applymap(lambda s: 'The' in s)
Out[476]: 
   Name    Fruit    Country
0   True     True      True
1  False     True     False

From here, you can sum the booleans, which are treated as 1s and 0s, to get one count per column.
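The two steps above can be sketched end to end as follows. The two-row frame is a hypothetical stand-in chosen to match the output shown in this answer. As far as I know, recent pandas releases renamed applymap to DataFrame.map, so the sketch picks whichever one is available; treat that fallback as an assumption about your installed version.

```python
import pandas as pd

# Hypothetical two-row frame matching the output shown above
df = pd.DataFrame({
    "Name": ["Andes, The", "Phil"],
    "Fruit": ["The Apple", "The Banana"],
    "Country": ["The Netherlands", "Spain"],
})

# Use DataFrame.map where it exists, otherwise fall back to applymap
cellwise = df.map if hasattr(df, "map") else df.applymap

# Step 1: boolean DataFrame, True where the cell contains 'The'
mask = cellwise(lambda cell: "The" in cell)

# Step 2: summing a boolean column counts its True values
counts = mask.sum()
print(counts)
```

Note that this counts cells containing the word, not the total number of occurrences.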

You could also use transform or apply to get the same result; here the function receives a whole column rather than a single cell:

In [482]: df.transform(lambda col: col.str.contains('The')).sum()
Out[482]: 
Name       1
Fruit      2
Country    1
dtype: int64
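A minimal sketch of the column-wise variant, using apply on the same hypothetical two-row frame. Passing regex=False is my addition, not part of the original answer; it makes str.contains treat 'The' as a literal substring, which matters once the search word contains regex metacharacters.

```python
import pandas as pd

# Hypothetical two-row frame matching the output shown above
df = pd.DataFrame({
    "Name": ["Andes, The", "Phil"],
    "Fruit": ["The Apple", "The Banana"],
    "Country": ["The Netherlands", "Spain"],
})

# The lambda receives one column (a Series) at a time;
# .str.contains returns a boolean Series for that column.
mask = df.apply(lambda col: col.str.contains("The", regex=False))

# One count per column
counts = mask.sum()
print(counts)
```

This only works while every column holds strings; a numeric column has no .str accessor and raises an error.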

Based on your comments, you could select only text columns, with the select_dtypes method:

In [483]: df.select_dtypes('object').applymap(lambda s: 'The' in s).sum()
Out[483]: 
Name       1
Fruit      2
Country    1
dtype: int64
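Here is a rough sketch of the column-selection idea on a frame with mixed types, using a subset of the columns from the question. It deliberately swaps select_dtypes('object') for select_dtypes(exclude='number'), which is my substitution: it should not depend on whether your pandas version stores text as object or as a dedicated string dtype.

```python
import pandas as pd

# Subset of the question's sample data, with int columns mixed in
df = pd.DataFrame({
    "Name": ["Andes, The", "Phil", "Sarah"],
    "Year": [2021, 2021, 2021],
    "Score": [8, 8, 9],
    "Country": ["The Netherlands", "Spain", "Greece"],
    "Fruit": ["The Apple", "The Banana", "The Plum"],
})

# Keep only the non-numeric columns (Name, Country, Fruit)
text_cols = df.select_dtypes(exclude="number")

# Count the cells containing 'The' in each remaining column
counts = text_cols.apply(lambda col: col.str.contains("The", regex=False)).sum()
print(counts)
```

The numeric columns are simply absent from the result, so this does not report a 0 for Year or Score.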

Thanks to @Shubhamsharma, the solution below works. Casting every column to str first means the membership test no longer fails on int columns:

df.astype(str).applymap(lambda s: 'The' in s).sum()
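A sketch of the cast-to-string approach that also prints the result in the "Name: 1" layout asked for in the question. It uses apply with the .str methods instead of applymap (my choice, to avoid depending on the applymap/map rename), and it computes two different quantities: cells containing the word, and total occurrences via str.count as in the question's own attempt.

```python
import pandas as pd

# Subset of the question's sample data, with an int column mixed in
df = pd.DataFrame({
    "Name": ["Andes, The", "Phil", "Sarah"],
    "Year": [2021, 2021, 2021],
    "Country": ["The Netherlands", "Spain", "Greece"],
    "Fruit": ["The Apple", "The Banana", "The Plum"],
})

# Cast everything to str so numeric columns can be searched too
as_text = df.astype(str)

# Number of cells per column that contain 'The' at least once
cells_with_word = as_text.apply(lambda col: col.str.contains("The", regex=False)).sum()

# Total number of occurrences per column (a cell may contain it twice)
occurrences = as_text.apply(lambda col: col.str.count("The")).sum()

# Print one "Column: count" line per column
for column, n in cells_with_word.items():
    print(f"{column}: {n}")
```

On this sample the two results are identical because no cell contains 'The' more than once; they differ as soon as one does.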

edited Aug 15, 2021 at 11:20

answered Aug 15, 2021 at 10:47

sammywemmy


7 Comments

Aemonar Over a year ago

That would be absolutely perfect based on all the info given originally, however have realised that a catch all will not work because a couple of the columns are int and so this threw an error. Does your Syntax allow for nominating specific columns (i.e. ones that aren't int)?

Aemonar Over a year ago

Neither of those allows me to run it. The first, df.applymap(lambda df: 'The' in df), gives me the error: argument of type 'int' is not iterable. Running the second, df.transform(lambda df: df.str.contains('The')).sum(), I first get:

AttributeError: Can only use .str accessor with string values! During handling of the above exception, another exception occurred:

and then the error at the bottom, which is ValueError: Transform function failed. Have edited the original to include an int value.

sammywemmy Over a year ago

Did you apply select_dtypes? df.select_dtypes('object').applymap(lambda df: 'The' in df).sum()

Shubham Sharma Over a year ago

@Aemonar Check df.astype(str).applymap(lambda s: 'The' in s).sum()

Aemonar Over a year ago

df.astype(str).applymap(lambda s: 'The' in s).sum() absolutely worked @ShubhamSharma, thank you! @sammywemmy - I edited the original post to include the new info, but the above has worked for me.
