How to highlight and count specific keywords in a pandas dataframe

Question

For each row in the Text column of my df, I want to do the following:

Highlight the keywords gross, suck, singing and ponzi
Count the number of keywords in each row and store the result in a Count column

import pandas as pd

data = {'Text': ['The bread tastes good','Tuna is gross','Teddy is a beach bum','Angela suck at singing!','oneCoin was a ponzi scheme'],
        'ID': [1001,1002,1003,1004,1005]
        }

df = pd.DataFrame(data, columns = ['ID', 'Text'])

print(df)

The desired output should include the Count column and look like this:

My attempt (not the best! you can ignore this):

# keyword list
key_words = ['gross','suck','singing','ponzi']

# highlight the keywords
df['Text'].applymap(lambda x: "background-color: yellow" if x else "")

# count the keywords present in each row

df['Count'] = df['Text'].str.count(r"\b(?:{})\b".format("|".join(key_words)))

All attempts highly appreciated!

df['Count'] = df['Text'].str.count(r"\b(?:{})\b".format("|".join(key_words)))?
– Wiktor Stribiżew, Commented May 28, 2021 at 22:31
@WiktorStribiżew Thanks, that part works fine! What about flagging the key_words?
– RayX500, Commented May 28, 2021 at 22:37
Where do you need to highlight them? In a Linux terminal? In a Jupyter notebook?
– Wiktor Stribiżew, Commented May 28, 2021 at 22:42

Ryszard Czech · Accepted Answer · 2021-05-28 22:48:46Z

1

Use Series.str.count:

>>> df['Text'].str.count(fr"\b(?:{'|'.join(key_words)})\b")
0    0
1    1
2    0
3    2
4    1
Name: Text, dtype: int64

\b is a word boundary, so the pattern counts whole-word matches only.
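
A minimal sketch of the same count on the question's data, with two assumptions that are not in the original answer: each keyword is passed through re.escape in case it contains regex metacharacters, and matching is made case-insensitive via flags=re.IGNORECASE. The name pattern is only illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'ID': [1001, 1002, 1003, 1004, 1005],
    'Text': ['The bread tastes good', 'Tuna is gross', 'Teddy is a beach bum',
             'Angela suck at singing!', 'oneCoin was a ponzi scheme'],
})
key_words = ['gross', 'suck', 'singing', 'ponzi']

# Assumption: escape each keyword so characters such as "." or "+" match literally.
pattern = r"\b(?:{})\b".format("|".join(re.escape(k) for k in key_words))

# Assumption: ignore case, so "Gross" would be counted as well as "gross".
df['Count'] = df['Text'].str.count(pattern, flags=re.IGNORECASE)
print(df)
```

Drop the flags argument if the match should stay case-sensitive, as in the answer.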

pandas styling applies to whole cells, so you can't use it to highlight separate words inside a cell in a Jupyter notebook. You can extract the matched words into a separate column instead:

df['Matches'] = df['Text'].str.findall(fr"\b(?:{'|'.join(key_words)})\b")
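
If some form of visual highlighting is still wanted, one possible workaround (not part of this answer, and only a sketch) is to wrap each match in an HTML <mark> tag and export the frame with DataFrame.to_html(escape=False). The helper name mark_keywords and the variables highlighted and table_html are hypothetical.

```python
import html
import re
import pandas as pd

df = pd.DataFrame({
    'ID': [1001, 1002, 1003, 1004, 1005],
    'Text': ['The bread tastes good', 'Tuna is gross', 'Teddy is a beach bum',
             'Angela suck at singing!', 'oneCoin was a ponzi scheme'],
})
key_words = ['gross', 'suck', 'singing', 'ponzi']
pattern = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, key_words))))

def mark_keywords(text):
    # Escape the raw text first so it is safe to embed in HTML,
    # then wrap every whole-word keyword match in a <mark> tag.
    return pattern.sub(lambda m: "<mark>{}</mark>".format(m.group(0)),
                       html.escape(text))

# Work on a copy so the original Text column stays untouched.
highlighted = df.assign(Text=df['Text'].map(mark_keywords))

# escape=False keeps the <mark> tags intact; lifting max_colwidth
# avoids long cells being truncated in the generated table.
with pd.option_context('display.max_colwidth', None):
    table_html = highlighted.to_html(escape=False, index=False)

# In a notebook, the string could be shown with IPython.display.HTML(table_html).
```

Because the text is HTML-escaped before matching, a keyword that happens to equal an entity name (for example amp) would need extra care; that case is not handled here.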

2 Comments

RayX500 Over a year ago

thanks, what about the highlighting part? or can we create a new column Keyword_Present like in the figure above.

Ryszard Czech Over a year ago

@RickyTricky Sorry, no highlighting. df['Keyword_Present'] = df['Text'].str.findall(fr"\b(?:{'|'.join(key_words)})\b").str.join(' ') can be used instead.

wwnde · Accepted Answer · 2021-05-28 22:34:53Z

1

Use Series.str.findall, which returns a list of matches for each row, then count the elements in each list with str.len():

df['count']=df['Text'].str.findall('|'.join(key_words)).str.len()
df
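
Note that the pattern in this answer has no word boundaries, so it also counts keywords that occur inside longer words. A small sketch of the difference, using made-up sentences that are not from the question (the variable names are illustrative):

```python
import pandas as pd

key_words = ['gross', 'suck', 'singing', 'ponzi']

# Hypothetical sample sentences with keywords embedded in longer words.
texts = pd.Series(['He was engrossed in the book', 'Angela sucks at singing'])

joined = '|'.join(key_words)

# Plain alternation: matches "gross" inside "engrossed" and "suck" inside "sucks".
substring_count = texts.str.findall(joined).str.len()

# Same alternation wrapped in \b...\b: only standalone words match.
whole_word_count = texts.str.findall(r"\b(?:{})\b".format(joined)).str.len()

print(pd.DataFrame({'Text': texts,
                    'substring': substring_count,
                    'whole_word': whole_word_count}))
```

Which behaviour is right depends on whether inflected forms such as "sucks" should count as a keyword hit.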

1 Comment

RayX500 Over a year ago

Thanks, any luck with the highlighting part of the question?
