
For each row in the Text column of my df, I want to do the following:

  1. Highlight the keywords gross, suck, singing and ponzi.

  2. Count the number of keywords in each row and store the result in a Count column.

import pandas as pd

data = {'Text': ['The bread tastes good','Tuna is gross','Teddy is a beach bum','Angela suck at singing!','oneCoin was a ponzi scheme'],
        'ID': [1001,1002,1003,1004,1005]
        }

df = pd.DataFrame(data, columns = ['ID', 'Text'])

print(df)


The desired output should include the Count column and look like this:

[Image: desired output table, including the Count column]

My attempt (not the best! you can ignore this):

# keyword list
key_words = ['gross','suck','singing','ponzi']

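# highlight the keywords (a plain Series has no applymap; Styler.applymap styles whole cells, not single words)
df.style.applymap(lambda x: "background-color: yellow" if any(k in x for k in key_words) else "", subset=['Text'])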

# count the keywords present in each row

df['Count'] = df['Text'].str.count(r"\b(?:{})\b".format("|".join(key_words)))


All attempts highly appreciated!

  • df['Count'] = df['Text'].str.count(r"\b(?:{})\b".format("|".join(key_words)))? Commented May 28, 2021 at 22:31
  • @WiktorStribiżew Thanks, that part works fine! What about flagging the key_words? Commented May 28, 2021 at 22:37
  • Where do you need to highlight them? In a Linux terminal? In Jupyter notebook? Commented May 28, 2021 at 22:42
  • @WiktorStribiżew In a Jupyter notebook, or when exporting to a CSV file? Commented May 28, 2021 at 22:43
  • It looks like it is impossible. Commented May 28, 2021 at 22:46

2 Answers


Use Series.str.count:

>>> df['Text'].str.count(fr"\b(?:{'|'.join(key_words)})\b")
0    0
1    1
2    0
3    2
4    1
Name: Text, dtype: int64

\b is a word boundary, so only whole-word matches are counted.
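A quick sketch of what the word boundaries change. The row 'Tuna is grossly overrated' is a hypothetical example added for illustration and is not in the question's data; wrapping each keyword in re.escape is an extra safeguard, not part of the answer:

```python
import re

import pandas as pd

key_words = ['gross', 'suck', 'singing', 'ponzi']

# re.escape (an added safeguard) makes characters such as '.' or '+' in a keyword
# match literally; it changes nothing for these four plain words.
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, key_words)))

# 'Tuna is grossly overrated' is a hypothetical row used only to show the difference.
s = pd.Series(['Tuna is gross', 'Tuna is grossly overrated', 'Angela suck at singing!'])

whole_word = s.str.count(pattern)             # with \b: only whole words are counted
substring = s.str.count("|".join(key_words))  # without \b: 'gross' inside 'grossly' counts too

print(pd.DataFrame({'Text': s, 'whole_word': whole_word, 'substring': substring}))
```

Only the 'grossly' row differs: it counts 0 with the boundaries and 1 without them.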

The pandas Styler applies CSS to whole cells, so it cannot highlight individual words inside a cell in a Jupyter notebook. You can, however, extract the matched words into a separate column:

df['Matches'] = df['Text'].str.findall(fr"\b(?:{'|'.join(key_words)})\b")

2 Comments

Thanks, what about the highlighting part? Or can we create a new column, Keyword_Present, like in the figure above?
@RickyTricky Sorry, no highlighting. df['Keyword_Present'] = df['Text'].str.findall(fr"\b(?:{'|'.join(key_words)})\b").str.join(' ') can be used instead.
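Putting the answer and this comment together, a minimal sketch that runs findall once and derives both the Keyword_Present text and the Count from the same list of matches. The column names follow the question and the comment; reusing one matches Series for both columns is a design choice, not something either post prescribes:

```python
import pandas as pd

key_words = ['gross', 'suck', 'singing', 'ponzi']
data = {'Text': ['The bread tastes good', 'Tuna is gross', 'Teddy is a beach bum',
                 'Angela suck at singing!', 'oneCoin was a ponzi scheme'],
        'ID': [1001, 1002, 1003, 1004, 1005]}
df = pd.DataFrame(data, columns=['ID', 'Text'])

pattern = r"\b(?:{})\b".format("|".join(key_words))

# One list of whole-word matches per row, e.g. ['suck', 'singing']
matches = df['Text'].str.findall(pattern)

# Space-separated keywords: plain text, so it also survives a CSV export
df['Keyword_Present'] = matches.str.join(' ')
# The keyword count is simply the length of each list
df['Count'] = matches.str.len()

print(df)
```

Rows without a keyword get an empty string in Keyword_Present and 0 in Count.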

Use str.findall, which gives you a list of matches for each row, then count the elements in each list with str.len():

# Note: without \b around the pattern, substrings also match (e.g. "gross" inside "grossly")
df['Count'] = df['Text'].str.findall('|'.join(key_words)).str.len()
df

1 Comment

Thanks, any luck with the highlighting part of the question?
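One workaround for word-level highlighting, not taken from either answer and only useful for HTML output such as a Jupyter notebook, is to wrap each match in a styled span yourself and render the table as raw HTML. A minimal sketch, assuming the question's key_words and data; highlight_keywords is a hypothetical helper name. The display.max_colwidth option is set to None because to_html otherwise truncates long cell strings, which would cut the span tags:

```python
from html import escape
import re

import pandas as pd

key_words = ['gross', 'suck', 'singing', 'ponzi']
df = pd.DataFrame({'ID': [1001, 1002, 1003, 1004, 1005],
                   'Text': ['The bread tastes good', 'Tuna is gross', 'Teddy is a beach bum',
                            'Angela suck at singing!', 'oneCoin was a ponzi scheme']})

pattern = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, key_words))))

def highlight_keywords(text):
    """Hypothetical helper: HTML-escape the text, then wrap each whole-word keyword in a yellow span."""
    return pattern.sub(
        lambda m: f'<span style="background-color: yellow">{m.group(0)}</span>',
        escape(text),
    )

# escape=False keeps the span tags as markup instead of turning them into &lt;span&gt;
with pd.option_context('display.max_colwidth', None):
    html_table = df.assign(Text=df['Text'].map(highlight_keywords)).to_html(escape=False, index=False)

# In a Jupyter notebook, render it with:
# from IPython.display import HTML, display
# display(HTML(html_table))
```

A CSV file is plain text and cannot carry colours, so for a CSV export a column of matched keywords, like the Keyword_Present column suggested in the first answer's comments, is the practical substitute.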
