I have a pandas DataFrame like the one below, with a column named 'texts'.

texts
throne one
bar one
foo two
bar three
foo two
bar two
foo one
foo three
one three

For each row, I want to count how many times the words 'one', 'two', and 'three' appear, counting a match only when it is a complete word. The output should look like this:

    texts   counts
    throne one  1
    bar one     1
    foo two     1
    bar three   1
    foo two     1
    bar two     1
    foo one     1
    foo three   1
    one three   2

You can see that the count for the first row is 1: 'throne' is not counted, because the 'one' inside it is not a complete word.

Any help on this?

  • @MattR 'one three' has a count of 2 because I am searching for one, two, and three. The last row contains two of those words, so a count of 2 is correct. Commented Apr 5, 2018 at 15:58

1 Answer

Use pd.Series.str.count with a regex built by joining the words with '|':

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(words)))

        texts  counts
0  throne one       2
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2

The first row gets 2 because the 'one' inside 'throne' is matched as well. To stop it from being counted, add word boundaries (\b) around each word in the regex:

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(map(r'\b{}\b'.format, words))))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
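
As a quick check of what the word boundaries change, the sketch below runs both patterns through re.findall on the first row's text. It uses the standard re module directly instead of pandas, on the assumption that str.count counts the same non-overlapping regex matches; the variable names are only illustrative.

```python
import re

words = "one two three".split()

# Pattern without boundaries: one|two|three
plain = "|".join(words)
# Pattern with boundaries: \bone\b|\btwo\b|\bthree\b
bounded = "|".join(r"\b{}\b".format(w) for w in words)

text = "throne one"

# Without boundaries, the 'one' inside 'throne' matches too.
plain_matches = re.findall(plain, text)
print(plain_matches)    # ['one', 'one']

# With boundaries, only the standalone word matches.
bounded_matches = re.findall(bounded, text)
print(bounded_matches)  # ['one']
```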

And for flair, the same thing using a raw f-string (Python 3.6+):

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(fr'\b{w}\b' for w in words)))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
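
For completeness, here is a self-contained sketch that rebuilds the sample frame from the question and wraps the whole alternation in one non-capturing group, so \b is written only once. The re.escape call is an extra precaution that is not part of the answer above: it matters only if a search word contains regex metacharacters such as '.' or '+'.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"texts": [
    "throne one", "bar one", "foo two", "bar three", "foo two",
    "bar two", "foo one", "foo three", "one three",
]})

words = ["one", "two", "three"]

# Escape each word, join with '|', and wrap the alternation in a single
# non-capturing group between word boundaries: \b(?:one|two|three)\b
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, words)))

# assign returns a new frame; df itself is left unchanged.
result = df.assign(counts=df["texts"].str.count(pattern))
print(result)
```

Note that \b only marks a boundary next to a word character, so a search word that begins or ends with punctuation may need a different boundary test.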
2 Comments

The first row shouldn't give a count of 2, as I am only looking for complete words. 'throne' is a different word and should not count as 'one'.
Thank you very much. It did exactly what I asked for. I will accept this as the answer.
