Python: In a DataFrame, how do I loop through all strings of one column and check to see if they appear in another column and count them?

Question

I've got a DataFrame and want to loop through all cells in column c2 and count how many times each entire string appears in another column, c1 (0 if it never appears). Then print the results.

Example df:

id     c1                c2
0      luke skywalker    han solo
1      leia organa       r2d2
2      darth vader       finn
3      han solo          the emporer
4      han solo          c3po
5      finn              leia organa
6      r2d2              darth vader

Example printed result:

han solo      2
r2d2          1
finn          1
the emporer   0
c3po          0
leia organa   1
darth vader   1

I'm using Jupyter notebook with python and pandas. Thanks!

I have NaN values in c2, which changes some of the solutions below, as indicated by @wen.
– mapk, Commented Feb 16, 2018 at 3:45

piRSquared · Accepted Answer · 2018-02-15 23:35:21Z

3

You can use some NumPy magic.
Use count and broadcasting to compare each combination of c1 and c2 values.

import pandas as pd
from numpy.core.defchararray import count  # also available as np.char.count

c1 = df.c1.values.astype(str)
c2 = df.c2.values.astype(str)

pd.Series(
    count(c1, c2[:, None]).sum(1),
    c2
)

han solo       2
r2d2           1
finn           1
the emporer    0
c3po           0
leia organa    1
darth vader    1
dtype: int64
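To make the broadcasting step concrete, here is a minimal sketch on the question's example data. It is not the answerer's exact code: it uses the public np.char.count alias, and it adds a plain equality comparison as an assumed alternative, because count tallies substring occurrences while the question asks for whole-string matches. The names exact and substr are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "c1": ["luke skywalker", "leia organa", "darth vader", "han solo",
           "han solo", "finn", "r2d2"],
    "c2": ["han solo", "r2d2", "finn", "the emporer", "c3po",
           "leia organa", "darth vader"],
})

c1 = df["c1"].to_numpy().astype(str)
c2 = df["c2"].to_numpy().astype(str)

# c2[:, None] has shape (7, 1) and c1 has shape (7,), so the comparison
# broadcasts to a (7, 7) grid where cell [i, j] is c2[i] == c1[j].
# Summing along axis 1 counts the whole-string matches for each c2 value.
exact = pd.Series((c2[:, None] == c1).sum(axis=1), index=c2)

# Same grid shape, but each cell holds the number of times c2[i] occurs
# as a substring of c1[j].
substr = pd.Series(np.char.count(c1, c2[:, None]).sum(axis=1), index=c2)

print(exact)
```

On this data both versions give the same counts. They would differ if one name were contained inside another, for example "han" inside "han solo".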

answered Feb 15, 2018 at 23:35

piRSquared


3 Comments

roganjosh Over a year ago

Out of curiosity, do you know what the "def" stands for? Searching for numpy.core.defchararray throws up all the methods, numpy.core.chararray throws up all those methods, and searching for the difference is obscured by the fact that this is a thing :(

piRSquared Over a year ago

@roganjosh hah! No, you've got me there. I have no idea.

mapk Over a year ago

I have NaN values in c2. Is there a way to remove them AFTER the pd.Series is created with this method? I can't remove the rows before because those rows in c1 could contain the strings for which I'm trying to match.
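One possible way to do what this comment asks, sketched under assumptions: astype(str) turns NaN into the literal string "nan", so the resulting Series has no real missing labels to drop. Filtering by position with a mask built from the original c2 column avoids that. The data is hypothetical (the question's example with a NaN placed in c2), and keep and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a shortened version of the example with one NaN in c2.
df = pd.DataFrame({
    "c1": ["luke skywalker", "leia organa", "darth vader", "han solo", "han solo"],
    "c2": ["han solo", np.nan, "finn", "leia organa", "darth vader"],
})

c1 = df["c1"].to_numpy().astype(str)
c2 = df["c2"].to_numpy().astype(str)  # the NaN becomes a plain string here

counts = pd.Series(np.char.count(c1, c2[:, None]).sum(axis=1), index=c2)

# Build the mask from the original column, where NaN is still detectable,
# and drop those positions after the Series has been created.
keep = df["c2"].notna().to_numpy()
result = counts[keep]

print(result)
```

No rows are removed from df, so every c1 value still takes part in the counting.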

BENY · Accepted Answer · 2018-02-15 23:43:29Z

2

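You can convert c1 to a categorical whose categories are the c2 values and then use value_counts (the categories must be unique and non-null):

df.c1.astype(pd.CategoricalDtype(categories=df.c2.tolist())).value_counts(sort=False)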
Out[572]: 
han solo       2
r2d2           1
finn           1
the emporer    0
c3po           0
leia organa    1
darth vader    1
Name: c1, dtype: int64

Or you can use crosstab and reindex:

pd.crosstab(df.c2, df.c1).sum().reindex(df.c2, fill_value=0)
Out[592]: 
c2
han solo       2
r2d2           1
finn           1
the emporer    0
c3po           0
leia organa    1
darth vader    1

edited Feb 15, 2018 at 23:43

answered Feb 15, 2018 at 23:34

BENY


6 Comments

mapk Over a year ago

When I try the 'category' option I get an error: ValueError: Categorial categories cannot be null

BENY Over a year ago

@mapk it works fine on my side. Do you have NaN in c2?

mapk Over a year ago

Also on the second option, the fill_value=0 seems to just fill them all with 0.

BENY Over a year ago

@mapk if you have NaN then it is a totally different story

mapk Over a year ago

I understand. Can you help me through it?
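A possible way through the NaN case, as a sketch and not something posted in this thread: the categories error comes from null values in c2, so drop the missing values (and duplicates) from c2 only, leaving the rows of df intact, then look each remaining value up in the frequency table of c1. The NaN positions in c2 are hypothetical, and wanted, freq, and result are illustrative names.

```python
import numpy as np
import pandas as pd

# c1 is taken from the question; the NaNs in c2 are hypothetical.
df = pd.DataFrame({
    "c1": ["luke skywalker", "leia organa", "darth vader", "han solo",
           "han solo", "finn", "r2d2"],
    "c2": ["han solo", np.nan, "finn", "the emporer", np.nan,
           "leia organa", "darth vader"],
})

# Drop missing values from c2 alone; every c1 row is still counted.
wanted = df["c2"].dropna().unique()

# How often each whole string occurs in c1.
freq = df["c1"].value_counts()

# Look up each wanted string, using 0 for strings that never occur in c1.
result = freq.reindex(wanted, fill_value=0)

print(result)
```

Because reindex works on whole labels, this counts exact matches only, which is what the question asks for.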


ggp · Accepted Answer · 2018-02-16 00:22:44Z

0

df['c3'] = [(df['c1'] == n).sum() for n in df['c2']]

edited Feb 16, 2018 at 0:22

answered Feb 15, 2018 at 23:53

ggp


1 Comment

roganjosh Over a year ago

This is a syntax error since you don't complete your list comprehension. It's also not a good use of pandas at all since you create a whole new DataFrame rather than work with the one that exists. Two good answers were posted 20 mins prior to this; it would be worth reading those and understanding how they used the libraries before posting an answer.
