6

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'colA': ['?', 2, 3, 4, '?'],
    'colB': [1, 2, '?', 3, 4],
    'colC': ['?', 2, 3, 4, 5]
})

I would like to count the number of '?' values in each column and return the following output:

colA - 2
colB - 1
colC - 1

Is there a way to get this output all at once? Right now the only way I know how to do it is to write a for loop over each column, roughly like the sketch below.
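A minimal sketch of that loop, just for reference:

counts = {}
for col in df.columns:
    # count the cells in this column that equal '?'
    counts[col] = (df[col] == '?').sum()
print(counts)  # {'colA': 2, 'colB': 1, 'colC': 1}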

5 Answers

10

It looks like the simplest way is

df[df == '?'].count()

The result is

colA    2
colB    1
colC    1
dtype: int64

where df[df == '?'] gives us a DataFrame with '?' and NaN:

  colA colB colC
0    ?  NaN    ?
1  NaN  NaN  NaN
2  NaN    ?  NaN
3  NaN  NaN  NaN
4    ?  NaN  NaN

and .count() counts the non-NA cells in each column.
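To make the two steps explicit, here is a small sketch using the same df: since .count() counts non-NA cells per column, it is equivalent to summing the non-NaN mask.

mask = df[df == '?']   # '?' where it matched, NaN elsewhere
mask.notna().sum()     # same result as mask.count()
# colA    2
# colB    1
# colC    1
# dtype: int64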

Please also take a look at the other solutions: they are more readable and faster.


3 Comments

Nice solution. I just wonder what the best way would be to print all the rows/columns where '?' exists (omitting all the NaN ones)?
I tried adding dropna, but it runs slowly on my laptop. Or is that what you mean?
dropna() is definitely an option, but it will drop any row/column that contains any NaN, so you won't get anything back in this particular case.
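Following up on that comment, here is a minimal sketch (not from the original thread) of one way to show only the rows and columns that contain '?', using boolean any() masks instead of dropna():

mask = df.eq('?')                              # True where the cell is '?'
rows_with_q = df[mask.any(axis=1)]             # keep rows with at least one '?'
result = rows_with_q.loc[:, mask.any(axis=0)]  # keep columns with at least one '?'
print(result)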
3

You can use numpy.count_nonzero here.

import numpy as np

pd.Series(np.count_nonzero(df.to_numpy() == '?', axis=0), index=df.columns)
# pd.Series((df.values == '?').sum(0), index=df.columns)

colA    2
colB    1
colC    1
dtype: int64

Timeit results:

Benchmarking with df of shape (1_000_000, 3)

big_df = pd.DataFrame(df.to_numpy().repeat(200_000,axis=0))
big_df.shape
(1000000, 3)

In [186]: %timeit pd.Series(np.count_nonzero(big_df.to_numpy()=='?', axis=0), index=big_df.columns)
53.1 ms ± 231 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [187]: %timeit big_df.eq('?').sum()
171 ms ± 7.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [188]: %timeit big_df[big_df == '?'].count()
314 ms ± 4.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [189]: %timeit pd.Series(np.apply_along_axis(lambda x: Counter(x)['?'], 0, big_df.values), index=big_df.columns)
174 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

1 Comment

Great solution. On my machine it is the fastest of them all.
2

We can use eq and sum:

df.eq('?').sum()
Out[182]: 
colA    2
colB    1
colC    1
dtype: int64
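This works because eq returns a boolean DataFrame and sum treats True as 1 and False as 0. A small sketch of the intermediate step, using the df from the question:

bool_frame = df.eq('?')   # True where the cell equals '?'
print(bool_frame)
#     colA   colB   colC
# 0   True  False   True
# 1  False  False  False
# 2  False   True  False
# 3  False  False  False
# 4   True  False  False

print(bool_frame.sum())   # True counts as 1, giving the per-column '?' counts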


0

@Bear Brown's answer is probably the most elegant; a faster option is to use numpy:

import numpy as np
from collections import Counter

%%timeit
df[df == '?'].count()

5.2 ms ± 646 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
pd.Series(np.apply_along_axis(lambda x: Counter(x)['?'], 0, df.values), index=df.columns)

218 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
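For context, a tiny sketch (assuming the df from the question) of what the Counter-based lambda computes for a single column:

col = df['colA'].values     # one column as a numpy array
Counter(col)                # Counter({'?': 2, 2: 1, 3: 1, 4: 1})
Counter(col)['?']           # 2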


0

A variation on BENY's answer:

(df=='?').sum()
Out[182]: 
colA    2
colB    1
colC    1
dtype: int64

