Pandas: count some values in column

Question

I have a dataframe like:

     ID  value
0   111      1
1   111      0
2   111      1
3   111      0
4   111      0
5   111      0
6   111      1
7   222      1
8   222      0
9   222      0
10  222      1

For each ID, I need the maximum number of times 0 appears in a row. In this case, since 0 appears thrice in a row for ID 111 and twice in a row for 222, the desired output should be:

ID   count_max_0
111    3
222    2

value_counts does not do what I want since it counts all values in the column.

How can I do that?

Ah, probably by starting to write some code. Right now it sounds like you just dropped your requirements here without showing us what you have tried so far, and that is rarely a good idea.
– GhostCat, Commented Jan 17, 2017 at 13:52
@EdChum I need to count the zeros that occur consecutively. it's match with sum()
– Petr Petrov, Commented Jan 17, 2017 at 13:56
@wwl I mean the maximum number of 0s in an unbroken run such as 0 0 0 0 0, counting only the 0s.
– Petr Petrov, Commented Jan 17, 2017 at 14:05

unutbu · Accepted Answer · 2017-01-17 14:33:39Z

You could use

iszero = (df['value']==0)
df['group'] = (iszero.diff()==1).cumsum()

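to assign a group number to each row. The number increases every time value switches between zero and non-zero, so each unbroken run of rows gets its own label: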

In [115]: df
Out[115]: 
     ID  value  group
0   111      1      0
1   111      0      1
2   111      1      2
3   111      0      3
4   111      0      3
5   111      0      3
6   111      1      4
7   222      1      4
8   222      0      5
9   222      0      5
10  222      1      6
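
To see where the group numbers come from, here is a minimal sketch that spells the labelling out step by step. It assumes the same values as the frame above, and it swaps iszero.diff()==1 for an explicit shift-and-compare, which is my substitution rather than the answer's code; the name changed is made up for illustration.

```python
import pandas as pd

# Same values as the example frame in this answer.
value = pd.Series([1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1])
iszero = value == 0

# True wherever a row's zero/non-zero status differs from the previous row.
# The first row is compared with itself, so it never counts as a change.
changed = iszero != iszero.shift(fill_value=iszero.iloc[0])

# A running count of the changes labels each run of equal status.
group = changed.cumsum()

print(pd.DataFrame({'value': value, 'iszero': iszero,
                    'changed': changed, 'group': group}))
```

The printed group column matches the one in the table above. The labels run across ID boundaries (row 7 keeps label 4), which is why the next step groups by ID as well as by group.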

Now you can group by ID and group number to obtain the desired value counts:

import pandas as pd

df = pd.DataFrame({'ID': [111, 111, 111, 111, 111, 111, 111, 222, 222, 222, 222],
 'value': [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1]})
iszero = (df['value']==0)
df['group'] = (iszero.diff()==1).cumsum()

counts = (df.loc[iszero]             # restrict to rows which have 0 value
          .groupby('ID')['group']    # group by ID, inspect the group column
          .value_counts()            # count the number of 0s for each (ID, group)
          .groupby(level='ID')       # group by ID only
          .first())                  # select the first (and highest) value count

print(counts)

yields

ID
111    3
222    2
Name: group, dtype: int64
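
A possible variant, sketched under the same assumptions (columns named ID and value), measures each zero run with size() and then takes the maximum per ID. This is an alternative I am suggesting, not part of the answer above; run_id and count_max_0 are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'ID': [111, 111, 111, 111, 111, 111, 111, 222, 222, 222, 222],
                   'value': [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1]})

iszero = df['value'].eq(0)
# Start a new run id whenever the zero/non-zero status changes between rows.
run_id = iszero.ne(iszero.shift(fill_value=False)).cumsum()

count_max_0 = (df[iszero]
               .groupby(['ID', run_id[iszero]])   # one group per (ID, zero run)
               .size()                            # length of each zero run
               .groupby(level='ID')
               .max()                             # longest run per ID
               .reindex(df['ID'].unique(), fill_value=0)  # IDs without zeros get 0
               .rename('count_max_0'))

print(count_max_0)
```

Because the grouping key includes ID, a run of zeros that continues from the last rows of one ID into the first rows of the next is counted separately for each ID.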

David Simic · Accepted Answer · 2017-01-17 14:38:24Z

This should work:

import numpy as np

# load data etc
...

def get_count_max_0(df):
    """
    Computes the max length of a sequence of zeroes
    broken by ones. Assumes 'value' holds only 0s and 1s.
    """
    # pad with a 1 on both sides so that zero runs at the very
    # start or end of the group are delimited as well
    values = np.concatenate(([1], df['value'].to_numpy(), [1]))
    # compute change points where 0 -> 1
    cps_1 = np.where(
        (values[1:] != values[:-1]) &
        (values[1:] == 1)
    )[0]
    # compute change points where 1 -> 0
    cps_0 = np.where(
        (values[1:] != values[:-1]) &
        (values[1:] == 0)
    )[0]

    # find lengths of zero chains (end of chain minus start of chain)
    deltas = cps_1 - cps_0
    # return max length, or 0 if the group contains no zeros
    return deltas.max() if deltas.size else 0

# group by ID, apply get_count_max_0 to each group and 
# convert resulting series back to data frame to match your expected output.
max_counts = df.groupby("ID").apply(get_count_max_0).to_frame("count_max_0")

print(max_counts)

The output is:

     count_max_0
ID              
111            3
222            2
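
As a cross-check, the same number can be computed without NumPy. The sketch below uses itertools.groupby from the standard library and also covers groups that begin or end with a zero run, or that contain no zeros at all. It assumes the same ID and value columns; longest_zero_run is a name I made up for illustration.

```python
from itertools import groupby

import pandas as pd


def longest_zero_run(values):
    """Length of the longest run of consecutive 0s (0 if there are none)."""
    return max((sum(1 for _ in run)
                for key, run in groupby(values) if key == 0),
               default=0)


df = pd.DataFrame({'ID': [111, 111, 111, 111, 111, 111, 111, 222, 222, 222, 222],
                   'value': [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1]})

# Apply the helper to the 'value' column of each ID.
max_counts = (df.groupby('ID')['value']
                .apply(longest_zero_run)
                .to_frame('count_max_0'))

print(max_counts)
```

This walks the values in a Python loop, so it will likely be slower than a vectorised version on large frames, but it is easy to verify by hand.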

Tagc · Accepted Answer · 2017-01-17 15:59:29Z

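# total of 'value' per ID, using named aggregation (pandas >= 0.25);
# the nested-dict form of agg() is no longer supported in pandas 1.0+.
# Note that this sums the column; it does not measure runs of zeros.
dftwo = df.groupby('ID').agg(total=('value', 'sum'))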

answered Jan 17, 2017 at 15:21 by lasingallday; edited Jan 17, 2017 at 15:59 by Tagc
