
I have a dataframe like:

ID   value
111   1
111   0
111   1
111   0
111   0
111   0
111   1
222   1
222   0
222   0
222   1

For each ID, I need the length of the longest run of consecutive 0s in the value column. In this case, 0 appears three times in a row for ID 111 and twice in a row for ID 222, so the desired output is:

ID   count_max_0
111    3
222    2

value_counts does not do what I want since it counts all values in the column.

How can I do that?

  • Ah, probably by starting to write some code. Right now it sounds like you just dropped your requirements here, without showing us what you have tried so far. And that is rarely a good idea. Commented Jan 17, 2017 at 13:52
  • @EdChum I need to count quantity of zeros, that are go straight. it's match with sum() Commented Jan 17, 2017 at 13:56
  • What is "go straight"? Commented Jan 17, 2017 at 14:00
  • Do you mean in a row? Commented Jan 17, 2017 at 14:01
  • @wwl I mean count max number of 0 that looks like 0 0 0 0 0. The max number of only 0 Commented Jan 17, 2017 at 14:05

3 Answers

2

You could use

iszero = (df['value']==0)
df['group'] = (iszero.diff()==1).cumsum()

to assign a group number to each row, so that consecutive rows with the same zero/non-zero status share one number:

In [115]: df
Out[115]: 
     ID  value  group
0   111      1      0
1   111      0      1
2   111      1      2
3   111      0      3
4   111      0      3
5   111      0      3
6   111      1      4
7   222      1      4
8   222      0      5
9   222      0      5
10  222      1      6

Now you can group by ID and group number to obtain the desired value counts:

import pandas as pd

df = pd.DataFrame({'ID': [111, 111, 111, 111, 111, 111, 111, 222, 222, 222, 222],
 'value': [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1]})
iszero = (df['value']==0)
df['group'] = (iszero.diff()==1).cumsum()

counts = (df.loc[iszero]             # restrict to rows which have 0 value
          .groupby('ID')['group']    # group by ID, inspect the group column
          .value_counts()            # count the number of 0s for each (ID, group)
          .groupby(level='ID')       # group by ID only
          .first())                  # take the first count per ID; value_counts sorts descending, so it is the highest

print(counts)

yields

ID
111    3
222    2
Name: group, dtype: int64
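
One caveat worth sketching, as a variation and not as part of the answer above: because the rows are filtered with df.loc[iszero], an ID that contains no 0 at all would not appear in counts. Assuming you want such IDs reported with a count of 0, something along these lines might work. The function name longest_zero_run and the extra ID 333 are hypothetical, added only for illustration. The run number here also restarts whenever the ID changes, so a run of 0s can never span two IDs.

```python
import pandas as pd


def longest_zero_run(df):
    """Return, per ID, the length of the longest run of consecutive 0s."""
    iszero = df['value'].eq(0)
    # a new run starts whenever the zero/non-zero status or the ID changes
    status_changed = iszero.ne(iszero.shift())
    id_changed = df['ID'].ne(df['ID'].shift())
    run = (status_changed | id_changed).cumsum()
    # keep only the rows holding a 0 and measure every run of them
    zeros = df.assign(run=run)[iszero]
    run_sizes = zeros.groupby(['ID', 'run']).size()
    longest = run_sizes.groupby(level='ID').max()
    # IDs without any 0 are missing at this point; report them as 0
    all_ids = pd.Index(df['ID'].unique(), name='ID')
    return longest.reindex(all_ids, fill_value=0).rename('count_max_0')


# sample data from the question plus a hypothetical ID 333 with no zeros
df = pd.DataFrame({'ID': [111] * 7 + [222] * 4 + [333] * 2,
                   'value': [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1]})
result = longest_zero_run(df)
print(result)
```

Whether a missing ID should be reported as 0 or left out depends on what you do with the result afterwards, so treat the reindex step as optional.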

0

This should work:

import numpy as np

# load data etc
...

def get_count_max_0(df):
    """
    Computes the max length of a run of consecutive zeroes
    in the 'value' column (0 if the group has no zeroes).
    Assumes the column only contains 0s and 1s.
    """
    # pad with a 1 on both sides so that runs of zeroes touching the
    # start or the end of the group are still bounded by ones
    values = np.concatenate(([1], df['value'].to_numpy(), [1]))
    # compute change points where 0 -> 1
    cps_1 = np.where(
        (values[1:] != values[:-1]) &
        (values[1:] == 1)
    )[0]
    # compute change points where 1 -> 0
    cps_0 = np.where(
        (values[1:] != values[:-1]) &
        (values[1:] == 0)
    )[0]

    # find lengths of zero chains (one entry per run of zeroes)
    deltas = cps_1 - cps_0
    # return max length, or 0 if the group contains no zeroes
    return int(deltas.max()) if deltas.size else 0

# group by ID, apply get_count_max_0 to each group and 
# convert resulting series back to data frame to match your expected output.
max_counts = df.groupby("ID").apply(get_count_max_0).to_frame("count_max_0")

print(max_counts)

The output is:

     count_max_0
ID              
111            3
222            2
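
As a cross-check on the change-point arithmetic, here is a minimal sketch that uses itertools.groupby from the standard library instead of NumPy. This is a different technique from the one in the answer above, and the helper name max_zero_run is hypothetical. It makes no assumption about where a run of 0s starts or ends, and it returns 0 for a group without any 0.

```python
from itertools import groupby

import pandas as pd


def max_zero_run(values):
    """Length of the longest run of consecutive 0s in an iterable (0 if none)."""
    # itertools.groupby yields one (key, run) pair per run of equal adjacent items
    run_lengths = (sum(1 for _ in run)
                   for key, run in groupby(values)
                   if key == 0)
    return max(run_lengths, default=0)


# sample data from the question
df = pd.DataFrame({'ID': [111, 111, 111, 111, 111, 111, 111, 222, 222, 222, 222],
                   'value': [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1]})

# apply the helper to the 'value' column of each ID
max_counts = (df.groupby('ID')['value']
                .apply(max_zero_run)
                .to_frame('count_max_0'))
print(max_counts)
```

This loops in pure Python, so on large frames it will probably be slower than a vectorised version; it is mainly useful for checking results on small inputs.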


0
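# The nested-dict renaming form of agg() is not supported in pandas 1.0+,
# so this uses named aggregation instead.
# Note: this gives the sum of 'value' per ID (i.e. the number of 1s),
# not the longest run of 0s.
dftwo = df.groupby('ID').agg(total=('value', 'sum'))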

