I have a dataframe with columns:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False ]
})

df

      A      B
0   False   True
1   True    False
2   False   False
3   False   False
4   False   True
5   False   True
6   True    False
7   True    False
8   False   False
9   True    False

How can I identify and mark the first [True - False] row that occurs after a [False - False] row has been encountered? Every row that satisfies this condition needs to be flagged in a new column.

In the example above, row 3 [False False] is followed by row 6 [True False], and row 8 [False False] is followed by row 9 [True False].

Rows 6 and 9 are the only valid solutions in this example.

2 Comments

  • @MichaelButscher that would only work if the FF / TF are successive, which doesn't seem to be the case here (e.g., 3->6). Commented Apr 3, 2024 at 17:40
  • @mozway I should have looked closer at the question, thanks. Commented Apr 3, 2024 at 20:52

1 Answer

You could build boolean masks for the two row patterns and use a cumulative sum to form groups:

# identify start of group
m1 = df.eq([False, False]).all(axis=1)
# condition
m2 = df.eq([True, False]).all(axis=1)
# form groups
group = m1.cumsum()

# keep only the [True, False] rows that come after a start of group,
# then get the index of the first such row per group
idx = m2[m2 & (group>0)].groupby(group).idxmax().tolist()

# variant
# idx = m2.index.to_series()[m2 & (group>0)].groupby(group).first().tolist()

# assign flag
df.loc[idx, 'flag'] = 'X'

Output:

       A      B flag
0  False   True  NaN
1   True  False  NaN
2  False  False  NaN
3  False  False  NaN
4  False   True  NaN
5  False   True  NaN
6   True  False    X
7   True  False  NaN
8  False  False  NaN
9   True  False    X

Intermediates:

       A      B     m1     m2  group flag
0  False   True  False  False      0     
1   True  False  False   True      0     
2  False  False   True  False      1     
3  False  False   True  False      2     
4  False   True  False  False      2     
5  False   True  False  False      2     
6   True  False  False   True      2    X
7   True  False  False   True      2     
8  False  False   True  False      3     
9   True  False  False   True      3    X
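
For reference, the m1, m2 and group columns of the table above can be rebuilt in a standalone script. This is only a sketch that repeats the question's sample data; the name `intermediates` is introduced here for illustration and is not part of the answer.

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

m1 = df.eq([False, False]).all(axis=1)  # True on [False, False] rows
m2 = df.eq([True, False]).all(axis=1)   # True on [True, False] rows
group = m1.cumsum()                     # increases by 1 at every [False, False] row

# "intermediates" is an illustrative name, not from the answer
intermediates = df.assign(m1=m1, m2=m2, group=group)
print(intermediates)
```

Rows with group 0 come before any [False, False] row, which is why the answer filters on group>0.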

Variant without groupby:

# identify start of groups
m1 = df.eq([False, False]).all(axis=1)
# condition
m2 = (df.eq([True, False]).all(axis=1)
      & m1.cummax()
      )
# form groups
group = m1.cumsum()

idx = group[m2].drop_duplicates().index

# assign flag
df.loc[idx, 'flag'] = 'X'
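
As a quick sanity check, the groupby-less logic can be wrapped in a small function and run on the question's data. This is a sketch under assumptions: the function name `first_tf_after_ff` is hypothetical, the column names 'A' and 'B' are taken from the question, and `group.gt(0)` is used in place of `m1.cummax()` (both are True from the first [False, False] row onward).

```python
import pandas as pd

def first_tf_after_ff(df):
    """Hypothetical helper: index labels of the first [True, False] row
    found after each [False, False] row (columns 'A' and 'B' assumed)."""
    cols = df[['A', 'B']]
    m1 = cols.eq([False, False]).all(axis=1)   # start of groups
    group = m1.cumsum()                        # group number per row
    # [True, False] rows located after at least one [False, False] row;
    # group.gt(0) plays the role of m1.cummax() in the answer
    m2 = cols.eq([True, False]).all(axis=1) & group.gt(0)
    # first remaining row of each group
    return group[m2].drop_duplicates().index.tolist()

# sample data copied from the question
df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

idx = first_tf_after_ff(df)
print(idx)
```

On this data the function should return the same rows as the groupby version, which can then be passed to df.loc[idx, 'flag'] = 'X' as before.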

5 Comments

I was wondering if it would be possible to use an FSM here? There is only a finite number of state transitions possible with these boolean variables.
@prashanthmanohar there are certainly other approaches, but since you have a pandas DataFrame, the approach above seems the most reasonable to me.
@prashanthmanohar a finite state machine?
Yes. TF, FT, FF, TT are the 4 possible inputs, and they can come in any order. After receiving input FF, when TF is eventually received, the final state of the FSM would be reached.
Of course, but this would mean using a loop with flags, no? Reaching FF allows one row to be collected; upon reaching TF, collect it and return to the off state. You could use numba.jit to code it this way. NB. I added a groupby-less variant of my logic.
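
The loop-with-flags idea from the comments could look roughly like the sketch below. It is plain Python rather than numba, the function name `flag_with_state_machine` and the `armed` variable are hypothetical, and the column names 'A' and 'B' are assumed from the question.

```python
import pandas as pd

def flag_with_state_machine(df):
    """Hypothetical helper: two-state loop (off / armed) over the rows."""
    armed = False   # becomes True once a [False, False] row has been seen
    flagged = []    # index labels of the rows to flag
    for label, a, b in zip(df.index, df['A'], df['B']):
        if not a and not b:
            # FF: arm the machine (it stays armed on repeated FF rows)
            armed = True
        elif armed and a and not b:
            # first TF after an FF: collect the row and return to the off state
            flagged.append(label)
            armed = False
    return flagged

# sample data copied from the question
df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

idx = flag_with_state_machine(df)
print(idx)
```

This trades the vectorized masks for an explicit pass over the rows, which is easier to read as a state machine but is likely slower on large frames unless compiled.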
