Python Logical Operations as conditions in Pandas

Question

I have a dataframe with columns:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False ]
})

df

      A      B
0   False   True
1   True    False
2   False   False
3   False   False
4   False   True
5   False   True
6   True    False
7   True    False
8   False   False
9   True    False

How can I identify and mark the first [True, False] row that occurs after a [False, False] row? Every row that satisfies this condition needs to be flagged in a new column.

In the example above, row 3 [False, False] is followed by row 6 [True, False], and row 8 [False, False] is followed by row 9 [True, False].

Rows 6 and 9 are the only rows that should be flagged in this example.

@MichaelButscher that would only work if the FF / TF rows are successive, which doesn't seem to be the case here (e.g., 3->6)
– mozway, Commented Apr 3, 2024 at 17:40
@mozway I should have looked closer at the question, thanks.
– Michael Butscher, Commented Apr 3, 2024 at 20:52

mozway · Accepted Answer · 2024-04-04 02:35:07Z

You could use:

# identify start of group
m1 = df.eq([False, False]).all(axis=1)
# condition
m2 = df.eq([True, False]).all(axis=1)
# form groups
group = m1.cumsum()

# keep only rows with valid condition and after a start of group
# get the first value per group
idx = m2[m2 & (group>0)].groupby(group).idxmax().tolist()

# variant
# idx = m2.index.to_series()[m2 & (group>0)].groupby(group).first().tolist()

# assign flag
df.loc[idx, 'flag'] = 'X'

Output:

       A      B flag
0  False   True  NaN
1   True  False  NaN
2  False  False  NaN
3  False  False  NaN
4  False   True  NaN
5  False   True  NaN
6   True  False    X
7   True  False  NaN
8  False  False  NaN
9   True  False    X

Intermediates:

       A      B     m1     m2  group flag
0  False   True  False  False      0     
1   True  False  False   True      0     
2  False  False   True  False      1     
3  False  False   True  False      2     
4  False   True  False  False      2     
5  False   True  False  False      2     
6   True  False  False   True      2    X
7   True  False  False   True      2     
8  False  False   True  False      3     
9   True  False  False   True      3    X
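The helper columns in the table above can be rebuilt with a few lines. This is a minimal sketch that reuses the question's data; the name `intermediates` is chosen here for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

m1 = df.eq([False, False]).all(axis=1)  # True on [False, False] rows
m2 = df.eq([True, False]).all(axis=1)   # True on [True, False] rows
group = m1.cumsum()                     # increments at every [False, False] row

# Side-by-side view of the helper Series (illustrative name)
intermediates = df.assign(m1=m1, m2=m2, group=group)
print(intermediates)
```

Rows with `group == 0` come before any [False, False] row, which is why the [True, False] row at index 1 is never flagged.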

Variant without groupby:

# identify start of groups
m1 = df.eq([False, False]).all(axis=1)
# condition
m2 = (df.eq([True, False]).all(axis=1)
      & m1.cummax()
      )
# form groups
group = m1.cumsum()

idx = group[m2].drop_duplicates().index

# assign flag
df.loc[idx, 'flag'] = 'X'
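For reuse, the groupby-less logic can be wrapped in a small function. This is only a sketch: the function name `first_target_after_start` and its `start` / `target` parameters are hypothetical, and it assumes the DataFrame passed in contains exactly the two boolean columns being compared.

```python
import pandas as pd

def first_target_after_start(df, start=(False, False), target=(True, False)):
    """Return index labels of the first `target` row after each `start` row."""
    m1 = df.eq(list(start)).all(axis=1)                  # start-of-group rows
    m2 = df.eq(list(target)).all(axis=1) & m1.cummax()   # target rows after a start
    group = m1.cumsum()                                  # group id per start row
    # the first target row in each group is the first occurrence of its group id
    return group[m2].drop_duplicates().index

df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

# compute the labels before adding 'flag', so only A and B are compared
idx = first_target_after_start(df)
df.loc[idx, 'flag'] = 'X'
print(df)
```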

edited Apr 4, 2024 at 2:35

answered Apr 3, 2024 at 17:36

5 Comments

prashanth manohar Over a year ago

I was wondering if it would be possible to use an FSM here? There is only a finite number of state transitions possible with these boolean variables.

mozway Over a year ago

@prashanthmanohar there are certainly other approaches, but since you have a pandas DataFrame, the approach above seems the most reasonable to me.

juanpa.arrivillaga Over a year ago

@prashanthmanohar a finite state machine?

prashanth manohar Over a year ago

Yes. TF, FT, FF, TT are the 4 possible inputs, and they can come in any order. After receiving input FF, when TF is eventually received, the final state of the FSM is reached.

mozway Over a year ago

Of course, but this would mean using a loop with flags, no? Reaching FF allows collecting one row; upon reaching TF, collect it and return to the off state. You could use numba.jit to code it this way. NB. I added a groupby-less variant of my logic.
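The loop-with-flags idea discussed in these comments could look roughly like the sketch below. It is written in plain Python over NumPy arrays; the function name `fsm_flags` and the state variable `armed` are made up for illustration. The comment above suggests numba for speed, but whether this exact function compiles under numba has not been checked here.

```python
import numpy as np
import pandas as pd

def fsm_flags(a, b):
    """Two-state machine: armed by [False, False], fires on the next [True, False]."""
    out = np.zeros(len(a), dtype=bool)
    armed = False                      # "off" state until a [False, False] row is seen
    for i in range(len(a)):
        if not a[i] and not b[i]:      # FF: switch to the armed state
            armed = True
        elif armed and a[i] and not b[i]:  # TF while armed: flag row, return to off
            out[i] = True
            armed = False
    return out

df = pd.DataFrame({
    'A': [False, True, False, False, False, False, True, True, False, True],
    'B': [True, False, False, False, True, True, False, False, False, False],
})

hits = fsm_flags(df['A'].to_numpy(), df['B'].to_numpy())
df.loc[hits, 'flag'] = 'X'
print(df)
```

On the question's data this flags the same rows as the vectorized answer, at the cost of an explicit Python loop.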
