0

I'm a complete newbie at pandas so a simpler (though maybe not the most efficient or elegant) solution is appreciated. I don't mind a bit of brute force if I can understand the answer better.

If I have the following Dataframe:

A    B    C 
0    0    1
0    1    1

I want to loop through columns "A", "B" and "C" in that order. During each iteration, I want to select all the rows for which the current column is 1 and none of the previous columns are, save the result, and also use it in the next iteration.

So when looking at column A, I wouldn't select anything. Then when looking at column B I would select the second row because B==1 and A==0. Then when looking at column C I would select the first row because A==0 and B==0.

2 Comments
  • What is the expected output? Commented Jun 9, 2022 at 21:22
  • On the first iteration (column A) there's no output, as nothing matches the criteria. On the second iteration (column B), the expected output is the second row: 0 1 1. On the third iteration (column C), the expected output is the first row: 0 0 1. Commented Jun 9, 2022 at 23:34

2 Answers

1

Create a boolean mask that is True where a cell is 1 and is the first 1 in its row:

m = (df == 1) & (df.cumsum(axis=1) == 1)
d = {col: df[m[col]].index.tolist() for col in df.columns if m[col].sum()}

Output:

>>> m
       A      B      C
0  False  False   True
1  False   True  False
2  False  False   True

>>> d
{'B': [1], 'C': [0, 2]}

I slightly modified your dataframe by adding a third row:

>>> df
   A  B  C
0  0  0  1
1  0  1  1
2  0  0  1
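
To make the mask easier to follow, here is a minimal self-contained sketch that rebuilds the sample frame and keeps the intermediate row-wise cumulative sum in its own variable. It assumes the columns only hold 0 and 1; the name `running` is illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame from this answer (the question's data plus one extra row).
df = pd.DataFrame({"A": [0, 0, 0], "B": [0, 1, 0], "C": [1, 1, 1]})

# Running count of 1s seen so far in each row, going left to right.
# Assumes the frame only contains 0 and 1.
running = df.cumsum(axis=1)

# True where the cell is 1 and it is the first 1 in its row.
m = (df == 1) & (running == 1)

# Row labels selected per column, skipping columns that select nothing.
d = {col: df[m[col]].index.tolist() for col in df.columns if m[col].any()}

print(running)
print(d)
```

A cell whose running count is exactly 1 and whose own value is 1 must be the first 1 in its row, which is the same as saying that no previous column was 1.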

Update

For the expected output on my sample:

for col in df.columns:
    if m[col].sum():
        print(f"\n=== {col} ===")
        print(df[m[col]])

Output:

=== B ===
   A  B  C
1  0  1  1

=== C ===
   A  B  C
0  0  0  1
2  0  0  1
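
Since the question asks for something simple to follow rather than elegant, the same selection can also be sketched as an explicit loop that carries its state from one iteration to the next. This is only one possible way to write it, assuming 0/1 values; the names `seen` and `selected` are illustrative.

```python
import pandas as pd

# Sample frame from this answer.
df = pd.DataFrame({"A": [0, 0, 0], "B": [0, 1, 0], "C": [1, 1, 1]})

# Tracks rows that already had a 1 in an earlier column.
seen = pd.Series(False, index=df.index)

# Illustrative container: column name -> rows selected at that step.
selected = {}

for col in df.columns:
    is_one = df[col] == 1
    # Keep rows where this column is 1 and no earlier column was.
    selected[col] = df[is_one & ~seen]
    # Carry the result into the next iteration.
    seen = seen | is_one

for col, rows in selected.items():
    print(f"\n=== {col} ===")
    print(rows)
```

Unlike the filtered print above, this version also keeps an empty frame for column A, which matches the "no output on the first iteration" case described in the question's comments.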

4 Comments

Oh, I see what you mean. Good catch :) +1
Does OP only want the index? In that case, isn't df.idxmax(1) sufficient? I.e. df.idxmax(1).reset_index().groupby(0).agg(list). I am not quite sure whether OP wants just the index or is trying to rearrange the df into an upper triangular matrix.
I updated my answer. Can you check it please?
@onyambu. I returned the index because the expected output is not clear. With index, it's easy to extract rows. I updated my answer with a simple print.
0

Seems like you need a direct use of idxmax. From the documentation:

Return index of first occurrence of maximum over requested axis.

NA/null values are excluded.


>>> df.idxmax()
A    0
B    1
C    0
dtype: int64

The values above are the index labels of the rows for which your constraints are met. 1 for B means that the second row was "selected", and 0 for C means that the first row was. The only issue is that idxmax also returns 0 when a column contains no 1 at all, as it does for A.

To address that, you can use where:

>>> df.idxmax().where(~df.eq(0).all())

This makes sure that NaN is returned for all-zero columns:

A    NaN
B    1.0
C    0.0
dtype: float64
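
For completeness, a minimal runnable sketch of the two steps above on the two-row frame from the question. The names `first_max`, `has_nonzero` and `result` are illustrative. Note that idxmax returns a single label per column and does not by itself look at earlier columns, so this only covers the case where one row per column is enough.

```python
import pandas as pd

# The two-row frame from the question.
df = pd.DataFrame({"A": [0, 0], "B": [0, 1], "C": [1, 1]})

# Label of the first row holding each column's maximum.
first_max = df.idxmax()

# True for columns that contain at least one non-zero value.
has_nonzero = ~df.eq(0).all()

# Keep the label only where the column is not all zeros; NaN otherwise.
result = first_max.where(has_nonzero)

print(result)
```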

4 Comments

Yes. I didn't see it. Sorry
There is a subtlety: select all the rows. idxmax returns only one row (index)
@Corralien not sure if I follow. I believe OP wants the idxmax indeed. What they meant was to find that for all columns.
For example, if there are 2 rows with 1 in C and all previous columns are set to 0 then you have to select the 2 rows not the first. Check my answer, you will understand ;)
