How to get Index of first Row with non-zero minimum value in Pandas DataFrame?

Question

Assuming I have the following Pandas DataFrame:

      U   A   B
0  2000  10  20
1  3000  40   0
2  2100  20  30
3  2500   0  30
4  2600  30  40

How can I get the index of the first row in which both A and B are non-zero and (A+B)/2 is larger than 15?

In this example, I would like to get 2, since it is the first row that has non-zero values in both the A and B columns and an average of 25, which is more than 15.

Note that this DataFrame is huge, so I am looking for the fastest way to get the index value.

Does this answer your question?

– Liad Kehila, Commented Dec 28, 2020 at 22:33

jkr · Accepted Answer · 2020-12-28 22:50:54Z

5

Let's try:

 df[(df.A.ne(0)&df.B.ne(0))&((df.A+df.B)/2).gt(15)].first_valid_index()
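
For reference, here is a self-contained version of the one-liner above, run on the sample data from the question. The names `mask` and `first_idx` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "U": [2000, 3000, 2100, 2500, 2600],
    "A": [10, 40, 20, 0, 30],
    "B": [20, 0, 30, 30, 40],
})

# Both columns non-zero and their mean strictly above 15.
mask = df.A.ne(0) & df.B.ne(0) & ((df.A + df.B) / 2).gt(15)

# first_valid_index() returns None when the selection is empty.
first_idx = df[mask].first_valid_index()
print(first_idx)
```

Row 0 has a mean of exactly 15, so it does not pass the strict `gt(15)` test, and row 2 is the first match.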

edited Dec 28, 2020 at 22:50

jkr

answered Dec 28, 2020 at 22:35

wwnde

2 Comments

ARH Over a year ago

What if there is NaN instead of 0 in one row ? Does it treat NaN similar to 0 ?

wwnde Over a year ago

Yes, a row containing NaN is filtered out just like a row containing 0.
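
A quick check of the NaN case, as a sketch on a hypothetical variant of the sample data (`df_nan` is not from the original thread). It suggests that NaN is not literally converted to 0: `ne(0)` is True for NaN, but the mean becomes NaN and `gt(15)` is then False, so the row is excluded all the same.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data with NaN in column A of row 2.
df_nan = pd.DataFrame({
    "U": [2000, 3000, 2100, 2500, 2600],
    "A": [10, 40, np.nan, 0, 30],
    "B": [20, 0, 30, 30, 40],
})

nonzero = df_nan.A.ne(0) & df_nan.B.ne(0)      # NaN != 0 evaluates to True
mean_ok = ((df_nan.A + df_nan.B) / 2).gt(15)   # NaN > 15 evaluates to False

mask = nonzero & mean_ok
first_idx = df_nan[mask].first_valid_index()
print(first_idx)
```

If NaN should be handled differently, for example counted as a real zero, an explicit `fillna(0)` before the comparison makes that intent visible.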

Glauco · Accepted Answer · 2020-12-28 23:29:08Z

1

I find explicit variables more readable, like:

AB2 = (df['A'] + df['B']) / 2
# named mask rather than filter, to avoid shadowing the built-in filter()
mask = (df['A'] != 0) & (df['B'] != 0) & (AB2 > 15)
your_index = df[mask].index[0]
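
One difference between the two approaches shows up when no row matches: `first_valid_index()` returns None, while `.index[0]` raises an IndexError. The sketch below uses hypothetical data with no qualifying row, and the guard around `matches` is just one possible way to handle it.

```python
import pandas as pd

# Hypothetical data in which no row satisfies the condition.
df = pd.DataFrame({"A": [10, 0], "B": [20, 30]})

mask = (df["A"] != 0) & (df["B"] != 0) & ((df["A"] + df["B"]) / 2 > 15)

# first_valid_index() on an empty selection returns None.
print(df[mask].first_valid_index())

# .index[0] on an empty selection would raise IndexError, so guard it.
matches = df.index[mask.to_numpy()]
first_idx = matches[0] if len(matches) else None
print(first_idx)
```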

Performance for this use case (on the tiny sample dataset):

%%timeit
df[(df.A.ne(0)&df.B.ne(0))&((df.A+df.B)/2).gt(15)].first_valid_index()
**1.21 ms** ± 35.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
AB2 = (df['A']+df['B'])/2 
filter = (df['A'].ne(0)) & (df['B'].ne(0)) & (AB2>15)
df[filter].index[0]
**1.08 ms** ± 28.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df.query("A!=0 and B!=0 and (A+B)/2 > 15").index[0]
**2.71 ms** ± 157 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
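
Since the question asks for speed on a large frame, one option not covered by the timings above is to skip building the filtered DataFrame and locate the first True in the boolean mask with NumPy. This is an untimed sketch, not a benchmarked recommendation; `first_match_index` and its `threshold` parameter are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd


def first_match_index(df: pd.DataFrame, threshold: float = 15):
    """Return the index label of the first row where A and B are non-zero
    and their mean exceeds `threshold`, or None if no row qualifies."""
    a = df["A"].to_numpy()
    b = df["B"].to_numpy()
    mask = (a != 0) & (b != 0) & ((a + b) / 2 > threshold)
    if not mask.any():                 # also covers an empty frame
        return None
    return df.index[mask.argmax()]     # argmax gives the first True position


# Sample data copied from the question.
df = pd.DataFrame({
    "U": [2000, 3000, 2100, 2500, 2600],
    "A": [10, 40, 20, 0, 30],
    "B": [20, 0, 30, 30, 40],
})
print(first_match_index(df))
```

Whether this beats the pandas-only versions depends on the data size and dtypes, so it is worth timing on the real frame.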

edited Dec 28, 2020 at 23:29

answered Dec 28, 2020 at 22:52

Glauco

sammywemmy · Accepted Answer · 2020-12-28 23:21:37Z

0

If the dataframe is large, query might be faster:

df.query("A!=0 and B!=0 and (A+B)/2 > 15").index[0]

      2
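
To check this claim on a larger frame, a timing sketch along the following lines could be used. The synthetic data, the size `n`, and the helper names `with_mask` and `with_query` are assumptions made for illustration; results will vary with the machine and with whether the optional numexpr package is installed.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration, not from the question).
rng = np.random.default_rng(0)
n = 200_000
big = pd.DataFrame({
    "U": rng.integers(2000, 3000, n),
    "A": rng.integers(0, 50, n),
    "B": rng.integers(0, 50, n),
})


def with_mask():
    m = big.A.ne(0) & big.B.ne(0) & ((big.A + big.B) / 2).gt(15)
    return big[m].index[0]


def with_query():
    return big.query("A != 0 and B != 0 and (A + B) / 2 > 15").index[0]


# Average a few runs of each approach and report milliseconds per call.
for fn in (with_mask, with_query):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```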

answered Dec 28, 2020 at 23:21

sammywemmy
