Python Loop with IF condition on pandas dataframe gives me incomplete result or KeyError

Question

Given a dataframe:

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)


    A   B
0   2   5
1   1   7
2   4   7
3   5   6
4   7   10
5   8   9
6   7   3
7   5   2

I'm comparing both columns row by row and want to append 'INSIDE' to an array if A[i] > A[i-1] AND B[i] < B[i-1]; otherwise I want to append 'BROKEN'.

array = []

for i in range(1,len(testdf)):
   
    if testdf.A[i] > testdf.A[i-1]:
        
        if testdf.B[i] < testdf.B[i-1]:
        
            array.append('INSIDE')
        
        else:
            
            array.append('BROKEN')

The result is:

['BROKEN', 'INSIDE', 'BROKEN', 'INSIDE']

But I expect:

['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

I tried different variations of the loop's starting point, for example:

for i in range(len(testdf)-1):

but that only raises a KeyError.

How can I improve the code so that it runs as expected?

yatu · Accepted Answer · 2020-07-10 08:32:15Z

2

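For a pandas-based approach, you can use diff:

import numpy as np

m = testdf.diff()            # change from the previous row; first row is NaN
m = (m.A > 0) & (m.B < 0)    # A went up and B went down
testdf['new_col'] = np.where(m, 'INSIDE', 'BROKEN')

print(testdf)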
   A   B new_col
0  2   5  BROKEN
1  1   7  BROKEN
2  4   7  BROKEN
3  5   6  INSIDE
4  7  10  BROKEN
5  8   9  INSIDE
6  7   3  BROKEN
7  5   2  BROKEN
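
A minimal sketch, assuming the testdf built from the dict in the question, of why the first row always comes out as BROKEN here: diff() has no previous row to subtract for row 0, so that row is NaN, and comparisons with NaN evaluate to False. Dropping the first element should give the list the loop was expected to produce. The names delta, labels and per_step are only illustrative.

```python
import numpy as np
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

delta = testdf.diff()                        # row 0 is NaN: nothing to subtract
mask = (delta['A'] > 0) & (delta['B'] < 0)   # NaN comparisons evaluate to False

labels = np.where(mask, 'INSIDE', 'BROKEN')  # one label per row, row 0 included
per_step = labels[1:].tolist()               # drop row 0 to match the loop
print(per_step)
```

If the labels are meant to live in a column, keep all eight values; the [1:] slice is only needed to match a list built from range(1, len(testdf)).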

answered Jul 10, 2020 at 8:32


2 Comments

Mark T Over a year ago

How would you deal with three or more conditions?

yatu Over a year ago

Check np.select, essentially the same as np.where but for multiple conditions @mark
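
To illustrate the np.select suggestion, here is a rough sketch, assuming the testdf from the question. The second rule and its label 'A_DOWN' are made up purely for the example and are not part of the original problem.

```python
import numpy as np
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)
delta = testdf.diff()

# Conditions are checked in order; the first one that is True wins.
conditions = [
    (delta['A'] > 0) & (delta['B'] < 0),   # A went up and B went down
    delta['A'] < 0,                        # hypothetical extra rule: A went down
]
choices = ['INSIDE', 'A_DOWN']             # 'A_DOWN' is an invented label

# Rows matching none of the conditions get the default.
testdf['new_col'] = np.select(conditions, choices, default='BROKEN')
print(testdf)
```

Each further condition is one more entry in conditions with a matching entry in choices, which tends to stay more readable than nesting several np.where calls.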

jezrael · Accepted Answer · 2020-07-10 08:32:24Z

1

To get the expected output, add an else branch to the outer if as well:

array = []
for i in range(1,len(testdf)):
    if testdf.A[i] > testdf.A[i-1]:
        if testdf.B[i] < testdf.B[i-1]:
            array.append('INSIDE')
        else:
            array.append('BROKEN')
    else:
        array.append('BROKEN')
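
As a side note on the KeyError from the question, here is a minimal sketch, assuming the default integer index: when the loop starts at 0, i-1 is -1, and testdf.A[-1] is looked up as an index label, which does not exist. Positional access with .iloc behaves differently. The names raised, last_a, rising_a and falling_b are only illustrative.

```python
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

# With the default integer index, [-1] is a label lookup, not "last row".
try:
    testdf.A[-1]
    raised = False
except KeyError:
    raised = True
print(raised)

# .iloc is positional, so -1 does mean the last row there.
last_a = testdf.A.iloc[-1]
print(last_a)

# Starting at 1 guarantees that i - 1 is always a valid position.
array = []
for i in range(1, len(testdf)):
    rising_a = testdf.A.iloc[i] > testdf.A.iloc[i - 1]
    falling_b = testdf.B.iloc[i] < testdf.B.iloc[i - 1]
    array.append('INSIDE' if rising_a and falling_b else 'BROKEN')
print(array)
```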

Non-loop solution: here the first row is tested too (against the NaN produced by shift), so the result has the same length as the original DataFrame. If you need the same output as the loop, remove the first value by indexing with [1:]:

import numpy as np

# True where A is greater and B is smaller than in the previous row
mask = testdf['A'].gt(testdf['A'].shift()) & testdf['B'].lt(testdf['B'].shift())

out = np.where(mask, 'INSIDE', 'BROKEN').tolist()
print(out)
['BROKEN', 'BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

out1 = np.where(mask, 'INSIDE', 'BROKEN')[1:].tolist()
print(out1)
['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

edited Jul 10, 2020 at 8:32

answered Jul 10, 2020 at 8:26


2 Comments

Mark T Over a year ago

How would you deal with three or more conditions?

jezrael Over a year ago

@MarkT - You can chain more conditions with & (bitwise AND), e.g. mask = testdf['A'].gt(testdf['A'].shift()) & testdf['B'].lt(testdf['B'].shift()) & testdf['col'].gt(10). The last part adds the condition that the values in column col are greater than 10.

Balaji Ambresh · Accepted Answer · 2020-07-10 08:48:55Z

1

Here you go:

import numpy as np
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

mask1 = testdf.A > testdf.A.shift()
mask2 = testdf.B < testdf.B.shift()

res = np.where(mask1 & mask2, 'INSIDE', 'BROKEN')[1:]
print(res)

Output:

['BROKEN' 'BROKEN' 'INSIDE' 'BROKEN' 'INSIDE' 'BROKEN' 'BROKEN']

edited Jul 10, 2020 at 8:48

answered Jul 10, 2020 at 8:39


Arsh Kenia · Accepted Answer · 2020-07-10 09:01:05Z

0

You can copy the columns into plain lists and compare each element with the previous one. With this data, 'INSIDE' is appended twice: at index 3 (A rises from 4 to 5, B falls from 7 to 6) and at index 5 (A rises from 7 to 8, B falls from 10 to 9).

import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

dataframearray = [[],[]]
array = []
for number in d['A']:
    dataframearray[0].append(number)

for number in d['B']:
    dataframearray[1].append(number)

# Start at index 1 so that every element has a predecessor
x = 1
while x < len(dataframearray[0]):
    # INSIDE only when A went up and B went down
    if dataframearray[0][x] > dataframearray[0][x-1] and dataframearray[1][x] < dataframearray[1][x-1]:
        array.append('INSIDE')
    else:
        array.append('BROKEN')
    x += 1

Hope this helps

answered Jul 10, 2020 at 9:01
