1

Given a dataframe:

import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)


    A   B
0   2   5
1   1   7
2   4   7
3   5   6
4   7   10
5   8   9
6   7   12
7   5   10

I'm comparing both columns row by row and want to append 'INSIDE' to array if A[i] > A[i-1] AND B[i] < B[i-1]; otherwise append 'BROKEN'.

array = []

for i in range(1,len(testdf)):
   
    if testdf.A[i] > testdf.A[i-1]:
        
        if testdf.B[i] < testdf.B[i-1]:
        
            array.append('INSIDE')
        
        else:
            
            array.append('BROKEN')

The result is:

['BROKEN', 'INSIDE', 'BROKEN', 'INSIDE']

But I expect:

['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

I tried changing the starting point of the loop:

for i in range(len(testdf)-1):

but that only raises a KeyError.

How can I fix the code so it produces the expected output?

4 Answers

2

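For a pandas-based approach, you can use diff:

import numpy as np

# df is the dataframe under test (testdf in the question)
m = df.diff()                # row-to-row differences; the first row is NaN
m = (m.A > 0) & (m.B < 0)    # A rose and B fell versus the previous row
df['new_col'] = np.where(m, 'INSIDE', 'BROKEN')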

print(df)
   A   B new_col
0  2   5  BROKEN
1  1   7  BROKEN
2  4   7  BROKEN
3  5   6  INSIDE
4  7  10  BROKEN
5  8   9  INSIDE
6  7  12  BROKEN
7  5  10  BROKEN
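
To see why the first row always comes out as 'BROKEN', here is a minimal sketch. It assumes df is built from the dictionary in the question, since this answer does not define df itself. The first row of the difference is NaN, and comparisons against NaN evaluate to False.

```python
import numpy as np
import pandas as pd

# Assumption: df is built from the dictionary given in the question.
df = pd.DataFrame({'A': [2, 1, 4, 5, 7, 8, 7, 5],
                   'B': [5, 7, 7, 6, 10, 9, 12, 10]})

# Row i minus row i-1; row 0 has no predecessor, so it is NaN.
delta = df.diff()
print(delta.head(3))

# NaN > 0 and NaN < 0 are both False, so row 0 can never be 'INSIDE'.
mask = (delta['A'] > 0) & (delta['B'] < 0)
labels = np.where(mask, 'INSIDE', 'BROKEN').tolist()
print(labels)
```

Dropping the first label with labels[1:] should give a list that lines up with the loop in the question, which starts at index 1.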

2 Comments

How would you deal with three or more conditions?
Check np.select, essentially the same as np.where but for multiple conditions @mark
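
As a rough illustration of the np.select suggestion, the sketch below labels each row with one of several outcomes. The extra labels 'A_UP_ONLY' and 'A_NOT_UP' are hypothetical and not part of the question, and the frame is assumed to be the one built from the question's dictionary.

```python
import numpy as np
import pandas as pd

# Assumption: same data as the dictionary in the question.
df = pd.DataFrame({'A': [2, 1, 4, 5, 7, 8, 7, 5],
                   'B': [5, 7, 7, 6, 10, 9, 12, 10]})
delta = df.diff()

# Conditions are checked in order; the first one that is True picks the label.
conditions = [
    (delta['A'] > 0) & (delta['B'] < 0),    # A rose, B fell
    (delta['A'] > 0) & (delta['B'] >= 0),   # A rose, B did not fall
    (delta['A'] <= 0),                      # A did not rise
]
choices = ['INSIDE', 'A_UP_ONLY', 'A_NOT_UP']  # hypothetical labels

# default is used where no condition holds (here: the first row, whose diff is NaN).
labels = np.select(conditions, choices, default='BROKEN').tolist()
print(labels)
```

Keeping the default a string, like the choices, avoids mixing text and numeric types in the result.
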
1

To get the expected output, add an else branch to the outer if as well:

array = []
for i in range(1,len(testdf)):
    if testdf.A[i] > testdf.A[i-1]:
        if testdf.B[i] < testdf.B[i-1]:
            array.append('INSIDE')
        else:
            array.append('BROKEN')
    else:
        array.append('BROKEN')
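
Since both else branches append the same value, the two nested tests can also be merged into one condition. The following is only a sketch of that idea, using plain lists taken from the question's columns:

```python
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

a = testdf['A'].tolist()
b = testdf['B'].tolist()

# Start at index 1 so every row has a predecessor; one label per compared row.
array = [
    'INSIDE' if a[i] > a[i - 1] and b[i] < b[i - 1] else 'BROKEN'
    for i in range(1, len(testdf))
]
print(array)
```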

Non-loop solution: here the first row is tested as well, so the result has the same length as the original dataframe. If you need the same output as the loop, drop the first value by indexing with [1:]:

import numpy as np

# True where A rose and B fell compared with the previous row
mask = testdf['A'].gt(testdf['A'].shift()) & testdf['B'].lt(testdf['B'].shift())

out = np.where(mask, 'INSIDE', 'BROKEN').tolist()
print (out)
['BROKEN', 'BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

out1 = np.where(mask, 'INSIDE', 'BROKEN')[1:].tolist()
print (out1)
['BROKEN', 'BROKEN', 'INSIDE', 'BROKEN', 'INSIDE', 'BROKEN', 'BROKEN']

2 Comments

How would you deal with three or more conditions?
@MarkT - You can chain conditions with & (bitwise AND), e.g. mask = testdf['A'].gt(testdf['A'].shift()) & testdf['B'].lt(testdf['B'].shift()) & testdf['col'].gt(10), which adds a condition that column col is greater than 10.
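
A runnable sketch of that chained mask follows. The column col and its values are made up for illustration, as the question's dataframe has only A and B.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({
    'A': [2, 1, 4, 5, 7, 8, 7, 5],
    'B': [5, 7, 7, 6, 10, 9, 12, 10],
    'col': [11, 12, 9, 15, 20, 8, 30, 40],  # hypothetical third column
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (
    testdf['A'].gt(testdf['A'].shift())
    & testdf['B'].lt(testdf['B'].shift())
    & testdf['col'].gt(10)
)
out = np.where(mask, 'INSIDE', 'BROKEN').tolist()
print(out)
```

With these made-up values, row 5 no longer qualifies because its col value is not above 10, so only row 3 is labelled 'INSIDE'.
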
1

Here you go:

import numpy as np
import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

mask1 = testdf.A > testdf.A.shift()
mask2 = testdf.B < testdf.B.shift()

res = np.where(mask1 & mask2, 'INSIDE', 'BROKEN')[1:]
print(res)

Output:

['BROKEN' 'BROKEN' 'INSIDE' 'BROKEN' 'INSIDE' 'BROKEN' 'BROKEN']

0

You can copy both columns into plain lists and compare each row with the previous one. 'INSIDE' appears wherever A rises while B falls, which happens at index 3 and index 5:

import pandas as pd

d = {'A': [2, 1, 4, 5, 7, 8, 7, 5], 'B': [5, 7, 7, 6, 10, 9, 12, 10]}
testdf = pd.DataFrame(data=d)

# Copy the columns into plain Python lists: [A values, B values]
dataframearray = [testdf['A'].tolist(), testdf['B'].tolist()]
array = []

# Start at index 1 so every row has a previous row to compare with
x = 1
while x < len(dataframearray[0]):
    if dataframearray[0][x] > dataframearray[0][x-1] and dataframearray[1][x] < dataframearray[1][x-1]:
        array.append('INSIDE')
    else:
        array.append('BROKEN')
    x += 1

Hope this helps
