0

A sample input data frame

import pandas as pd
df_input = pd.DataFrame([[1.7, 0.2], [0.4, 0.93], [0.05, 0.96], [0.97, 0.68]], columns=["A", "B"])

This example has two columns whereas the real dataframe has 10. I want to sort each row in ascending order and then assign -1 to first 5 columns and +1 to next 5 columns. Sample output is as follows:-

df_output=pd.DataFrame([[1, -1], [-1, 1], [-1, 1], [1, -1]], columns=["A", "B"])

Please suggest the way forward.

4 Answers 4

2

You want np.argsort:

np.argsort(df_input, axis=1).replace(0, -1)

   A  B
0  1 -1
1 -1  1
2 -1  1
3  1 -1

To generalise to N rows:

v = np.where(np.argsort(df_input) >= df.shape[1] // 2, 1, -1)    
df_output =  pd.DataFrame(v)

print(df)
    0   1   2   3   4   5   6   7   8   9
0  49  80  80  27  15  13  52  50  48  69
1  51  24  55  73  81  55  32  67  19  14
2  67   2  29  19  14  89  54  83  22  64
3  24  55  87  94  22  61  74  26  37   8

v = np.where(np.argsort(df_input) >= df.shape[1] // 2, 1, -1)    
df_output =  pd.DataFrame(v)

print(df_output)
   0  1  2  3  4  5  6  7  8  9
0  1 -1 -1  1 -1  1  1  1 -1 -1
1  1  1 -1  1 -1 -1  1  1 -1 -1
2 -1 -1 -1  1 -1  1  1 -1  1  1
3  1 -1 -1  1  1 -1  1  1 -1 -1
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks. But when I expand this to dataframe of 9 columns I am getting -1 ,1,2,3,4,5,6,7 and 8. I need -1,-1,-1,-1,1,1,1,1,1
@AbhishekKulkarni Take another look please.
Yeah, did. Thanks
2

Use numpy.where with np.argsort:

np.random.seed(111)

df_input = pd.DataFrame(np.random.randint(10, size=(10, 10)), columns=list('abcdefghij'))
print (df_input)
   a  b  c  d  e  f  g  h  i  j
0  6  8  3  6  6  7  1  8  3  4
1  5  4  3  7  8  7  0  1  7  2
2  5  9  0  5  5  1  9  6  2  1
3  6  0  1  7  0  1  5  9  0  1
4  7  6  6  5  4  9  0  3  8  0
5  2  6  9  7  4  2  9  5  7  9
6  8  8  4  2  5  0  7  0  8  2
7  7  9  0  8  0  2  0  5  8  1
8  7  1  3  7  0  2  0  9  9  3
9  2  2  6  1  9  8  6  0  2  6

arr = np.where(np.argsort(df_input, axis=1) < 5 , -1, 1)

df_output = pd.DataFrame(arr, columns=df_input.columns)
print (df_output)
   a  b  c  d  e  f  g  h  i  j
0  1 -1  1  1 -1 -1 -1  1 -1  1
1  1  1  1 -1 -1 -1 -1  1  1 -1
2 -1  1  1  1 -1 -1 -1  1 -1  1
3 -1 -1  1 -1  1  1  1 -1 -1  1
4  1  1  1 -1 -1 -1 -1 -1  1  1
5 -1  1 -1  1 -1 -1  1 -1  1  1
6  1  1 -1  1 -1 -1  1 -1 -1  1
7 -1 -1  1  1  1  1 -1 -1  1 -1
8 -1  1 -1  1 -1  1 -1 -1  1  1
9  1 -1 -1 -1  1 -1  1  1  1 -1

1 Comment

@pyd - It is for always same random data, check this
1

You can rank, then assign conditionally via numpy.where:

df[:] = np.where(df.rank(axis=1) > df.shape[1] / 2, 1, -1)

print(df)

   A  B
0  1 -1
1 -1  1
2 -1  1
3  1 -1

Note: this assumes duplicate values always get same rank.

Comments

1
o = df_input.sort_values(by=list(df_input.columns), ascending=True, na_position='first')
o[list(df_input.columns)[:5]] = -1
o[list(df_input.columns)[6:]] = 1

1 Comment

Thanks for your time

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.