I have the following dataframe:

Index ColA ColB ColC ColD 
0       1    4   13   ABC
1       12   1   24   ABC
2       36   18  1    ABC
3       41   45  1    ABC

Now I'm looking for a simple command to transform the pandas df so that the values of ColA, ColB, and ColC are replaced as follows:

for each row:
   if value in ColA <= 12 then 1
   if value in ColA > 12 and <= 24 then 2
   if value in ColA > 24 and <= 36 then 3
   if value in ColA > 36 then 4

(the same also for the other columns)

So the result would look like this:

Index ColA ColB ColC ColD 
0       1    1   2    ABC
1       1    1   2    ABC
2       3    2   1    ABC
3       4    4   1    ABC

Is there a simple way to achieve this? :-)

Best regards, André

2 Answers

You can solve this with pandas' boolean indexing.

The idea is to loop over the columns and, for each one, overwrite every value that falls in a given range with its new label.

import pandas as pd
import numpy as np

df = pd.DataFrame()

df["ColA"] = [1, 12, 32, 24]
df["ColB"] = [23, 11, 6, 45]
df["ColC"] = [10, 25, 3, 23]

print(df)

Output:

   ColA  ColB  ColC
0     1    23    10
1    12    11    25
2    32     6     3
3    24    45    23

First, df['ColA'].between(0,12) builds a boolean mask that is True for every row whose ColA value lies in the range 0 to 12 (both ends inclusive). Then df.loc[df['ColA'].between(0,12), 'ColA'] = 1 assigns the new value 1 to exactly those rows of the column.
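As a minimal sketch of this single-column step, using the ColA values from the example above (the variable name mask is only illustrative and does not appear in the answer's code):

```python
import pandas as pd

df = pd.DataFrame({"ColA": [1, 12, 32, 24]})

# Boolean mask: True where 0 <= ColA <= 12 (both ends are inclusive by default)
mask = df["ColA"].between(0, 12)
print(mask.tolist())        # [True, True, False, False]

# Overwrite only the masked rows of ColA with the new label
df.loc[mask, "ColA"] = 1
print(df["ColA"].tolist())  # [1, 1, 32, 24]
```

Only the two rows selected by the mask change; 32 and 24 are left for the later ranges.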

That handles one range of ColA. To cover every range and every column of the dataframe, loop over the columns:

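for col in df.columns:
    # Apply the ranges in ascending order: the new labels 1-4 all fall into
    # the first range, so a later line never re-matches a relabelled value.
    df.loc[df[col].between(0,12), col] = 1
    df.loc[df[col].between(13,24), col] = 2
    df.loc[df[col].between(25,36), col] = 3
    df.loc[df[col] > 36, col] = 4

print(df)

Output:

   ColA  ColB  ColC
0     1     2     1
1     1     1     3
2     3     1     1
3     2     4     2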

1 Comment

Ah OK, I didn't know the "between" function. Thank you!

A general solution uses numpy.select, which checks the conditions in order and takes the first one that matches:

cols = ['ColA','ColB','ColC']
m1 = df[cols] <= 12
m2 = df[cols] <= 24
m3 = df[cols] <= 36

df[cols] = np.select([m1, m2, m3], [1,2,3], default=4)
print(df)

Output:

   ColA  ColB  ColC ColD
0     1     1     2  ABC
1     1     1     2  ABC
2     3     2     1  ABC
3     4     4     1  ABC
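
For reference, here is a self-contained sketch of the same numpy.select approach, built on the dataframe from the question. The names values and conditions are illustrative, and converting to a NumPy array with to_numpy() first is an assumption made here to keep the array handling explicit; the answer's code passes the boolean DataFrames directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ColA": [1, 12, 36, 41],
    "ColB": [4, 1, 18, 45],
    "ColC": [13, 24, 1, 1],
    "ColD": ["ABC"] * 4,
})
cols = ["ColA", "ColB", "ColC"]

# Work on a plain 2D array of the numeric columns
values = df[cols].to_numpy()

# Conditions are checked in order and the first True one wins, so a value
# of 4 matches "<= 12" and never reaches the "<= 24" condition.
conditions = [values <= 12, values <= 24, values <= 36]

# Anything that matches none of the conditions (> 36) gets the default 4
df[cols] = np.select(conditions, [1, 2, 3], default=4)
print(df)
```

Because the first matching condition wins, the upper bounds alone are enough, and no explicit "> 12 and <= 24" check is needed.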

Another solution, if you always need the values [1,2,3,4] with your conditions:

Subtract 1, integer-divide by 12, and add 1 back. DataFrame.clip first limits the values to the range 1 to 37, so anything below the lowest or above the highest threshold still maps to 1 or 4:

cols = ['ColA','ColB','ColC']

df[cols] = (df[cols].clip(lower=1, upper=37) - 1) // 12 + 1
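
To see why the arithmetic works, here is a small sketch that applies the same formula to a Series of boundary values (the sample values and the names s and codes are chosen here for illustration). It assumes integer data and equally wide bins of 12; a float such as 12.5 would land in bin 1 rather than bin 2.

```python
import pandas as pd

# Values on both sides of each threshold, plus two values outside the clip range
s = pd.Series([-5, 1, 12, 13, 24, 25, 36, 37, 100])

# clip to [1, 37], shift so 1..12 becomes 0..11, floor-divide by 12, shift back
codes = (s.clip(lower=1, upper=37) - 1) // 12 + 1
print(codes.tolist())  # [1, 1, 1, 2, 2, 3, 3, 4, 4]
```

For example, 24 becomes (24 - 1) // 12 + 1 = 1 + 1 = 2, while 25 becomes (25 - 1) // 12 + 1 = 2 + 1 = 3.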
