I have a dataframe column with some numeric values. I want to replace these values with 1 and 0 based on a condition: if a value is above the mean of the column, it should become 1; otherwise it should become 0.

Here is the code I have now:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('data.csv')
dataset = dataset.dropna(axis=0, how='any')

X = dataset.drop(['myCol'], axis=1)
y = dataset.iloc[:, 4:5].values

mean_y = np.mean(dataset.myCol)

The target is y (a NumPy array, since .values is used). y looks like this:

      0
0    16
1    13
2    12.5
3    12

and so on. mean_y is equal to 3.55. Therefore, I need all values greater than 3.55 to become 1, and the rest to become 0.

I applied this loop, but without success:

for i in dataset.myCol:
    if dataset.myCol[i] > mean_y:
        dataset.myCol[i] = 1
    else:
        dataset.myCol[i] = 0

The output is the following:

      0
0    16
1    13
2    0
3    12

What am I doing wrong? Can someone please explain the mistake to me?

Thank you!

2 Answers


Try this vectorized approach:

dataset.myCol = np.where(dataset.myCol > dataset.myCol.mean(), 1, 0)
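As a minimal, self-contained sketch of this approach: the values below are made up for illustration (they are not the asker's data.csv), and the column name myCol is taken from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the real values come from the asker's CSV.
dataset = pd.DataFrame({"myCol": [16, 13, 12.5, 12, 1, 0.5]})

# Compute the mean once, before the column is overwritten.
mean_y = dataset["myCol"].mean()

# np.where evaluates the condition for the whole column at once:
# 1 where the value is above the mean, 0 everywhere else.
dataset["myCol"] = np.where(dataset["myCol"] > mean_y, 1, 0)

result = dataset["myCol"].tolist()
print(result)
```

With this sample data the mean is about 9.17, so the first four rows become 1 and the last two become 0.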

3 Comments

This worked so well! Thanks. Can you please tell me why my approach didn't work?
You are indexing your dataset.myCol series by its values instead of by its index labels: in your loop, i holds a value from the column, but dataset.myCol[i] expects an index label. NOTE: in pandas, loops should be a last resort, used only when nothing else helps.
Thank you, Max. Got it now :)
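The point made in the comments can be seen in a small sketch. The series below is made up, and it deliberately uses values that also happen to be valid index labels so that the lookup does not raise an error:

```python
import pandas as pd

# Illustrative series with the default integer index 0, 1, 2.
s = pd.Series([2, 0, 1])

# Iterating over a Series yields its VALUES, not its index labels.
# s[v] then looks up the row whose LABEL equals v, which is usually
# a different row from the one currently being visited.
pairs = [(int(v), int(s[v])) for v in s]
print(pairs)  # [(2, 1), (0, 2), (1, 0)]
```

Each pair shows the value being visited and the unrelated element that s[v] actually returns. If a value has no matching label, the lookup raises a KeyError instead.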

Convert the boolean mask to integers, so that True becomes 1 and False becomes 0:

print (dataset.myCol > mean_y)
0     True
1    False
2    False
3    False
Name: myCol, dtype: bool

dataset.myCol = (dataset.myCol > mean_y).astype(int)
print (dataset)
   myCol
0      1
1      0
2      0
3      0

Your loop approach is not recommended because it is slow. If you do want to use it, iterate with iterrows and set each value with loc, using the index label and the column name:

for i, x in dataset.iterrows():
    if dataset.loc[i, 'myCol'] > mean_y:
        dataset.loc[i, 'myCol'] = 1
    else:
        dataset.loc[i, 'myCol'] = 0
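As a rough check that the label-based loop and the boolean-mask version agree, here is a sketch on made-up data. It assumes mean_y is computed once before the loop, as in the question:

```python
import pandas as pd

# Illustrative data only.
dataset = pd.DataFrame({"myCol": [16.0, 13.0, 12.5, 12.0, 1.0, 0.5]})
mean_y = dataset["myCol"].mean()  # fixed before any value is overwritten

# Vectorized version: boolean mask cast to integers.
vectorized = (dataset["myCol"] > mean_y).astype(int)

# Loop version: iterate over index labels and set values with .loc.
looped = dataset.copy()
for i in looped.index:
    looped.loc[i, "myCol"] = 1 if looped.loc[i, "myCol"] > mean_y else 0

same = looped["myCol"].astype(int).tolist() == vectorized.tolist()
print(same)  # True
```

Both give the same result; the loop simply performs one Python-level lookup and assignment per row, which is why it becomes slow on large frames.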
