Set maximum value in DataFrame column

Question

I have the following data in a pandas DataFrame:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     2
2017-11-22 10:19:44     3
2017-11-22 15:11:28     6
2017-11-22 23:21:58     7
2017-11-28 14:28:28    28
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

I want to apply a function that converts all Data values greater than 1 to 1. Is there a way to combine the following two lambda functions into one (like an else statement)?

[(lambda x: x/x)(x) for x in df['Data'] if x > 0]
[(lambda x: x)(x) for x in df['Data'] if x <1 ]

Desired end result:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     1
2017-11-22 10:19:44     1
2017-11-22 15:11:28     1
2017-11-22 23:21:58     1
2017-11-28 14:28:28     1
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

cs95 · Answer · 2019-01-13 02:49:56Z

NumPy solution with np.clip:

import numpy as np
df['Data'] = np.clip(df.Data.values, a_min=None, a_max=1)
df

              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1

Pass a_min=None to specify no lower bound.
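
To make the effect of a_min=None concrete, here is a minimal sketch rather than a definitive implementation. It rebuilds the Data column from the question and appends one hypothetical negative value (-3, not in the original data) to show that the lower end is left alone.

```python
import numpy as np
import pandas as pd

# Data values from the question, plus a hypothetical -3 added for illustration
df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1, -3]})

# a_min=None leaves the lower end unbounded; a_max=1 caps everything above 1
capped = np.clip(df["Data"].to_numpy(), a_min=None, a_max=1)

print(capped.tolist())  # [1, 1, 1, 1, 1, 1, 0, 1, -3]
```

The 0 and the -3 pass through unchanged, which is the difference between capping a column and turning it into a 0/1 flag.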

edited Jan 13, 2019 at 2:49

answered Dec 19, 2017 at 6:38

3 Comments

user97662 Over a year ago

This is a great answer and np is really powerful; however, I'll pick Jezrael's as the best answer since it uses the DataFrame's built-in function. I appreciate it nonetheless.

cs95 Over a year ago

@user97662 not a problem. I can respect your decision. Happy coding.

cs95 Over a year ago

@user97662 Although I should bring your attention to the fact that my answer is 9 times better than jezrael's fastest answer. Take a look at my timings. If performance is important, I encourage you to reconsider.

jezrael · Accepted Answer · 2017-12-19 06:35:27Z

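You can use clip_upper (deprecated in pandas 0.24 and removed in 1.0; on newer versions, use clip(upper=1) instead):

df['Data'] = df['Data'].clip_upper(1)   # pandas >= 1.0: df['Data'].clip(upper=1)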

Or use ge (>=) to build a boolean mask and convert it to int. This matches the clipped result only if there are no negative values, since every value below 1 becomes 0:

df['Data'] = df['Data'].ge(1).astype(int)

print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1
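
The caveat about negative values can be seen in a small sketch. It assumes a recent pandas where Series.clip accepts upper=, and the -3 is a hypothetical value that does not appear in the question's data.

```python
import pandas as pd

# 28, 1 and 0 come from the question; -3 is a hypothetical negative value
s = pd.Series([28, 1, 0, -3], name="Data")

clipped = s.clip(upper=1)        # caps at 1, keeps smaller values as they are
flagged = s.ge(1).astype(int)    # True/False mask turned into 1/0

print(clipped.tolist())  # [1, 1, 0, -3]
print(flagged.tolist())  # [1, 1, 0, 0]
```

The two approaches agree on the question's data and diverge only on the negative entry.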

If you want to use a list comprehension instead (it will be slower on larger DataFrames):

df['Data'] = [1 if x > 0 else x for x in df['Data']]
print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1
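
On the original question of merging the two lambdas into one with an "else": a conditional expression inside a single lambda does that. This is a sketch of one way to write it, not the answerer's code; the name cap_at_one is made up for illustration, and it tests x > 1 (as the question's wording asks) rather than x > 0.

```python
import pandas as pd

# Data values from the question
df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})

# One lambda with an inline if/else replaces the two filtered comprehensions
cap_at_one = lambda x: 1 if x > 1 else x  # hypothetical helper name

df["Data"] = df["Data"].map(cap_at_one)

print(df["Data"].tolist())  # [1, 1, 1, 1, 1, 1, 0, 1]
```

Like the list comprehension, this runs Python code once per element, so the vectorized options are likely to be faster on large columns.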

Timings:

#[8000 rows x 5 columns]
df = pd.concat([df]*1000).reset_index(drop=True)

In [28]: %timeit df['Data2'] = df['Data'].clip_upper(1)
1000 loops, best of 3: 308 µs per loop

In [29]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
1000 loops, best of 3: 425 µs per loop

In [30]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
100 loops, best of 3: 3.02 ms per loop

#[800000 rows x 5 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [32]: %timeit df['Data2'] = df['Data'].clip_upper(1)
100 loops, best of 3: 9.32 ms per loop

In [33]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
100 loops, best of 3: 4.76 ms per loop

In [34]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
1 loop, best of 3: 274 ms per loop
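
The %timeit lines above need IPython and an older pandas that still has clip_upper. As a rough sketch for re-running the comparison in a plain script, the standard-library timeit module can be used. The labels, the repeat counts and the inclusion of np.clip are choices made here, not part of the original benchmark, and the numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the question's column and repeat it to 8000 rows, as in the first timing
base = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})
df = pd.concat([base] * 1000, ignore_index=True)

# Each candidate computes the capped column without assigning it back
candidates = {
    "clip(upper=1)": lambda: df["Data"].clip(upper=1),
    "ge(1).astype(int)": lambda: df["Data"].ge(1).astype(int),
    "np.clip": lambda: np.clip(df["Data"].to_numpy(), a_min=None, a_max=1),
    "list comprehension": lambda: [1 if x > 0 else x for x in df["Data"]],
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 20 calls per repeat, converted to seconds per call
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    results[name] = best
    print(f"{name:>20}: {best * 1e6:10.1f} µs per call")
```

Taking the minimum over repeats mirrors the "best of 3" convention that %timeit reported in the output above.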
