I have the following data points in a pandas DataFrame:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     2
2017-11-22 10:19:44     3
2017-11-22 15:11:28     6
2017-11-22 23:21:58     7
2017-11-28 14:28:28    28
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

I want to apply a function that converts all Data values bigger than 1 to 1. Is there a way to combine the following two lambda functions into one (like an else statement)?

[(lambda x: x/x)(x) for x in df['Data'] if x > 0]
[(lambda x: x)(x) for x in df['Data'] if x <1 ]

Desired end result:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     1
2017-11-22 10:19:44     1
2017-11-22 15:11:28     1
2017-11-22 23:21:58     1
2017-11-28 14:28:28     1
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

2 Answers

NumPy solution with np.clip:

df['Data'] = np.clip(df.Data.values, a_min=None, a_max=1)
df

              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1

Pass a_min=None to specify no lower bound.
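
As a self-contained illustration of the upper-bound-only clip described above, here is a minimal sketch that runs on a plain NumPy array instead of the DataFrame column. The array `data` simply copies the Data values from the question, and the comparison with `np.minimum` is an added cross-check, not part of the original answer.

```python
import numpy as np

# Data values copied from the question (illustrative stand-in for df['Data'].values)
data = np.array([1, 2, 3, 6, 7, 28, 0, 1])

# a_min=None means "no lower bound"; only values above 1 are capped
capped = np.clip(data, a_min=None, a_max=1)

# Element-wise minimum against 1 should give the same result
capped_alt = np.minimum(data, 1)

print(capped.tolist())      # [1, 1, 1, 1, 1, 1, 0, 1]
print(capped_alt.tolist())  # [1, 1, 1, 1, 1, 1, 0, 1]
```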

3 Comments

This is a great answer and NumPy is really powerful; however, I'll pick jezrael's as the best answer since it uses the DataFrame's built-in method. I appreciate it nonetheless.
@user97662 not a problem. I can respect your decision. Happy coding.
@user97662 Although I should bring your attention to the fact that my answer is 9 times better than jezrael's fastest answer. Take a look at my timings. If performance is important, I encourage you to reconsider.

You can use clip_upper (deprecated in pandas 0.24 and removed in 1.0; on newer versions use clip(upper=1) instead):

df['Data'] = df['Data'].clip_upper(1)

Or, if there are no negative values, use ge (>=) to build a boolean mask and convert it to int:

df['Data'] = df['Data'].ge(1).astype(int)

print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1
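
The "no negative values" condition matters because the two approaches only agree on non-negative data. The sketch below uses a made-up Series `s` (not from the question) containing one negative value, and `Series.clip(upper=1)` as the upper-bound clip, to show where they differ.

```python
import pandas as pd

# Hypothetical sample including a negative value, to expose the difference
s = pd.Series([1, 2, 3, 0, -2, 28])

# Upper-bound clip: values above 1 become 1, everything else is kept as-is
clipped = s.clip(upper=1)

# Boolean mask: 1 where the value is >= 1, 0 everywhere else (negatives become 0)
masked = s.ge(1).astype(int)

print(clipped.tolist())  # [1, 1, 1, 0, -2, 1]
print(masked.tolist())   # [1, 1, 1, 0, 0, 1]
```

On the data in the question, which has no negative values, both give the same column.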

But if you want to use a list comprehension (it should be slower on a bigger DataFrame):

df['Data'] = [1 if x > 0 else x for x in df['Data']]
print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1
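
To answer the literal question about merging the two lambdas, a conditional expression (`a if cond else b`) can live inside a single lambda. This is a minimal sketch on a plain list; the name `cap_at_one` and the list `values` are illustrative, with the values copied from the question.

```python
# One lambda with an inline if/else replaces the two filtered comprehensions
cap_at_one = lambda x: 1 if x > 0 else x

# Data values copied from the question
values = [1, 2, 3, 6, 7, 28, 0, 1]

result = [cap_at_one(x) for x in values]
print(result)  # [1, 1, 1, 1, 1, 1, 0, 1]
```

The same function could be passed to `df['Data'].apply(cap_at_one)`, although the vectorized methods above should be faster on large frames.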

Timings:

#[8000 rows x 5 columns]
df = pd.concat([df]*1000).reset_index(drop=True)

In [28]: %timeit df['Data2'] = df['Data'].clip_upper(1)
1000 loops, best of 3: 308 µs per loop

In [29]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
1000 loops, best of 3: 425 µs per loop

In [30]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
100 loops, best of 3: 3.02 ms per loop

#[800000 rows x 5 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [32]: %timeit df['Data2'] = df['Data'].clip_upper(1)
100 loops, best of 3: 9.32 ms per loop

In [33]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
100 loops, best of 3: 4.76 ms per loop

In [34]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
1 loop, best of 3: 274 ms per loop
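
The `%timeit` lines above need IPython. A rough way to repeat the comparison in a plain script is sketched below with the standard `timeit` module. It assumes a recent pandas, so it uses `clip(upper=1)` in place of `clip_upper`; the frame size and repeat counts are arbitrary choices, and the numbers will differ by machine and library version.

```python
import timeit

import pandas as pd

# 8000-row frame built from the Data values in the question
df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1] * 1000})

# The three approaches from this answer, wrapped as zero-argument callables
candidates = {
    "clip": lambda: df["Data"].clip(upper=1),
    "ge_astype": lambda: df["Data"].ge(1).astype(int),
    "list_comp": lambda: [1 if x > 0 else x for x in df["Data"]],
}

# Keep each result so the approaches can be checked against each other
results = {name: list(fn()) for name, fn in candidates.items()}

# Best of 3 repeats, 10 calls per repeat, reported as seconds per call
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=3)) / 10
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.0f} µs per call")
```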
