I want to normalize my ambient temperature column (Ta).
Here is my code:

df['Ta'] = df['Ta'].apply(lambda v: (v - df['Ta'].min())) / (df['Ta'].max() - df['Ta'].min())

It gives the correct result, but it is very slow. The file is 20 MB and the DataFrame has shape (300000, 8).

Is there a faster way to do this?

  • Maybe you can cache df['Ta'].min() and df['Ta'].max() in variables, instead of recomputing them every time the lambda is called? Commented Jul 15, 2019 at 22:12
  • Actually, you have constants here: df['Ta'].min() and (df['Ta'].max() - df['Ta'].min()). Compute them once, and do not modify the content of df['Ta'] in place; assign the result to another array instead: another_array = df_ta.apply... Commented Jul 15, 2019 at 22:16
  • Thanks. I am not sure how to implement that. The following gives correct results on small data and runs fast on the large data, but I am not sure whether it is a proper solution: df['Ta'] = (df['Ta'] - df['Ta'].min()) / (df['Ta'].max() - df['Ta'].min()) Commented Jul 15, 2019 at 22:40

2 Answers

1

You are not taking advantage of pandas' vectorized operations: apply here is just another form of for loop, which slows down the whole process. Operate on the whole column at once instead:

 import pandas as pd
 import numpy as np

 # np.ptp is "peak to peak", i.e. max - min
 df['Ta'] = (df['Ta'] - df['Ta'].min()) / np.ptp(df['Ta'])
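
As a sanity check, here is a minimal sketch comparing the vectorized expression with a row-by-row apply version. The sample DataFrame and its values are made up for illustration, and `.to_numpy()` is an added assumption so that `np.ptp` receives a plain array.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real 300000-row file.
df = pd.DataFrame({"Ta": [12.5, 18.0, 21.5, 30.0, 15.0]})

# Row-by-row version, with the minimum and the range computed once up front.
ta_min = df["Ta"].min()
ta_range = df["Ta"].max() - ta_min
slow = df["Ta"].apply(lambda v: (v - ta_min) / ta_range)

# Vectorized version: arithmetic on the whole column at once.
fast = (df["Ta"] - df["Ta"].min()) / np.ptp(df["Ta"].to_numpy())

print(np.allclose(slow, fast))  # True
```

Both versions produce the same values; the vectorized one avoids calling a Python function once per row.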

0

I am not sure whether there is a faster way than computing the minimum and maximum once and updating the column in place:

mx = df['Ta'].max()
mn = df['Ta'].min()

df['Ta'] -= mn
df['Ta'] /= (mx - mn)
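
The same idea can be wrapped in a small helper. This is only a sketch: the function name `min_max_normalize` and the choice to return zeros for a constant column are assumptions for illustration, not something stated in the question.

```python
import pandas as pd


def min_max_normalize(col: pd.Series) -> pd.Series:
    """Scale a numeric column to [0, 1], computing min and max only once."""
    mn = col.min()
    rng = col.max() - mn
    if rng == 0:
        # Assumption: a constant column maps to all zeros to avoid dividing by zero.
        return pd.Series(0.0, index=col.index, name=col.name)
    return (col - mn) / rng


# Hypothetical sample values.
df = pd.DataFrame({"Ta": [10.0, 20.0, 40.0]})
df["Ta"] = min_max_normalize(df["Ta"])
print(df["Ta"].tolist())  # [0.0, 0.3333333333333333, 1.0]
```

Returning a new Series rather than mutating the input keeps the helper usable on any column, for example `df['Ta'] = min_max_normalize(df['Ta'])`.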
