There are unreasonably high values, as well as negative values, in the 'Net Entries' and 'Net Exits' columns. I am trying to fix them with the code below, but I keep encountering the error shown after it. Here is my code:

indexes = [*D.index.unique()]
list_ = []

for index in indexes :
    
    df = D[D.index == index]
    
    array_ent = np.array(df['Net Entries'])
    array_ext = np.array(df['Net Exits'])
    
    avg_ent = np.mean(array_ent[(array_ent > 0) & (array_ent < 5040)])
    avg_ext = np.mean(array_ext[(array_ext > 0) & (array_ext < 5040)])
    
    array_ent[(array_ent < 0) | (array_ent > 5040)] = avg_ent
    array_ext[(array_ext < 0) | (array_ext > 5040)] = avg_ext
    
    df['x'] = array_ent
    df['y'] = array_ext
    
    list_.append(df)
    
MTA = pd.concat(list_, axis = 0)  

D.head()

RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

Can anyone help me solve this problem?

  • Those are warnings, not errors. You seem to have some iterations where no values in array_ent or array_ext fulfill your conditions. Commented Dec 26, 2022 at 23:21
  • Side note: your loop looks suspicious. Are you manually iterating over the multiindex instead of calling groupby? A typical example of transforming a dataframe with groupby: pandas.pydata.org/pandas-docs/stable/user_guide/…. Replacing with the per-group mean can be done with where pandas.pydata.org/docs/reference/api/…: lambda x: x.where((x >= 0) & (x <= 5040), x.mean()) (where keeps the values for which the condition is true and replaces the rest). One-liners with clip in the answers here: stackoverflow.com/q/47187359. Commented Dec 27, 2022 at 3:46
  • Yes, this use of a lambda function can be very helpful. However, I am not trying to replace the values where x < 0 or x > 5040 with x.mean(); I am trying to replace those values with the mean of the elements that are between 0 and 5040. Commented Dec 27, 2022 at 14:26
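
The points raised in these comments can be put together in a rough sketch: compute, per index group, the mean of the in-range values only, and use it to fill the out-of-range ones. Masking the invalid values to NaN before averaging means no mean is ever taken over an empty selection, which is what produces the "Mean of empty slice" warning. The helper name replace_out_of_range, the sample data, the inclusive bounds, and the choice to leave NaN when a group has no in-range values at all are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

LOW, HIGH = 0, 5040  # bounds taken from the question; treated as inclusive here


def replace_out_of_range(frame, cols, low=LOW, high=HIGH):
    """Return a copy where out-of-range values in `cols` are replaced by
    the mean of the in-range values from the same index group."""
    out = frame.copy()
    for col in cols:
        valid = frame[col].between(low, high)
        # Mask invalid values to NaN, then take the per-group mean (NaN is skipped).
        # A group with no valid values gets NaN instead of raising a warning.
        group_mean = frame[col].where(valid).groupby(level=0).transform("mean")
        # Combine positionally so duplicate index labels cause no alignment issues.
        out[col] = np.where(valid.to_numpy(), frame[col].to_numpy(), group_mean.to_numpy())
    return out


# Hypothetical sample data; the index stands in for the question's D.index.
D = pd.DataFrame(
    {
        "Net Entries": [100.0, -5.0, 300.0, 9000.0, -1.0],
        "Net Exits": [50.0, 70.0, 6000.0, -2.0, 400.0],
    },
    index=["A", "A", "A", "B", "B"],
)

MTA = replace_out_of_range(D, ["Net Entries", "Net Exits"])
print(MTA)
```

In this sample, group B has no valid 'Net Entries' values, so its entries come out as NaN; whether to drop those rows or fall back to a global mean is a separate decision.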

1 Answer

You are looking for the .clip() method.

df['Net Entries'] = df['Net Entries'].clip(0, 5040)
df['Net Exits']   = df['Net Exits'].clip(0, 5040)

Once clipped, process those features as you wish: median, mean, whatever.
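
As a small illustration of what .clip() does with these bounds, here is a minimal sketch on made-up values (the sample Series is an assumption, not the question's data). Values below the lower bound become the lower bound and values above the upper bound become the upper bound.

```python
import pandas as pd

# Hypothetical sample values, not taken from the question's data.
s = pd.Series([-20, 150, 9000, 5040], name="Net Entries")

# Values below 0 are raised to 0; values above 5040 are lowered to 5040.
clipped = s.clip(lower=0, upper=5040)
print(clipped.tolist())  # [0, 150, 5040, 5040]
```

Note that clipping pins outliers to the bounds themselves, so it is not the same as replacing them with the mean of the in-range values.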
