Assigning NumPy arrays into a DataFrame column inside for loops

Question

The 'Net Entries' and 'Net Exits' columns contain unreasonably high values as well as negative values. I am trying to fix this with the code below, but I keep getting the warnings shown after it. Here is my code:

indexes = [*D.index.unique()]
list_ = []

for index in indexes :
    
    df = D[D.index == index]
    
    array_ent = np.array(df['Net Entries'])
    array_ext = np.array(df['Net Exits'])
    
    avg_ent = np.mean(array_ent[(array_ent > 0) & (array_ent < 5040)])
    avg_ext = np.mean(array_ext[(array_ext > 0) & (array_ext < 5040)])
    
    array_ent[(array_ent < 0) | (array_ent > 5040)] = avg_ent
    array_ext[(array_ext < 0) | (array_ext > 5040)] = avg_ext
    
    df['x'] = array_ent
    df['y'] = array_ext
    
    list_.append(df)
    
MTA = pd.concat(list_, axis = 0)

D.head()

RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,

RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

Can anyone help me solve this problem?

Those are warnings, not errors. It seems that in some iterations no values in array_ent or array_ext satisfy your conditions, so np.mean is called on an empty selection and returns NaN.
– pho, Commented Dec 26, 2022 at 23:21
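
To illustrate the comment above, here is a minimal sketch of guarding the mean against an empty selection. The helper name in_range_mean and the sample lists are hypothetical; the bounds 0 and 5040 come from the question.

```python
import numpy as np

def in_range_mean(values, low=0, high=5040):
    """Mean of the entries strictly between low and high, or NaN if there are none."""
    values = np.asarray(values, dtype=float)
    valid = values[(values > low) & (values < high)]
    # Guard against the empty selection that triggers "Mean of empty slice".
    return valid.mean() if valid.size else np.nan

group_with_valid = [120, -5, 9000, 80]   # hypothetical sample data
group_without_valid = [-5, 9000]         # no value falls inside the range

print(in_range_mean(group_with_valid))     # 100.0
print(in_range_mean(group_without_valid))  # nan, returned by the guard
```

A group with no in-range values still yields NaN, so the caller has to decide what to substitute in that case.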
Side note: your loop looks suspicious. Are you manually iterating over the MultiIndex instead of calling groupby? A typical example of transforming a DataFrame with groupby: pandas.pydata.org/pandas-docs/stable/user_guide/…. Replacing values with the per-group mean can be done with where (pandas.pydata.org/docs/reference/api/…): lambda x: x.where((x >= 0) & (x <= 5040), x.mean()). Note that where keeps the values for which the condition is True and replaces the rest, so the condition must describe the valid range. One-liners with clip are in the answers here: stackoverflow.com/q/47187359.
– Lodinn, Commented Dec 27, 2022 at 3:46
Yes, this use of a lambda function can be very helpful. However, I am not trying to replace the values x < 0 or x > 5040 with x.mean(); I am trying to replace those values with the mean of the elements that are between 0 and 5040.
– OnurYukay, Commented Dec 27, 2022 at 14:26
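
The approach discussed in these comments (groupby instead of a manual loop, with out-of-range values replaced by the mean of the in-range values) could be sketched as follows. This is only an illustration under stated assumptions: the small frame D, its index name "station", and the helper replace_with_in_range_mean are hypothetical stand-ins, and grouping by the first index level is assumed to match the loop in the question.

```python
import pandas as pd

LOW, HIGH = 0, 5040  # bounds taken from the question

def replace_with_in_range_mean(s: pd.Series) -> pd.Series:
    """Replace out-of-range values in one group with the mean of its in-range values."""
    valid = (s > LOW) & (s < HIGH)
    invalid = (s < LOW) | (s > HIGH)
    # pandas returns NaN for the mean of an empty selection.
    fill = s[valid].mean()
    return s.astype(float).mask(invalid, fill)

# Hypothetical stand-in for the asker's DataFrame D.
D = pd.DataFrame(
    {
        "Net Entries": [100, -20, 300, 9000, 50],
        "Net Exits": [10, 20, 99999, -1, 40],
    },
    index=pd.Index(["A", "A", "A", "B", "B"], name="station"),
)

cols = ["Net Entries", "Net Exits"]
# transform applies the helper to each column of each group and keeps the original row order.
cleaned = D.groupby(level=0)[cols].transform(replace_with_in_range_mean)
print(cleaned)
```

As in the question's code, values exactly equal to 0 or 5040 are neither averaged nor replaced here; adjust the comparisons if those boundaries should count as valid.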

J_H · Accepted Answer · 2022-12-26 23:30:22Z

You are looking for the .clip() method, which limits values to the given lower and upper bounds.

df['Net Entries'] = df['Net Entries'].clip(0, 5040)
df['Net Exits']   = df['Net Exits'].clip(0, 5040)

Once clipped, process those features as you wish: median, mean, whatever.
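
For reference, a small sketch of what clip does to out-of-range values, using a hypothetical sample Series: values below the lower bound become the lower bound and values above the upper bound become the upper bound, which is different from replacing them with a mean.

```python
import pandas as pd

# Hypothetical sample values for one column.
s = pd.Series([100, -20, 300, 9000], name="Net Entries")

# Values below 0 become 0, values above 5040 become 5040.
clipped = s.clip(0, 5040)
print(clipped.tolist())  # [100, 0, 300, 5040]
```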
