I have a column of numbers in my DataFrame. I'd like to add another column that holds the running average of the values greater than 0, ideally computed in NumPy without iteration (the data is huge).

Vals    Output
-350    
1000    1000
1300    1150
1600    1300
1100    1250
1000    1200
450     1075
1900    1192.857143
-2000   1192.857143
-3150   1192.857143
1000    1168.75
-900    1168.75
800     1127.777778
8550    1870

Code:

import pandas as pd

list = [-350, 1000, 1300, 1600, 1100, 1000, 450,
        1900, -2000, -3150, 1000, -900, 800, 8550]
df = pd.DataFrame(data=list, columns=['Vals'])
Side note: don't shadow built-ins; use L instead of list. Commented Aug 6, 2018 at 17:22

1 Answer

Option 1
expanding and mean

df.assign(Out=df.loc[df.Vals.gt(0)].Vals.expanding().mean()).ffill()

If other columns in your DataFrame contain NaN values, this approach will forward-fill those too. If that is a concern, fill only the new column instead:

df['Out'] = df.loc[df.Vals.gt(0)].Vals.expanding().mean()
df['Out'] = df.Out.ffill()

This fills only the Out column.
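
To make the difference between the two fills concrete, here is a minimal sketch on a small made-up frame. The "Other" column and its values are hypothetical, invented only to show the side effect; they are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "Other" is an invented column with gaps of its own.
df = pd.DataFrame({
    "Vals": [-350, 1000, 1300, -2000, 800],
    "Other": [1.0, np.nan, 3.0, np.nan, 5.0],
})

# Expanding mean over the positive rows only. The result keeps the original
# index labels, so it aligns back onto df and leaves NaN on the other rows.
running = df.loc[df.Vals.gt(0), "Vals"].expanding().mean()

# Frame-wide ffill: the gaps in "Other" get filled as a side effect.
wide = df.assign(Out=running).ffill()

# Column-only ffill: "Other" keeps its NaN values.
narrow = df.assign(Out=running.reindex(df.index).ffill())

print(wide)
print(narrow)
```

Both frames end up with the same Out column; only the treatment of "Other" differs.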

Option 2
mask:

df.assign(Out=df.mask(df.Vals.le(0)).Vals.expanding().mean())

Both of these result in:

    Vals          Out
0   -350          NaN
1   1000  1000.000000
2   1300  1150.000000
3   1600  1300.000000
4   1100  1250.000000
5   1000  1200.000000
6    450  1075.000000
7   1900  1192.857143
8  -2000  1192.857143
9  -3150  1192.857143
10  1000  1168.750000
11  -900  1168.750000
12   800  1127.777778
13  8550  1870.000000
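
Since the question asks for NumPy specifically, the same result can probably be reached with two cumulative sums. This is a sketch of an alternative technique, not one of the two pandas options above: it divides the running sum of the positive values by the running count of positive values, and leaves NaN wherever no positive value has been seen yet.

```python
import numpy as np

vals = np.array([-350, 1000, 1300, 1600, 1100, 1000, 450,
                 1900, -2000, -3150, 1000, -900, 800, 8550], dtype=float)

positive = vals > 0

# Running sum of the positive values (non-positive entries contribute 0).
running_sum = np.cumsum(np.where(positive, vals, 0.0))

# Running count of positive values seen so far.
running_count = np.cumsum(positive)

# Divide only where at least one positive value has been seen;
# the remaining positions keep their initial NaN.
out = np.full(vals.shape, np.nan)
np.divide(running_sum, running_count, out=out, where=running_count > 0)

print(out)
```

To store it in the frame, something like df['Out'] = out should work, provided vals was taken from df['Vals'].to_numpy() in the same row order.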
1 Comment

@novawaly I personally think my second solution (the mask) is more straightforward, so I would recommend using that.
