Find average of a column in a dataframe given conditions on another column

Question

Say I have the dataframe above, and I wish to write a function

    def ave(pd,minx,maxx):

which calculates the average of the y values whose corresponding x values lie between minx and maxx, i.e. in the following example:

    ave(file, 2, 3) #where file is wherever I import these x and y values from

it would return 3.3857...

I have tried the following:

    def ave(pd,minx,maxx):
        x = list(data.iloc[:, 0].values)
        y = list(data.iloc[:, 1].values)
        lst=[]
        for i in x:
            if x[i]>xmin and x[i]<xmax:
                lst+=y[i]
        return (sum(lst)/len(list))

but this gives the error: list indices must be integers or slices, not numpy.float64

ddejohn · Accepted Answer · 2022-04-22 15:28:56Z

Why not just select rows where those conditions are true? You really should avoid looping as much as possible when working with dataframes.

    def y_average(df, min_x, max_x):
        return df[(df["x"] > min_x) & (df["x"] < max_x)]["y"].mean()

Usage:

    In [3]: y_average(df, 2, 3)
    Out[3]: 3.3857142857142857
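For anyone who wants to try this without the original table, here is a minimal self-contained sketch of the same boolean-mask idea. The sample values below are made up for illustration (they are not the data from the question), and the column names "x" and "y" are assumed, as in the function above. It uses `.loc` with a mask, which is one common way to write the same selection.

```python
import pandas as pd

# Hypothetical sample data; NOT the table from the question.
df = pd.DataFrame({
    "x": [1.0, 2.5, 2.8, 3.5],
    "y": [10.0, 3.0, 4.0, 20.0],
})

def y_average(df, min_x, max_x):
    # Boolean mask: True for rows whose x lies strictly between the bounds.
    mask = (df["x"] > min_x) & (df["x"] < max_x)
    # .loc picks the masked rows and the "y" column in one step.
    return df.loc[mask, "y"].mean()

result = y_average(df, 2, 3)
print(result)  # 3.5, the mean of 3.0 and 4.0
```

Note that if no row falls inside the range, the selection is empty and `.mean()` returns NaN rather than raising an error, so the caller may want to check for that.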


1 Comment

pho Over a year ago

@bizzey because df is supposed to be a dataframe. I don't think it makes sense for you to pass the file and then do this. Read the file separately. Then have your function calculate the mean.
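A rough sketch of that split, assuming the data is a CSV with header columns "x" and "y" (an assumption; the question does not say what the file looks like). An in-memory `io.StringIO` stands in for the file here so the example runs on its own; a real path passed to `pd.read_csv` would work the same way.

```python
import io
import pandas as pd

def y_average(df, min_x, max_x):
    # Works on a dataframe only; it knows nothing about files.
    mask = (df["x"] > min_x) & (df["x"] < max_x)
    return df.loc[mask, "y"].mean()

# Stand-in for a file on disk, with hypothetical contents.
csv_file = io.StringIO("x,y\n1.0,10.0\n2.5,3.0\n2.8,4.0\n3.5,20.0\n")

# Step 1: read the file separately.
df = pd.read_csv(csv_file)

# Step 2: hand the dataframe to the function.
file_result = y_average(df, 2, 3)
print(file_result)  # 3.5
```

Keeping the reading step outside the function means the same function can be reused on dataframes that come from anywhere, not just from one file format.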
