     x      y
    1.2    3.1
    1.4    3.5
    1.5    3.2
    2.2    3.6
    2.2    2.8
    2.3    3.3
    2.4    3.5
    2.5    3.8
    2.7    3.4
    2.8    3.3

Say I have the dataframe above, and I wish to write a function

    def ave(pd,minx,maxx):

which calculates the average of the y values whose corresponding x values lie between minx and maxx, i.e. in the following example:

    ave(file, 2, 3) #where file is wherever I import these x and y values from

it would return 3.3857...

I have tried the following:

    def ave(pd,minx,maxx):
        x = list(data.iloc[:, 0].values)
        y = list(data.iloc[:, 1].values)
        lst=[]
        for i in x:
            if x[i]>xmin and x[i]<xmax:
                lst+=y[i]
        return (sum(lst)/len(list))

but this gives the error: list indices must be integers or slices, not numpy.float64

1 Answer

Why not just select rows where those conditions are true? You really should avoid looping as much as possible when working with dataframes.

def y_average(df, min_x, max_x):
    return df[(df["x"] > min_x) & (df["x"] < max_x)]["y"].mean()

Usage:

In [3]: y_average(df, 2, 3)
Out[3]: 3.3857142857142857
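
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is built by hand from the values in the question, and the column names "x" and "y" are assumed to match the header shown there. The `y_average_loop` helper is hypothetical and only included for comparison: the question's error arises because `for i in x` yields the x values themselves (floats), which are then used as list indices, so pairing the columns with `zip` avoids indexing altogether.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1.2, 1.4, 1.5, 2.2, 2.2, 2.3, 2.4, 2.5, 2.7, 2.8],
    "y": [3.1, 3.5, 3.2, 3.6, 2.8, 3.3, 3.5, 3.8, 3.4, 3.3],
})


def y_average(df, min_x, max_x):
    # Boolean mask: True for rows whose x lies strictly between the bounds.
    mask = (df["x"] > min_x) & (df["x"] < max_x)
    # Select the y values of the matching rows and average them.
    return df.loc[mask, "y"].mean()


def y_average_loop(df, min_x, max_x):
    # Hypothetical loop version, for comparison only: zip pairs each x
    # with its y, so nothing is ever indexed by a float.
    selected = [y for x, y in zip(df["x"], df["y"]) if min_x < x < max_x]
    return sum(selected) / len(selected)


result = y_average(df, 2, 3)
print(result)
```

Both functions use strict inequalities, matching the `>` and `<` in the question, so rows with x exactly equal to a bound are excluded.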

1 Comment

@bizzey because df is supposed to be a dataframe. I don't think it makes sense for you to pass the file and then do this. Read the file separately. Then have your function calculate the mean.
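
A rough sketch of that separation, assuming the file is whitespace-separated with an "x y" header row (the question does not say what format the file is in, so this layout is a guess). An in-memory `io.StringIO` stands in for the real file here; with an actual file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Stand-in for the real file. The whitespace-separated layout with a
# header row is an assumption, not something stated in the question.
raw = io.StringIO(
    """x y
1.2 3.1
1.4 3.5
1.5 3.2
2.2 3.6
2.2 2.8
2.3 3.3
2.4 3.5
2.5 3.8
2.7 3.4
2.8 3.3
"""
)

# Step 1: read the data once, outside the function.
df = pd.read_csv(raw, sep=r"\s+")


# Step 2: the function only ever sees a DataFrame.
def y_average(df, min_x, max_x):
    return df[(df["x"] > min_x) & (df["x"] < max_x)]["y"].mean()


result = y_average(df, 2, 3)
print(result)
```

Keeping the read step outside the function means the same DataFrame can be reused for several ranges without re-reading the file.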
