Conditional Average from Pandas DataFrame

Question

I have a dataframe with multiple columns of real estate sales data. I would like to find the average price per square foot ('ppsf') for all 1-bed, 1-bath sales, by zip code. Here is my attempt (each key in the dict is a zip code):

bed1_bath1={}
for zip in zip_codes:
    bed1_bath1[zip]= (df.loc[(df['bed']==1) & (df['bath']==1) & (df['zip']==zip)]).mean()

The problem is that this adds the mean of all columns from the dataframe to the dictionary. I'm sure there is a better way to do this; maybe using numpy.where?

zsomko · Accepted Answer · 2018-11-18 21:33:50Z

(df[(df['bed']==1) & (df['bath']==1) & (df['zip']==zip)])['ppsf'].mean() would do it. Simply select the column you are interested in before calculating the mean; that way the mean is not computed for the other columns at all.
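As a possible alternative to looping over zip codes, the same idea (pick 'ppsf' before averaging) can be combined with groupby. This is only a sketch: the sample rows below are made up for illustration, and only the column names 'zip', 'bed', 'bath' and 'ppsf' and the dict name bed1_bath1 come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "zip":  ["10001", "10001", "10002", "10002", "10002"],
    "bed":  [1, 1, 1, 2, 1],
    "bath": [1, 1, 1, 1, 1],
    "ppsf": [100.0, 200.0, 300.0, 999.0, 500.0],
})

# Filter to 1-bed/1-bath rows once, instead of once per zip code.
one_by_one = df[(df["bed"] == 1) & (df["bath"] == 1)]

# Select 'ppsf' before averaging, so no other column is processed,
# then convert the per-zip means to a plain dict keyed by zip code.
bed1_bath1 = one_by_one.groupby("zip")["ppsf"].mean().to_dict()

print(bed1_bath1)
```

One difference from the loop version: a zip code with no 1-bed/1-bath sales is simply absent from this dict, whereas the loop would store NaN for it.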

