1

I have a dataset of house sales with timestamped Periods(per quarter). I want to adjust the price according to the house pricing index change per region. I have a separate dataframe with 3 columns, the Quarter, the Region and the % change in price. I am currently achieving this by iterating over both dataframes. Is there a better way?

Minimal Example;

import pandas as pd
houses_df = pd.DataFrame({'HousePrice' : [100000, 250000, 125000, 320000],
                          'Period' : ['2020Q1', '2020Q2', '2020Q1', '2020Q3'],
                          'Region' : ['NY-West', 'NY-East', 'NY-West', 'NY-East']})

HPindex_df = pd.DataFrame({'Periods' : ['2020Q1', '2020Q2', '2020Q3'] * 2,
                           'Regions' : ['NY-West', 'NY-West', 'NY-West', 'NY-East', 'NY-East', 'NY-East'],
                          'PriceIndex' : [1.1, 1.13, 0.87, 1.35, 1.21, 1.11]})
for index, row in houses_df.iterrows():
    for row1 in HPindex_df.itertuples():
        a = row1
        if row['Region'] == row1.Regions and row['Period'] == row1.Periods:
            houses_df.loc[index, 'HousePrice'] = houses_df.loc[index, 'HousePrice'] * row1.PriceIndex

print(houses_df)
   HousePrice  Period   Region
0    110000.0  2020Q1  NY-West
1    302500.0  2020Q2  NY-East
2    137500.0  2020Q1  NY-West
3    355200.0  2020Q3  NY-East

print(HPindex_df)
  Periods  Regions  PriceIndex
0  2020Q1  NY-West        1.10
1  2020Q2  NY-West        1.13
2  2020Q3  NY-West        0.87
3  2020Q1  NY-East        1.35
4  2020Q2  NY-East        1.21
5  2020Q3  NY-East        1.11

I don't think this is the best way to do this, since this is a naive implementation. Is there a (vectorized/pandas built-in functions)-way to achieve this?

1 Answer 1

2

Use DataFrame.merge with left_on and right_on, then get all 4 column in output:

df = houses_df.merge(HPindex_df, 
                     left_on=['Period','Region'], 
                     right_on=['Periods','Regions'], 
                     how='left')
df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
print (df)

   HousePrice  Period   Region Periods  Regions  PriceIndex
0    110000.0  2020Q1  NY-West  2020Q1  NY-West        1.10
1    302500.0  2020Q2  NY-East  2020Q2  NY-East        1.21
2    137500.0  2020Q1  NY-West  2020Q1  NY-West        1.10
3    355200.0  2020Q3  NY-East  2020Q3  NY-East        1.11

For avoid it is possible use rename:

d = {'Periods':'Period','Regions':'Region'}
df = houses_df.merge(HPindex_df.rename(columns=d), on=['Period','Region'], how='left')
df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
print (df)

   HousePrice  Period   Region  PriceIndex
0    110000.0  2020Q1  NY-West        1.10
1    302500.0  2020Q2  NY-East        1.21
2    137500.0  2020Q1  NY-West        1.10
3    355200.0  2020Q3  NY-East        1.11

Or DataFrame.join with DataFrame.set_index:

df = houses_df.join(HPindex_df.set_index(['Periods','Regions']), on=['Period','Region'])
df['HousePrice'] = df['HousePrice'] * df['PriceIndex']
print (df)
   HousePrice  Period   Region  PriceIndex
0    110000.0  2020Q1  NY-West        1.10
1    302500.0  2020Q2  NY-East        1.21
2    137500.0  2020Q1  NY-West        1.10
3    355200.0  2020Q3  NY-East        1.11
Sign up to request clarification or add additional context in comments.

3 Comments

gosh darn, jezrael, give the rest of us a chance at these pandas questions XD
@greg_data - I think there is a lot of unanswered ;)
That was... fast!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.