0

I have the following 3 dataframes

df_hl

State   District    Level   Name    TRU No_HH   TOT_HL_P   TOT_HL_M
0   0   India   India   Total   449787  1773040 
0   0   India   India   Rural   192891  834692  
0   0   India   India   Urban   256896  938348  

df_sc

State   District    Level   Name    TRU   No_HH_Head_sc TOT_P_sc
0   0   India   India   Total   41694863    201378372
0   0   India   India   Rural   31803775    153850848   
0   0   India   India   Urban   9891088     47527524        

df_st

State   District    Level   Name    TRU   No_HH_Head_st   TOT_P_st   TOT_M_st
0   0   India   India   Total   21511528    104545716   52547215    
0   0   India   India   Rural   19302332    94083844    47263733    
0   0   India   India   Urban   2209196     10461872    5283482

These are the structures of the dataframes after I have preprocessed their names. There are more number of columns and rows in the original dataset.

The thing I want to do is combine all these dataframes based on the following columns : `State, District, Level, Name and TRU.

My result dataframe should be of the following form :

State   District    Level   Name    TRU No_HH   TOT_HL_P   TOT_HL_M   No_HH_Head_sc  TOT_P_sc  No_HH_Head_st   TOT_P_st   TOT_M_st 
0   0   India   India   Total   449787  1773040   41694863    201378372  21511528    104545716   52547215
0   0   India   India   Rural   192891  834692   31803775    153850848 19302332    94083844    47263733  
0   0   India   India   Urban   256896  938348  41694863    201378372   2209196     10461872    5283482

I tried to create index based on those columns, but don't know how to combine all columns. Any help would be really appreciated.

df_hl.set_index(['State', 'District', 'Level', 'Name', "TRU"], inplace=True)
df_sc.set_index(['State', 'District', 'Level', 'Name', "TRU"], inplace=True)
df_st.set_index(['State', 'District', 'Level', 'Name', "TRU"], inplace=True)

Thanks a lot in advance.

2 Answers 2

1

Seems like this what you are looking for:

df1.merge(df2,how='left').merge(df3,how='left')?

Sign up to request clarification or add additional context in comments.

2 Comments

Any reason why you merged the dataframes using how = left
That refers to the kind of join you want : left join vs inner join vs right join so on ...
0

What you probably want is an inner joint of the dataframes. Assumed you have pandas imported as import pandas as pd:

df_final = pd.merge(left=pd.merge(left=df_hl, right=df_sc, on=["State", "District", "Level", "Name", "TRU"]), right=df_st, on=["State", "District", "Level", "Name", "TRU"])

A more generic way for doing this with an arbitrary number of dataframes is described here:

pandas three-way joining multiple dataframes on columns

import functools
dfs = [df_hl, df_sc, df_st]
df_final = functools.reduce(lambda left, right: pd.merge(left=left, right=right, on=["State", "District", "Level", "Name", "TRU"]), dfs)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.