I have been working on this all day but couldn't come up with a better solution. I have a list of dataframes

ls = [df1, df2, df3, df4] and have tried several ways to merge them, for example:

df= reduce(lambda left,right: pd.merge(left,right,on='Time',how='outer'), ls)

Most of the examples I read just drop the col_y columns and rename the col_x columns. How can I get the desired dataframe instead? Thanks in advance.

df1

Time  col1  col2
   1     a    11
   2     b    12
   3     c    13
   4     d    14

df2

Time  col3  col1
   1    11    na
   2    15    na
   3    66    na
   4    78    na
   5    33     f
   6    22     g

Merged df

Time  col1_x  col2  col3  col1_y
   1       a    11    11      na
   2       b    12    15      na
   3       c    13    66      na
   4       d    14    78      na
   5      na    na    33       f
   6      na    na    22       g

What I wanted the final df to be:

Desired merged df

Time  col1  col2  col3
   1     a    11    11
   2     b    12    15
   3     c    13    66
   4     d    14    78
   5     f    na    33
   6     g    na    22

2 Answers

4

Use combine_first together with drop. If you have more than two dataframes, do this after each merge and cascade the combine_first through df3 and df4:

df = df1.merge(df2, how='outer', on=['Time'])
df['col1'] = df['col1_x'].combine_first(df['col1_y'])
df = df.drop(['col1_x', 'col1_y'], axis=1)
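# Alternative: index both frames by Time (capital T, matching the column name)
# and combine them directly. The result keeps Time as the index.
df_alt = df1.set_index('Time').combine_first(df2.set_index('Time'))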

print(df)
   Time  col2  col3 col1
0     1  11.0    11    a
1     2  12.0    15    b
2     3  13.0    66    c
3     4  14.0    78    d
4     5   NaN    33    f
5     6   NaN    22    g
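
If more than one column overlaps, the merge-then-combine_first step can be wrapped in a small helper. The following is only a sketch: `merge_coalesce` is a hypothetical name, not a pandas function, and it assumes the key column is called Time and that the left frame's values should win when both frames have a value.

```python
import numpy as np
import pandas as pd


def merge_coalesce(left, right, on="Time"):
    """Outer-merge two frames, then collapse each col_x/col_y pair into one column.

    Hypothetical helper for illustration; values from `left` take priority.
    """
    merged = left.merge(right, how="outer", on=on, suffixes=("_x", "_y"))
    # Columns present in both inputs (other than the key) come back suffixed.
    shared = [c for c in left.columns if c in right.columns and c != on]
    for col in shared:
        merged[col] = merged[col + "_x"].combine_first(merged[col + "_y"])
        merged = merged.drop(columns=[col + "_x", col + "_y"])
    return merged


# Sample data copied from the question.
df1 = pd.DataFrame({"Time": [1, 2, 3, 4],
                    "col1": ["a", "b", "c", "d"],
                    "col2": [11, 12, 13, 14]})
df2 = pd.DataFrame({"Time": [1, 2, 3, 4, 5, 6],
                    "col3": [11, 15, 66, 78, 33, 22],
                    "col1": [np.nan] * 4 + ["f", "g"]})

result = merge_coalesce(df1, df2)
print(result)
```

Because the helper takes two frames and returns one, it should also work as the function passed to functools.reduce over the whole list.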

You can also use DataFrame.combine_first after setting Time as the index:

df1 = df1.set_index('Time')
df2 = df2.set_index('Time')
df3 = df3.set_index('Time')
df4 = df4.set_index('Time')
df = df1.combine_first(df2).combine_first(df3).combine_first(df4)

Using the reduce function (set the index once up front, because the result of combine_first no longer has a Time column to index on):

from functools import reduce

df = reduce(lambda left, right: left.combine_first(right),
            [d.set_index('Time') for d in ls])
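
To see how chained combine_first calls resolve conflicts, here is a small self-contained sketch. df1 and df2 are the frames from the question; df3 is made up for illustration and deliberately overlaps at Time 4. Earlier frames in the list take priority, and later ones only fill gaps.

```python
from functools import reduce

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Time": [1, 2, 3, 4],
                    "col1": ["a", "b", "c", "d"],
                    "col2": [11, 12, 13, 14]})
df2 = pd.DataFrame({"Time": [1, 2, 3, 4, 5, 6],
                    "col3": [11, 15, 66, 78, 33, 22],
                    "col1": [np.nan] * 4 + ["f", "g"]})
# Hypothetical third frame (not in the question) that conflicts at Time 4.
df3 = pd.DataFrame({"Time": [4, 7],
                    "col1": ["z", "h"],
                    "col2": [99, 17]})

# Index every frame by Time once, then fold them together left to right.
indexed = [frame.set_index("Time") for frame in [df1, df2, df3]]
combined = reduce(lambda left, right: left.combine_first(right), indexed)

print(combined.reset_index())
```

At Time 4 the values from df1 ('d' and 14) are kept and df3's 'z' and 99 are ignored, while the new row at Time 7 comes entirely from df3. Reorder the list if a different frame should win.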

2

You can take advantage of the fact that when dataframes share an index, assignments and replacements align on that index before operating.

import numpy as np
import pandas as pd

df1 = pd.DataFrame.from_records(
    [
        {"Time": 1, "col1": "a", "col2": 11},
        {"Time": 2, "col1": "b", "col2": 12},
        {"Time": 3, "col1": "c", "col2": 13},
        {"Time": 4, "col1": "d", "col2": 14},
    ]
).set_index("Time")

df2 = pd.DataFrame.from_records(
    [
        {"Time": 1, "col3": 11, "col1": np.nan},
        {"Time": 2, "col3": 15, "col1": np.nan},
        {"Time": 3, "col3": 66, "col1": np.nan},
        {"Time": 4, "col3": 78, "col1": np.nan},
        {"Time": 5, "col3": 33, "col1": "f"},
        {"Time": 6, "col3": 22, "col1": "g"},
    ]
).set_index("Time")

dfs = [df1, df2]

index = pd.Index(dfs[0].index)
for this_df in dfs[1:]:
    index = index.union(this_df.index)

df = pd.DataFrame(index=index)

for this_df in dfs:
    for col in this_df.columns:
        if col not in df.columns:
            df[col] = this_df[col]
        else:
            df[col] = df[col].fillna(this_df[col])

print(df)

     col1  col2  col3
Time                 
1       a  11.0    11
2       b  12.0    15
3       c  13.0    66
4       d  14.0    78
5       f   NaN    33
6       g   NaN    22
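
The alignment that this loop relies on can be checked in isolation. The sketch below uses made-up toy data (the names `target`, `source` and `filler` are illustrative only) to show that both column assignment and fillna match rows by index label rather than by position.

```python
import pandas as pd

# Empty frame that only carries the index we want to end up with.
target = pd.DataFrame(index=pd.Index([1, 2, 3], name="Time"))

# Source is deliberately out of order and has no value for label 2.
source = pd.Series({3: "c", 1: "a"})
target["col1"] = source  # aligned by label; label 2 becomes NaN

# fillna with a Series also aligns by label and only fills missing cells,
# so the value offered for label 3 is ignored.
filler = pd.Series({2: "b", 3: "IGNORED"})
target["col1"] = target["col1"].fillna(filler)

print(target)
```

This is why the loop never overwrites a value that an earlier dataframe already supplied.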
