Combine multiple Pandas dataframes with duplicate datetime index pairs

Question

I have three Pandas dataframes, df1, df2, and df3, each indexed by datetime. Every date appears twice in the index, once for each id value ('recent' and 'hist'). I would like to combine the three dataframes so that each date pair is listed only once: pairs that occur in just one dataframe are kept, and pairs that occur in several dataframes are merged into the same rows (so a simple concat won't do). Here are samples of the dataframes:

In [1]: print df1
            CurTempMid      id
fldDate                       
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-27         9.0  recent
1997-12-27         7.0    hist
1998-02-10         9.0  recent
1998-02-10         7.0    hist
...                ...     ... 
2001-01-04        27.0  recent
2001-01-04        26.0    hist
2001-03-16        12.0  recent
2001-03-16        11.0    hist
2001-04-06        23.0  recent
2001-04-06        22.0    hist

In [2]: print df2
            MaxTempMid      id
fldDate                       
1998-01-02        29.0  recent
1998-01-02        28.0    hist
1998-02-15        18.0  recent
1998-02-15        23.0    hist
1998-02-23        24.0  recent
1998-02-23        15.0    hist
...                ...     ... 
2001-01-01        16.0  recent
2001-01-01        22.0    hist
2001-01-04        30.0  recent
2001-01-04        37.0    hist
2001-02-16        14.0  recent
2001-02-16        11.0    hist

In [3]: print df3
            MinTempMid      id
fldDate                       
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-26        -3.0  recent
1997-12-26        -2.0    hist
1997-12-27        -1.0  recent
1997-12-27         0.0    hist
...                ...     ...
2001-02-18         9.0  recent
2001-02-18        36.0    hist
2001-03-11        18.0  recent
2001-03-11        38.0    hist
2001-03-12        13.0  recent
2001-03-12        16.0    hist

The desired result looks like this:

            CurTempMid  MaxTempMid  MinTempMid      id
fldDate
1997-12-23         0.0         NaN         0.0  recent
1997-12-23        -2.0         NaN        -2.0    hist
1997-12-26         NaN         NaN        -3.0  recent
1997-12-26         NaN         NaN        -2.0    hist
1997-12-27         9.0         NaN        -1.0  recent
1997-12-27         7.0         NaN         0.0    hist
...                ...         ...         ...     ...

Once combined, the 'id' column should be identical, so I only need to retain a single 'id' column.

John Karasinski · Accepted Answer · 2018-06-26 01:49:34Z


If you're sure that the id column is identical across the three dataframes, this should work for you. Merge them on fldDate and id with an outer join, so that date pairs present in only some of the dataframes are kept and filled with NaN, then set the index back to fldDate.

m = (df1.reset_index()
        .merge(df2.reset_index(), on=['fldDate', 'id'], how='outer')
        .merge(df3.reset_index(), on=['fldDate', 'id'], how='outer')
        .sort_values('fldDate'))
m.set_index('fldDate', inplace=True)
print(m.head())
#             CurTempMid      id  MaxTempMid  MinTempMid
# fldDate
# 1997-12-23         0.0  recent         NaN         0.0
# 1997-12-23        -2.0    hist         NaN        -2.0
# 1997-12-26         NaN    hist         NaN        -2.0
# 1997-12-26         NaN  recent         NaN        -3.0
# 1997-12-27         9.0  recent         NaN        -1.0
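
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch of the same idea. The toy frames are copied from the first few rows in the question; the `make_frame` and `outer_merge` helpers, the use of `functools.reduce`, the `validate="one_to_one"` check, and the final sort and column reordering are my own additions for illustration, not part of the snippet above.

```python
from functools import reduce

import pandas as pd


def make_frame(column, rows):
    """Hypothetical helper: build a frame indexed by fldDate, like the question's."""
    frame = pd.DataFrame(rows, columns=["fldDate", column, "id"])
    frame["fldDate"] = pd.to_datetime(frame["fldDate"])
    return frame.set_index("fldDate")


# Toy data taken from the first rows of the samples in the question.
df1 = make_frame("CurTempMid", [
    ("1997-12-23", 0.0, "recent"), ("1997-12-23", -2.0, "hist"),
    ("1997-12-27", 9.0, "recent"), ("1997-12-27", 7.0, "hist"),
])
df2 = make_frame("MaxTempMid", [
    ("1998-01-02", 29.0, "recent"), ("1998-01-02", 28.0, "hist"),
])
df3 = make_frame("MinTempMid", [
    ("1997-12-23", 0.0, "recent"), ("1997-12-23", -2.0, "hist"),
    ("1997-12-26", -3.0, "recent"), ("1997-12-26", -2.0, "hist"),
    ("1997-12-27", -1.0, "recent"), ("1997-12-27", 0.0, "hist"),
])

keys = ["fldDate", "id"]


def outer_merge(left, right):
    # validate="one_to_one" raises a MergeError if a (fldDate, id) pair is
    # repeated inside either frame, which would otherwise multiply rows.
    return left.merge(right, on=keys, how="outer", validate="one_to_one")


combined = (
    reduce(outer_merge, [df.reset_index() for df in (df1, df2, df3)])
    # Sort by date, then by id descending so 'recent' comes before 'hist'.
    .sort_values(keys, ascending=[True, False])
    .set_index("fldDate")
)

# Move 'id' to the end to match the layout asked for in the question.
combined = combined[[c for c in combined.columns if c != "id"] + ["id"]]
print(combined)
```

Folding the merge with `reduce` means the same code handles any number of frames, and sorting on both keys makes the row order within each date deterministic rather than dependent on the merge's internal ordering.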
