I have three pandas DataFrames indexed by datetime: df1, df2, and df3. Each date appears twice in the index, once per id value ('recent' and 'hist'). I would like to combine the three DataFrames, keeping date pairs that appear in only one of them and merging pairs that appear in several, so that no date pair is listed more than once (a plain concat does not do this). Here are samples of the DataFrames:

In [1]: print df1
            CurTempMid      id
fldDate                       
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-27         9.0  recent
1997-12-27         7.0    hist     
1998-02-10         9.0  recent
1998-02-10         7.0    hist
...                ...     ... 
2001-01-04        27.0  recent
2001-01-04        26.0    hist
2001-03-16        12.0  recent
2001-03-16        11.0    hist
2001-04-06        23.0  recent
2001-04-06        22.0    hist

In [2]: print df2
            MaxTempMid      id
fldDate                       
1998-01-02        29.0  recent
1998-01-02        28.0    hist
1998-02-15        18.0  recent
1998-02-15        23.0    hist
1998-02-23        24.0  recent
1998-02-23        15.0    hist
...                ...     ... 
2001-01-01        16.0  recent
2001-01-01        22.0    hist
2001-01-04        30.0  recent
2001-01-04        37.0    hist
2001-02-16        14.0  recent
2001-02-16        11.0    hist

In [3]: print df3
            MinTempMid      id
fldDate                       
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-26        -3.0  recent
1997-12-26        -2.0    hist
1997-12-27        -1.0  recent
1997-12-27         0.0    hist
...                ...     ...
2001-02-18         9.0  recent
2001-02-18        36.0    hist
2001-03-11        18.0  recent
2001-03-11        38.0    hist
2001-03-12        13.0  recent
2001-03-12        16.0    hist

The desired result looks like this:

            CurTempMid MaxTempMid MinTempMid       id    
fldDate                       
1997-12-23         0.0        NaN        0.0   recent
1997-12-23        -2.0        NaN       -2.0     hist
1997-12-26         NaN        NaN       -3.0   recent
1997-12-26         NaN        NaN       -2.0     hist
1997-12-27         9.0        NaN       -1.0   recent
1997-12-27         7.0        NaN        0.0     hist 
...                ...        ...        ...      ...

Once combined, the 'id' column should be identical, so I only need to retain a single 'id' column.

1 Answer

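If you're sure that the id values line up across the three dataframes (every date has the same 'recent' and 'hist' rows in each), this solution should work for you. Reset each index so that fldDate becomes a regular column, outer-merge the three dataframes on the fldDate and id columns, then set the index back to fldDate.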

m = (df1.reset_index()
        .merge(df2.reset_index(), on=['fldDate', 'id'], how='outer')
        .merge(df3.reset_index(), on=['fldDate', 'id'], how='outer')
        .sort_values('fldDate'))
m.set_index('fldDate', inplace=True)
print(m.head())
#             CurTempMid      id  MaxTempMid  MinTempMid
# fldDate
# 1997-12-23         0.0  recent         NaN         0.0
# 1997-12-23        -2.0    hist         NaN        -2.0
# 1997-12-26         NaN    hist         NaN        -2.0
# 1997-12-26         NaN  recent         NaN        -3.0
# 1997-12-27         9.0  recent         NaN        -1.0
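
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same outer merge. The make_frame helper and the shortened sample rows are only for illustration (they are not part of the question's data pipeline). Passing validate="one_to_one" is an optional addition that makes merge raise an error if any (fldDate, id) pair is duplicated within a frame, and sorting on both keys is one way to get 'recent' listed before 'hist' as in the desired result.

```python
import pandas as pd


def make_frame(column, rows):
    """Hypothetical helper: build a frame indexed by fldDate
    from (date, value, id) tuples."""
    df = pd.DataFrame(rows, columns=["fldDate", column, "id"])
    df["fldDate"] = pd.to_datetime(df["fldDate"])
    return df.set_index("fldDate")


# Shortened versions of the samples in the question.
df1 = make_frame("CurTempMid", [
    ("1997-12-23", 0.0, "recent"), ("1997-12-23", -2.0, "hist"),
    ("1997-12-27", 9.0, "recent"), ("1997-12-27", 7.0, "hist"),
])
df2 = make_frame("MaxTempMid", [
    ("1998-01-02", 29.0, "recent"), ("1998-01-02", 28.0, "hist"),
])
df3 = make_frame("MinTempMid", [
    ("1997-12-23", 0.0, "recent"), ("1997-12-23", -2.0, "hist"),
    ("1997-12-26", -3.0, "recent"), ("1997-12-26", -2.0, "hist"),
    ("1997-12-27", -1.0, "recent"), ("1997-12-27", 0.0, "hist"),
])

keys = ["fldDate", "id"]
merged = (
    df1.reset_index()
    # validate raises MergeError if a (fldDate, id) pair repeats in either side
    .merge(df2.reset_index(), on=keys, how="outer", validate="one_to_one")
    .merge(df3.reset_index(), on=keys, how="outer", validate="one_to_one")
    # dates ascending; id descending so 'recent' sorts before 'hist'
    .sort_values(keys, ascending=[True, False])
    .set_index("fldDate")
)
print(merged)
```

Dates that are missing from one of the inputs come through as NaN in that input's column, and only a single id column is kept because id is one of the merge keys.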