How to concatenate two dataframes based on matching dates?

Question

I want to concatenate two earthquake catalogs stored as pandas dataframes.

import pandas as pd

ISC = {'my_index': [0,2,3], 'date': ['2001-03-06', '2001-03-20', '2001-03-30'], 'magnitude': [4.7,4.7,4.9]}
df1 = pd.DataFrame(data=ISC).set_index('my_index')


USGS = {'my_index': [1,4],'date': ['2001-03-20', '2001-03-30'], 'magnitude': [4.8,5]}
df2 = pd.DataFrame(data=USGS).set_index('my_index')

Here is catalog 1 (df1):

my_index        date  magnitude                 
0         2001-03-06        4.7
2         2001-03-20        4.7
3         2001-03-30        4.9

And catalog 2 (df2):

my_index        date  magnitude                 
1         2001-03-20        4.8
4         2001-03-30        5.0

When concatenating both dataframes (df3=pd.concat([df1,df2],axis=1,join='outer')), this is what I get:

my_index        date  magnitude        date  magnitude                                       
0         2001-03-06        4.7         NaN        NaN
1                NaN        NaN  2001-03-20        4.8
2         2001-03-20        4.7         NaN        NaN
3         2001-03-30        4.9         NaN        NaN
4                NaN        NaN  2001-03-30        5.0

However, after concatenation, I would like quakes happening on the same day to show up on the same line. This is my desired output:

index            date  magnitude        date  magnitude                                       
0         2001-03-06        4.7         NaN        NaN 
1         2001-03-20        4.7  2001-03-20        4.8
2         2001-03-30        4.9  2001-03-30        5.0

Any idea how I can achieve this result?

cs95 · Accepted Answer · 2019-02-20 22:57:22Z

2

If you don't need the extra date column, this is as simple as a single merge call.

(df1.merge(df2, on='date', how='left', suffixes=('', '_y'))
    .rename(lambda x: x.replace('_y', ''), axis=1))

         date  magnitude  magnitude
0  2001-03-06        4.7        NaN
1  2001-03-20        4.7        4.8
2  2001-03-30        4.9        5.0
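The rename step above leaves two columns that are both called magnitude, which can make later column selection ambiguous. As a minimal sketch of a variation (the suffix names `_isc` and `_usgs` are my own illustrative choice, not part of the original answer, and the frames are rebuilt here without `my_index` to keep the snippet self-contained), distinct suffixes keep the two catalogs distinguishable:

```python
import pandas as pd

# Rebuild the two catalogs from the question (my_index omitted for brevity).
df1 = pd.DataFrame({'date': ['2001-03-06', '2001-03-20', '2001-03-30'],
                    'magnitude': [4.7, 4.7, 4.9]})
df2 = pd.DataFrame({'date': ['2001-03-20', '2001-03-30'],
                    'magnitude': [4.8, 5.0]})

# Left merge on 'date': every row of df1 is kept, and df2 values are filled in
# where the date matches. The suffixes are hypothetical names chosen for clarity.
merged = df1.merge(df2, on='date', how='left', suffixes=('_isc', '_usgs'))
print(merged)
```

With this variant, `merged['magnitude_usgs']` selects a single column rather than a two-column frame.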

To match your expected output, use set_index and join here:

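# drop=False keeps 'date' as a regular column as well as the index
u = (df1.set_index('date', drop=False)
        .join(df2.set_index('date', drop=False), how='left', lsuffix='', rsuffix='_y')
        .reset_index(drop=True))
# strip the suffix so both catalogs show the same column names
u.columns = u.columns.str.replace('_y', '')
u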

         date  magnitude        date  magnitude
0  2001-03-06        4.7         NaN        NaN
1  2001-03-20        4.7  2001-03-20        4.8
2  2001-03-30        4.9  2001-03-30        5.0
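For context on why the original concat did not line the rows up: `pd.concat` with `axis=1` aligns rows on the index, and the two catalogs have disjoint `my_index` values. A rough sketch of a concat-based alternative, assuming each date appears at most once per catalog (the `keys` labels `'ISC'` and `'USGS'` are my own addition for readability):

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2001-03-06', '2001-03-20', '2001-03-30'],
                    'magnitude': [4.7, 4.7, 4.9]})
df2 = pd.DataFrame({'date': ['2001-03-20', '2001-03-30'],
                    'magnitude': [4.8, 5.0]})

# Index both catalogs by date so concat can align on it; drop=False keeps
# 'date' as a regular column too. keys= adds a top column level per catalog.
aligned = pd.concat(
    [df1.set_index('date', drop=False), df2.set_index('date', drop=False)],
    axis=1,
    keys=['ISC', 'USGS'],
).reset_index(drop=True)
print(aligned)
```

Unlike the left join, this keeps dates that appear only in the second catalog as well, since concat defaults to an outer join.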

edited Feb 20, 2019 at 22:57

answered Feb 20, 2019 at 22:44

aysegulpekel · Answer · 2019-02-20 22:41:42Z

0

It seems you need a merge instead of a concat:

df3 = pd.merge(df1, df2, on='date', how='outer')
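To see what this call produces on the question's data, here is a small self-contained sketch. The `indicator=True` argument is my addition, not part of the answer; it adds a `_merge` column showing which catalog each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2001-03-06', '2001-03-20', '2001-03-30'],
                    'magnitude': [4.7, 4.7, 4.9]})
df2 = pd.DataFrame({'date': ['2001-03-20', '2001-03-30'],
                    'magnitude': [4.8, 5.0]})

# Outer merge keeps dates from both catalogs. The overlapping 'magnitude'
# columns get pandas' default suffixes '_x' (left) and '_y' (right).
df3 = pd.merge(df1, df2, on='date', how='outer', indicator=True)
print(df3)
```

The result has a single shared date column plus magnitude_x and magnitude_y, so it would still need reshaping to match the two-date-column layout requested in the question.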

answered Feb 20, 2019 at 22:41

1 Comment

cs95 Over a year ago

This needs a lot of work to produce OP's expected output.
