I want to concatenate two earthquake catalogs stored as pandas dataframes.

import pandas as pd

ISC = {'my_index': [0,2,3], 'date': ['2001-03-06', '2001-03-20', '2001-03-30'], 'magnitude': [4.7,4.7,4.9]}
df1 = pd.DataFrame(data=ISC).set_index('my_index')


USGS = {'my_index': [1,4],'date': ['2001-03-20', '2001-03-30'], 'magnitude': [4.8,5]}
df2 = pd.DataFrame(data=USGS).set_index('my_index')

Here is catalog 1 (df1):

my_index        date  magnitude                 
0         2001-03-06        4.7
2         2001-03-20        4.7
3         2001-03-30        4.9

And catalog 2 (df2):

my_index        date  magnitude                 
1         2001-03-20        4.8
4         2001-03-30        5.0

When concatenating both dataframes (df3=pd.concat([df1,df2],axis=1,join='outer')), this is what I get:

my_index        date  magnitude        date  magnitude                                       
0         2001-03-06        4.7         NaN        NaN
1                NaN        NaN  2001-03-20        4.8
2         2001-03-20        4.7         NaN        NaN
3         2001-03-30        4.9         NaN        NaN
4                NaN        NaN  2001-03-30        5.0

However, after concatenation, I would like quakes happening on the same day to show up on the same line. This is my desired output:

index            date  magnitude        date  magnitude                                       
0         2001-03-06        4.7         NaN        NaN 
1         2001-03-20        4.7  2001-03-20        4.8
2         2001-03-30        4.9  2001-03-30        5.0

Any idea how I can achieve this result?

2 Answers

If you don't need the extra date column, this is as simple as a single merge call.

(df1.merge(df2, on='date', how='left', suffixes=('', '_y'))
    .rename(lambda x: x.replace('_y', ''), axis=1))

         date  magnitude  magnitude
0  2001-03-06        4.7        NaN
1  2001-03-20        4.7        4.8
2  2001-03-30        4.9        5.0
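
Two columns sharing the label magnitude can be awkward to select later. As a minimal sketch, assuming the catalogs from the question, distinct suffixes keep the labels apart. The suffixes `_isc` and `_usgs` are illustrative choices, not something the answer above prescribes.

```python
import pandas as pd

# Catalogs copied from the question.
df1 = pd.DataFrame({'my_index': [0, 2, 3],
                    'date': ['2001-03-06', '2001-03-20', '2001-03-30'],
                    'magnitude': [4.7, 4.7, 4.9]}).set_index('my_index')
df2 = pd.DataFrame({'my_index': [1, 4],
                    'date': ['2001-03-20', '2001-03-30'],
                    'magnitude': [4.8, 5.0]}).set_index('my_index')

# '_isc' and '_usgs' are hypothetical suffixes chosen for this sketch.
merged = df1.merge(df2, on='date', how='left', suffixes=('_isc', '_usgs'))

# Each magnitude column can now be selected unambiguously by label.
usgs_magnitudes = merged['magnitude_usgs']
print(merged)
```

Because this is a left merge, any event in df2 whose date is absent from df1 would be dropped; that does not happen with the sample data.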

To match your expected output, use set_index and join here:

u = (df1.set_index('date', drop=False)
        .join(df2.set_index('date', drop=False), how='left', lsuffix='', rsuffix='_y')
        .reset_index(drop=True))
u.columns = u.columns.str.replace('_y', '')
u

         date  magnitude        date  magnitude
0  2001-03-06        4.7         NaN        NaN
1  2001-03-20        4.7  2001-03-20        4.8
2  2001-03-30        4.9  2001-03-30        5.0
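
Since the question asks about concat specifically, here is a rough sketch of a different route: align df2 to the dates of df1 first, then concatenate side by side. It assumes the dates in df2 are unique (reindex fails on duplicate labels), and the names `left`, `right` and `out` are made up for this example.

```python
import pandas as pd

# Catalogs copied from the question.
df1 = pd.DataFrame({'my_index': [0, 2, 3],
                    'date': ['2001-03-06', '2001-03-20', '2001-03-30'],
                    'magnitude': [4.7, 4.7, 4.9]}).set_index('my_index')
df2 = pd.DataFrame({'my_index': [1, 4],
                    'date': ['2001-03-20', '2001-03-30'],
                    'magnitude': [4.8, 5.0]}).set_index('my_index')

# Give the left catalog a plain 0..n-1 index.
left = df1.reset_index(drop=True)

# Look up each date of df1 in df2; dates missing from df2 become NaN rows.
# Assumption: dates in df2 are unique.
right = (df2.set_index('date', drop=False)
            .reindex(left['date'])
            .reset_index(drop=True))

# Both frames now share the same row index, so concat lines the rows up.
out = pd.concat([left, right], axis=1)
print(out)
```

Like the left join, this keeps only the dates present in df1.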

It seems you need a merge instead of a concat:

df3 = pd.merge(df1, df2, on='date', how='outer')
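
With the default suffixes, this call returns a single date column plus magnitude_x and magnitude_y. To get closer to the layout in the question, one possible sketch carries a copy of the second catalog's date through the merge. The column names `date_usgs`, `magnitude_isc` and `magnitude_usgs` are hypothetical labels introduced here.

```python
import pandas as pd

# Catalogs copied from the question.
df1 = pd.DataFrame({'my_index': [0, 2, 3],
                    'date': ['2001-03-06', '2001-03-20', '2001-03-30'],
                    'magnitude': [4.7, 4.7, 4.9]}).set_index('my_index')
df2 = pd.DataFrame({'my_index': [1, 4],
                    'date': ['2001-03-20', '2001-03-30'],
                    'magnitude': [4.8, 5.0]}).set_index('my_index')

# Rename the magnitudes up front so no suffixes are needed.
left = df1.rename(columns={'magnitude': 'magnitude_isc'})
right = df2.rename(columns={'magnitude': 'magnitude_usgs'})

# Keep a copy of the right-hand date; the merge key itself appears only once.
right.insert(0, 'date_usgs', right['date'])

# An outer merge keeps events from both catalogs, matched or not.
df3 = pd.merge(left, right, on='date', how='outer')
print(df3)
```

Unlike a left merge, the outer merge would also keep events that appear only in df2, with NaN in the df1 columns.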

1 Comment

This needs a lot of work to produce OP's expected output.
