pandas concat/merge/join multiple dataframes with only one column by this column

Question

I have (more than) two dataframes:

In [22]: df = pd.DataFrame({'database' : ['db1', 'db2', 'db3']})

In [23]: df1 = pd.DataFrame({'database' : ['db1', 'db2', 'db3']})

In [24]: df2 = pd.DataFrame({'database' : ['db2', 'db3', 'db4']})

In [25]: df1
Out[25]:
  database
0      db1
1      db2
2      db3

In [26]: df2
Out[26]:
  database
0      db2
1      db3
2      db4
What I want as output is a dataframe in this format, with matching values aligned on the same row:

Out[45]: 
  database database
0      db1         
1      db2      db2
2      db3      db3
3               db4

I managed to get this format as follows:

df1.index = df1.database.values.ravel()
df2.index = df2.database.values.ravel()
pd.concat([df1, df2], axis=1).fillna('').reset_index(drop=True)

But I think there must be a better solution than this trick with the ravel() function.

jezrael · Accepted Answer · 2019-03-16 12:13:54Z

2

Use DataFrame.set_index with drop=False:

df = (pd.concat([df1.set_index('database', drop=False), 
                 df2.set_index('database', drop=False)], axis=1)
        .fillna('')
        .reset_index(drop=True))
print (df)
  database database
0      db1         
1      db2      db2
2      db3      db3
3               db4
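
To see why drop=False matters here, the following minimal sketch (reusing the question's sample data) compares the two settings. With the default drop=True, the only column moves into the index and nothing is left to concatenate; with drop=False the column is kept, and concat aligns the rows on the shared index values. This assumes the values in the column are unique within each frame, since aligning on an index with duplicate labels would likely fail.

```python
import pandas as pd

df1 = pd.DataFrame({'database': ['db1', 'db2', 'db3']})
df2 = pd.DataFrame({'database': ['db2', 'db3', 'db4']})

# Default drop=True: the single column becomes the index, leaving no columns.
dropped = df1.set_index('database')
print(dropped.shape)   # (3, 0)

# drop=False: the column is copied into the index and also kept as data.
kept = df1.set_index('database', drop=False)
print(kept.shape)      # (3, 1)

# concat with axis=1 aligns rows on the index labels; missing labels become NaN.
aligned = pd.concat([df1.set_index('database', drop=False),
                     df2.set_index('database', drop=False)], axis=1)
print(aligned)
```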

A more general solution for any number of DataFrames, using a list comprehension:

dfs = [df, df1, df2]
dfs1 = [x.set_index('database', drop=False) for x in dfs]
df = (pd.concat(dfs1, axis=1)
        .fillna('')
        .reset_index(drop=True))
print (df)
  database database database
0      db1      db1         
1      db2      db2      db2
2      db3      db3      db3
3                        db4
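
The result above has several columns that all share the label database, so selecting df['database'] returns every one of them. If distinct labels are preferable, one possible variation (a sketch, not part of the original answer) is to rename each column before concatenating. The database_0, database_1 naming scheme is an assumption chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'database': ['db1', 'db2', 'db3']})
df2 = pd.DataFrame({'database': ['db2', 'db3', 'db4']})
frames = [df1, df2]

# Index each column by its own values, then give it a unique name
# (the database_<i> naming is a hypothetical choice).
cols = [f.set_index('database', drop=False)['database'].rename(f'database_{i}')
        for i, f in enumerate(frames)]

out = (pd.concat(cols, axis=1)
         .fillna('')
         .reset_index(drop=True))
print(out)
```

Each column can then be selected on its own, for example out['database_1'].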

Kaushal28 · Answer · 2019-03-16 12:42:20Z

0

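You can create a Series of NaN values, append it as an extra row to the concatenated DataFrame, and then shift the second column down by one. Here is an example:

import numpy as np

df = pd.concat([df1, df2], axis=1)
# give the two columns distinct names so the second can be selected on its own
df.columns = ['database', 'database1']
s = pd.Series([np.nan, np.nan], index=['database', 'database1'])
# DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
df = pd.concat([df, s.to_frame().T], ignore_index=True)
df['database1'] = df['database1'].shift(1)
df = df.fillna('')

This generates the expected output for the sample data, with the second column named database1. Note that the shift is positional: it works here only because df2 starts exactly one value later than df1. Hope this helps!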
