
I need to merge two DataFrames that have different numbers of rows and no common key:

df1:

name | age | loc
Bob  | 20  | USA

df2:

food  | car    | sports
Sushi | Toyota | soccer
meat  | Ford   | baseball

Result I want:

name | age | loc | food  | car    | sports
Bob  | 20  | USA | Sushi | Toyota | soccer
Bob  | 20  | USA | meat  | Ford   | baseball
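For reference, here is a minimal sketch that builds the two example frames and produces the desired result with a cross join (`how="cross"`, available in pandas 1.2 and later). This is a different technique from the index-based merge attempted below. It matches the desired output only because df1 has exactly one row: a cross join pairs every row of df1 with every row of df2 and ignores the index entirely.

```python
import pandas as pd

# The example frames from the question
df1 = pd.DataFrame({"name": ["Bob"], "age": [20], "loc": ["USA"]})
df2 = pd.DataFrame({"food": ["Sushi", "meat"],
                    "car": ["Toyota", "Ford"],
                    "sports": ["soccer", "baseball"]})

# how="cross" builds the Cartesian product of the rows, so no common key
# and no index alignment are needed.
result = pd.merge(df1, df2, how="cross")
print(result)

# With a one-row df2 the cross join simply yields one row.
single = pd.merge(df1, df2.iloc[:1], how="cross")
print(single)
```

If df1 had several rows, the result would contain len(df1) * len(df2) rows, which may or may not be what you want.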

My code:

pd.merge(df1,df2,how='right',left_index=True,right_index=True)

It works well when df2 has more than two rows, but the result is incorrect when df2 has only one row.

Any ideas?

2 Answers

Use reindex with the index of df2 (older code uses reindex_axis, which was removed in pandas 1.0):

df1 = df1.reindex(df2.index, method='ffill')
print (df1)
  name  age  loc
0  Bob   20  USA
1  Bob   20  USA

df = pd.merge(df1,df2,how='right',left_index=True,right_index=True)
print (df)
  name  age  loc   food     car    sports
0  Bob   20  USA  Sushi  Toyota    soccer
1  Bob   20  USA   meat    Ford  baseball

Alternatively, if there are no NaN values in df1 and df2, you can forward-fill the missing values with ffill (shorthand for fillna(method='ffill')):

# concat with axis=1 performs an outer join on the index by default
df = pd.concat([df1,df2], axis=1).ffill()
print (df)
  name   age  loc   food     car    sports
0  Bob  20.0  USA  Sushi  Toyota    soccer
1  Bob  20.0  USA   meat    Ford  baseball

df = pd.merge(df1,df2,how='right',left_index=True,right_index=True).ffill()
print (df)
  name   age  loc   food     car    sports
0  Bob  20.0  USA  Sushi  Toyota    soccer
1  Bob  20.0  USA   meat    Ford  baseball
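The "no NaN data" condition matters because ffill cannot tell the NaNs created by the join apart from values that were genuinely missing. A small sketch, assuming a hypothetical df2 whose second car value is missing: the ffill approach silently copies "Toyota" into that cell, while the reindex approach only fills df1's columns and leaves df2's missing value alone.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"name": ["Bob"], "age": [20], "loc": ["USA"]})
# Hypothetical variant of df2 with a genuinely missing 'car' value
df2 = pd.DataFrame({"food": ["Sushi", "meat"],
                    "car": ["Toyota", np.nan],
                    "sports": ["soccer", "baseball"]})

# concat + ffill: the real NaN in 'car' is overwritten with "Toyota"
filled = pd.concat([df1, df2], axis=1).ffill()
print(filled)

# reindex + join: only df1 is forward-filled, df2's NaN is preserved
safe = df1.reindex(df2.index, method="ffill").join(df2)
print(safe)
```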

6 Comments

Hi @jezrael, thank you for your help. Your idea works well when df2 has more than one row, but it does not work when df2 has only one row.
Does the solution df = pd.concat([df1,df2], axis=1).ffill() not work either?
It produces a new DataFrame with two rows even though my original df1 and df2 both have only one row.
Hmm, then what is the logic? merge with left_index=True, right_index=True joins on the indexes: if both DataFrames have one row (and the default index), both indexes are 0 and the output has one row with index 0. concat works the same way. The docs may help.
But if you merge a DataFrame with one row (index 0) with one that has two rows (indexes 0 and 1) using how='inner', you get only the first row, because only index 0 exists in both DataFrames. With a left, right, or outer join you get 1 or 2 rows, and in the second row the columns from the one-row DataFrame are NaN because index 1 has no match.
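The join behaviour described in the comment above can be checked with a short sketch. The frame names `one` and `two` are illustrative; it counts the rows each join type returns and how many of them have NaN in the one-row frame's column.

```python
import pandas as pd

one = pd.DataFrame({"name": ["Bob"]})            # index: [0]
two = pd.DataFrame({"food": ["Sushi", "meat"]})  # index: [0, 1]

results = {}
for how in ["inner", "left", "right", "outer"]:
    out = pd.merge(one, two, how=how, left_index=True, right_index=True)
    # (number of rows, number of rows where 'name' found no match)
    results[how] = (len(out), int(out["name"].isna().sum()))
    print(how, results[how])
```

Only index 0 matches, so inner and left joins keep one row, while right and outer joins add an index-1 row whose name is NaN.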

Another type of solution, based on concat:

import pandas as pd

x = range(0,5)
y = range(5,10)
z = range(10,15)
a = range(10,5,-1)
b = range(15,10,-1)
v = range(0,1)
w = range(2,3)

A = pd.DataFrame(dict(x=x,y=y,z=z))
B = pd.DataFrame(dict(a=a,b=b))
C = pd.DataFrame(dict(v=v,w=w))

>>> pd.concat([A, B], axis=1)
   x  y   z   a   b
0  0  5  10  10  15
1  1  6  11   9  14
2  2  7  12   8  13
3  3  8  13   7  12
4  4  9  14   6  11

Edit: based on the comments, this solution does not answer the question, because in the question the DataFrames have different numbers of rows. Here is another solution, based on a DataFrame D that repeats C once for every row of B:

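n_mult = B.shape[0]
# Stack n_mult copies of C and renumber the index 0..n_mult-1 so it aligns with B
# (DataFrame.append no longer exists in pandas 2.0, so pd.concat is used)
D = pd.concat([C] * n_mult, ignore_index=True)
pd.concat([D, B], axis=1)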
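Another way to build D is to repeat C's row positions with Index.repeat and then renumber the index. Here is a minimal sketch, reusing the B and C definitions from this answer. For a one-row C this gives the same D. For a multi-row C it would repeat each row consecutively rather than tiling the whole frame.

```python
import pandas as pd

B = pd.DataFrame(dict(a=range(10, 5, -1), b=range(15, 10, -1)))
C = pd.DataFrame(dict(v=range(0, 1), w=range(2, 3)))

# Select C's single row len(B) times, then reset the index to 0..len(B)-1
# so that concat aligns D and B row by row.
D = C.loc[C.index.repeat(len(B))].reset_index(drop=True)
res = pd.concat([D, B], axis=1)
print(res)
```

Because the repeat count comes from len(B), the same code also works when B has a single row.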

2 Comments

Thank you, @zwep. Your idea works well when df2 has more than one row, but not when df2 has only one row.
Hi @zwep, thank you for your idea. I have already resolved this issue with reset_index. Thank you all the same.
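The thread does not show exactly how reset_index was used. One plausible explanation for the "two rows from two one-row frames" symptom is that the indexes did not match, for example because df1 came out of a filter. This sketch assumes a hypothetical df1 whose only row has index 3. concat then outer-joins indexes 3 and 0 into two half-empty rows, and resetting both indexes first restores the expected single row.

```python
import pandas as pd

# Hypothetical: df1's only row kept index 3 (e.g. after filtering a larger frame)
df1 = pd.DataFrame({"name": ["Bob"], "age": [20], "loc": ["USA"]}, index=[3])
df2 = pd.DataFrame({"food": ["Sushi"], "car": ["Toyota"], "sports": ["soccer"]})

# Indexes 3 and 0 do not match, so the outer join yields two rows full of NaN
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned)

# Resetting both indexes to 0..n-1 lines the rows up again
aligned = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print(aligned)
```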
