Python / Pandas - Merging on index with multiple repeated keys

Question

I have this dataframe:

df1:
                year     revenues  
index                                                                    
03374312000153  2010        25432 
03374312000153  2009        25433 
48300560000198  2014        13894  
48300560000198  2013        18533 
48300560000198  2012        18534
NaN             NaN         NaN 
...

And I have this other dataframe:

df2:
                Name         Street  
index                                                                    
03374312000153  Yeap Co     Locc St 
54623827374939  Damn Co     Geez St 
37273829349299  Woohoo Co  Under St 
...

I need to select only the rows of df1 whose index appears in df2.index and merge them with df2, so the result would look like this:

                year     revenues    Name      Street
index                                                                    
03374312000153  2010        25432 Yeap Co     Locc St
03374312000153  2009        25433 Yeap Co     Locc St
...

If I try:

df2=df2.merge(df1,left_index=True,right_index=True)

I get an error:

TypeError: type object argument after * must be a sequence, not map

If I try:

df2=df2.join(df1)

I get the same error as above.

Can someone help?

@MisterJT df1.index is a huge list of company codes and df2.index is a list of some company codes. Most of df2.index is inside df1.index.
– aabujamra, Commented Nov 28, 2017 at 21:12
df.merge(df1,right_index=True,left_index=True,how='inner') works well on my side
– BENY, Commented Nov 28, 2017 at 21:13

MisterJT · Accepted Answer · 2017-11-28 21:30:13Z

I see nothing wrong with what you're doing; it runs without error for me on pandas 0.19.2. If your version is out of date, that could be the issue. Check it with:

import pandas as pd
pd.__version__

How I built your dataframes:

df1 = pd.DataFrame({'year' : pd.Series([2010,2009,2014,2013,2012], index=['03374312000153','03374312000153','48300560000198','48300560000198','48300560000198']),
   'revenues' : pd.Series([25432,25433,13894,18533,18534], index=['03374312000153','03374312000153','48300560000198','48300560000198','48300560000198'])})

df2 = pd.DataFrame({'Name' : pd.Series(['Yeap Co','Damn Co','Woohoo Co'],index=['03374312000153','54623827374939','37273829349299'] ),
                   'Street' : pd.Series(['Locc St','Geez St','Under St'], index=['03374312000153','54623827374939','37273829349299'] )})

df2.merge(df1,left_index=True,right_index=True)


                   Name   Street  revenues  year
03374312000153  Yeap Co  Locc St     25432  2010
03374312000153  Yeap Co  Locc St     25433  2009
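
The row selection described in the question can also be done explicitly before joining. The following is only a sketch under assumptions: it rebuilds the sample data from the question, and the names idx1, df1_matched and result are made up for this example.

```python
import pandas as pd

# Sample data rebuilt from the question (only the rows shown there).
idx1 = ['03374312000153', '03374312000153',
        '48300560000198', '48300560000198', '48300560000198']
df1 = pd.DataFrame({'year': [2010, 2009, 2014, 2013, 2012],
                    'revenues': [25432, 25433, 13894, 18533, 18534]},
                   index=idx1)
df2 = pd.DataFrame({'Name': ['Yeap Co', 'Damn Co', 'Woohoo Co'],
                    'Street': ['Locc St', 'Geez St', 'Under St']},
                   index=['03374312000153', '54623827374939', '37273829349299'])

# Keep only the df1 rows whose index label also occurs in df2.index.
df1_matched = df1[df1.index.isin(df2.index)]

# Attach the company details. df2's index is unique, so each remaining
# df1 row picks up exactly one Name/Street pair.
result = df1_matched.join(df2, how='inner')
print(result)
```

Filtering first is not required for the merge to work, but it makes it easy to inspect which rows of df1 will survive before any columns are combined.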

Some thoughts:

A non-unique index is generally best avoided, in part because writing to an RDBMS table that enforces a unique primary key will raise an error. In that case, join on a key column instead of the index.
It's good practice to specify the 'how' option of the method explicitly (as @Wen did).
It's good practice to assign the result of a join to a new dataframe instead of overwriting an existing one. That way, if the join fails or gives an unexpected result, especially on a large dataframe, you don't have to re-create the original dataframes.
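
The three points above could be combined roughly as follows. This is a sketch, not a definitive recipe: the column name company_id and the variable names left, right and merged are hypothetical, and the validate argument assumes a pandas release newer than the 0.19.2 mentioned above (it was added in 0.21).

```python
import pandas as pd

# Sample data rebuilt from the question (only the rows shown there).
idx1 = ['03374312000153', '03374312000153',
        '48300560000198', '48300560000198', '48300560000198']
df1 = pd.DataFrame({'year': [2010, 2009, 2014, 2013, 2012],
                    'revenues': [25432, 25433, 13894, 18533, 18534]},
                   index=idx1)
df2 = pd.DataFrame({'Name': ['Yeap Co', 'Damn Co', 'Woohoo Co'],
                    'Street': ['Locc St', 'Geez St', 'Under St']},
                   index=['03374312000153', '54623827374939', '37273829349299'])

# Move the company code out of the index into an ordinary key column.
# 'company_id' is a hypothetical name chosen for this example.
left = df1.rename_axis('company_id').reset_index()
right = df2.rename_axis('company_id').reset_index()

# Explicit 'how', and a new variable so df1 and df2 stay untouched.
# validate='many_to_one' raises MergeError if company_id repeats in 'right'.
merged = left.merge(right, on='company_id', how='inner',
                    validate='many_to_one')
print(merged)
```

Because the result goes into merged, df1 and df2 are still available unchanged if the output turns out not to be what was expected.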
