
I have this dataframe:

df1:
                year     revenues  
index                                                                    
03374312000153  2010        25432 
03374312000153  2009        25433 
48300560000198  2014        13894  
48300560000198  2013        18533 
48300560000198  2012        18534
NaN             NaN         NaN 
...

And I have this other dataframe:

df2:
                Name         Street  
index                                                                    
03374312000153  Yeap Co     Locc St 
54623827374939  Damn Co     Geez St 
37273829349299  Woohoo Co  Under St 
...

I need to select only the rows of df1 whose index value appears in df2.index and merge them with the matching rows of df2, so the result would look like this:

                year     revenues    Name      Street
index                                                                    
03374312000153  2010        25432 Yeap Co     Locc St
03374312000153  2009        25433 Yeap Co     Locc St
...

If I try:

df2=df2.merge(df1,left_index=True,right_index=True)

I get an error:

TypeError: type object argument after * must be a sequence, not map

If I try:

df2=df2.join(df1)

I get the same error as above.

Can someone help?

  • What do your indexes look like on each dataframe? Commented Nov 28, 2017 at 21:10
  • @MisterJT df1.index is a huge list of company codes and df2.index is a shorter list of company codes. Most of df2.index is contained in df1.index. Commented Nov 28, 2017 at 21:12
  • Try updating your pandas? Commented Nov 28, 2017 at 21:13
  • df.merge(df1,right_index=True,left_index=True,how='inner') works well on my side. Commented Nov 28, 2017 at 21:13
  • @abutremutante Drop the NaN rows then; they would not affect it anyway... Commented Nov 28, 2017 at 21:19

1 Answer

I actually see nothing wrong with what you're doing, using Pandas 0.19.2. If your version isn't up to date that could be your issue. Check it with:

import pandas as pd
print(pd.__version__)  # shows the installed pandas version

How I built your dataframes:

# Index labels are company codes; df1 has repeated codes (one row per year).
df1_index = ['03374312000153'] * 2 + ['48300560000198'] * 3
df1 = pd.DataFrame({'year': [2010, 2009, 2014, 2013, 2012],
                    'revenues': [25432, 25433, 13894, 18533, 18534]},
                   index=df1_index)

# df2 has one row per company code.
df2_index = ['03374312000153', '54623827374939', '37273829349299']
df2 = pd.DataFrame({'Name': ['Yeap Co', 'Damn Co', 'Woohoo Co'],
                    'Street': ['Locc St', 'Geez St', 'Under St']},
                   index=df2_index)

df2.merge(df1,left_index=True,right_index=True)


                   Name   Street  revenues  year
03374312000153  Yeap Co  Locc St     25432  2010
03374312000153  Yeap Co  Locc St     25433  2009
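
The df1 in the question also shows a row whose index is NaN, and a comment suggests dropping such rows. As a rough sketch only (the small frames below are made-up stand-ins for the real data, and it assumes a pandas version recent enough to have Index.notna), you could filter out missing index labels and then do an explicit inner join, so that only codes present in both frames survive:

```python
import numpy as np
import pandas as pd

# Stand-in data modelled on the question; the last df1 row has a NaN index label.
df1 = pd.DataFrame(
    {"year": [2010, 2009, 2014, np.nan],
     "revenues": [25432, 25433, 13894, np.nan]},
    index=["03374312000153", "03374312000153", "48300560000198", np.nan],
)
df2 = pd.DataFrame(
    {"Name": ["Yeap Co", "Damn Co"], "Street": ["Locc St", "Geez St"]},
    index=["03374312000153", "54623827374939"],
)

# Keep only the rows whose index label is not missing.
df1_clean = df1[df1.index.notna()]

# An inner join on the index keeps only the codes found in both frames.
merged = df1_clean.merge(df2, left_index=True, right_index=True, how="inner")
print(merged)
```

Here year and revenues come out as floats because of the NaN row in the stand-in data; whether that matters depends on your real df1.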

Some thoughts:

  • It's not preferred practice to have a non-unique index, in part because if you end up writing to an RDBMS table with a unique primary key constraint, the write will fail. In that case you'd join on a key column instead of the index.
  • It's good practice to specify the how option of merge explicitly (as @Wen did).
  • It's good practice to generate a new dataframe from a join instead of writing over an old one. That way if the join fails, especially on a large dataframe, you don't have to re-create the previous dataframes.
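
A minimal sketch of those three points together, assuming the company code is moved into an ordinary column (the name company_code is made up for illustration) and that your pandas version supports the validate argument of merge:

```python
import pandas as pd

# Company code as a regular key column instead of a non-unique index.
# "company_code" is a hypothetical column name, not one from the question.
df1 = pd.DataFrame({
    "company_code": ["03374312000153", "03374312000153",
                     "48300560000198", "48300560000198", "48300560000198"],
    "year": [2010, 2009, 2014, 2013, 2012],
    "revenues": [25432, 25433, 13894, 18533, 18534],
})
df2 = pd.DataFrame({
    "company_code": ["03374312000153", "54623827374939", "37273829349299"],
    "Name": ["Yeap Co", "Damn Co", "Woohoo Co"],
    "Street": ["Locc St", "Geez St", "Under St"],
})

# Explicit how=, plus validate= to raise if df2 has duplicate company codes.
# The result goes into a new variable, so df1 and df2 are left untouched.
merged_df = df1.merge(df2, on="company_code", how="inner",
                      validate="many_to_one")
print(merged_df)
```

If the merge raises or returns something unexpected, df1 and df2 are still available unchanged for another attempt.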