Pandas 'multi-index' issue in merging dataframes

Question

I have a panel dataset, df:

stock    year    date   return
VOD      2017    01-01   0.05
VOD      2017    01-02   0.03
VOD      2017    01-03   0.04
...      ...     ...     ....
BAT      2017    01-01   0.05
BAT      2017    01-02   0.07
BAT      2017    01-03   0.10

I use this code to get the mean and skewness of the return for each stock in each year:

df2=df.groupby(['stock','year']).mean().reset_index()
df3=df.groupby(['stock','year']).skew().reset_index()

df2 and df3 look fine.

df2 looks like this (after I change the column names):

stock    year   mean_return
 VOD     2017    0.09
 BAT     2017    0.14
 ...      ...    ...

df3 looks like this (after I change the column names):

stock    year   return_skewness
 VOD     2017    -0.34
 BAT     2017    -0.04
 ...      ...    ...

The problem is when I tried to merge df2 and df3 by using

want=pd.merge(df2,df3, on=['stock','year'],how='outer')

Python gave me

'The column label 'stock' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.'

which confuses me a lot.

I can use want = pd.merge(df2,df3, left_index=True, right_index=True, how='outer') to merge df2 and df3, but after that I have to rename the columns because the column names are in parentheses.

Is there any convenient way to merge df2 and df3 ? Thanks

jezrael · Accepted Answer · 2019-02-22 14:30:35Z

3

It is better to use agg with a list of (new column name, aggregate function) tuples, selecting the column to aggregate after groupby:

df3 = (df.groupby(['stock','year'])['return']
         .agg([('mean_return','mean'),('return_skewness','skew')])
         .reset_index())
print (df3)
  stock  year  mean_return  return_skewness
0   BAT  2017     0.073333         0.585583
1   VOD  2017     0.040000         0.000000
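On pandas 0.25 or later, the same result can also be written with keyword-based named aggregation, which avoids the list of tuples. A minimal sketch, assuming the sample data from the question rebuilt by hand (the df construction and the name stats are only illustrative):

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the original df)
df = pd.DataFrame({
    'stock': ['VOD', 'VOD', 'VOD', 'BAT', 'BAT', 'BAT'],
    'year': [2017] * 6,
    'date': ['01-01', '01-02', '01-03'] * 2,
    'return': [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
})

# Each keyword becomes an output column name; its value is the function to apply
stats = (df.groupby(['stock', 'year'])['return']
           .agg(mean_return='mean', return_skewness='skew')
           .reset_index())
print(stats)
```

The keyword form keeps the new column name next to the function that produces it, so no renaming step is needed afterwards.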

Your solution can be changed by removing reset_index, naming each result with rename, and combining them at the end with concat; the return column is also selected explicitly for aggregation:

s2=df.groupby(['stock','year'])['return'].mean().rename('mean_return')
s3=df.groupby(['stock','year'])['return'].skew().rename('return_skewness')

df3 = pd.concat([s2, s3], axis=1).reset_index()
print (df3)
  stock  year  mean_return  return_skewness
0   BAT  2017     0.073333         0.585583
1   VOD  2017     0.040000         0.000000
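For comparison, the merge from the question also works once both frames have ordinary (flat) column names. The question does not show how the columns were renamed, so the rename(columns=...) step here is a guess at the intent, and the sample data is rebuilt by hand:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the original df)
df = pd.DataFrame({
    'stock': ['VOD', 'VOD', 'VOD', 'BAT', 'BAT', 'BAT'],
    'year': [2017] * 6,
    'date': ['01-01', '01-02', '01-03'] * 2,
    'return': [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
})

grouped = df.groupby(['stock', 'year'])['return']

# rename(columns=...) with a dict keeps the columns as a plain Index
df2 = grouped.mean().reset_index().rename(columns={'return': 'mean_return'})
df3 = grouped.skew().reset_index().rename(columns={'return': 'return_skewness'})

want = pd.merge(df2, df3, on=['stock', 'year'], how='outer')
print(want)
```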

EDIT:

If you need to aggregate all numeric columns, remove the column selection after groupby, then use map with join to flatten the resulting MultiIndex in the columns:

print (df)
  stock  year   date  return  col
0   VOD  2017  01-01    0.05    1
1   VOD  2017  01-02    0.03    8
2   VOD  2017  01-03    0.04    9
3   BAT  2017  01-01    0.05    1
4   BAT  2017  01-02    0.07    4
5   BAT  2017  01-03    0.10    3

df3 = df.groupby(['stock','year']).agg(['mean','skew'])
print (df3)
              return                 col          
                mean      skew      mean      skew
stock year                                        
BAT   2017  0.073333  0.585583  2.666667 -0.935220
VOD   2017  0.040000  0.000000  6.000000 -1.630059

df3.columns = df3.columns.map('_'.join)
df3 = df3.reset_index()
print (df3)
  stock  year  return_mean  return_skew  col_mean  col_skew
0   BAT  2017     0.073333     0.585583  2.666667 -0.935220
1   VOD  2017     0.040000     0.000000  6.000000 -1.630059
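Newer pandas releases (2.0 and later, as far as I know) no longer silently drop non-numeric columns such as date in a call like this and raise a TypeError instead, so it can help to select the numeric columns explicitly. A minimal sketch, assuming the sample data above rebuilt by hand (the name flat is only illustrative):

```python
import pandas as pd

# Sample data rebuilt from the printed df above (assumed to match)
df = pd.DataFrame({
    'stock': ['VOD', 'VOD', 'VOD', 'BAT', 'BAT', 'BAT'],
    'year': [2017] * 6,
    'date': ['01-01', '01-02', '01-03'] * 2,
    'return': [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
    'col': [1, 8, 9, 1, 4, 3],
})

# Select only the numeric columns before aggregating
flat = df.groupby(['stock', 'year'])[['return', 'col']].agg(['mean', 'skew'])

# Each column label is a (column, function) tuple; join the parts with '_'
flat.columns = ['_'.join(pair) for pair in flat.columns]
flat = flat.reset_index()
print(flat)
```

The list comprehension does the same job as columns.map('_'.join); use whichever reads better to you.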

Your solution can be changed to:

df2=df.groupby(['stock','year']).mean().add_prefix('mean_')
df3=df.groupby(['stock','year']).skew().add_prefix('skew_')


df3 = pd.concat([df2, df3], axis=1).reset_index()
print (df3)
  stock  year  mean_return  mean_col  skew_return  skew_col
0   BAT  2017     0.073333  2.666667     0.585583 -0.935220
1   VOD  2017     0.040000  6.000000     0.000000 -1.630059
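On newer pandas releases, calling mean() or skew() on a grouped frame that still contains the string date column can raise a TypeError. A sketch of one workaround, assuming numeric_only=True is accepted by both methods in your pandas version (the names means, skews and out are only illustrative):

```python
import pandas as pd

# Sample data rebuilt from the printed df above (assumed to match)
df = pd.DataFrame({
    'stock': ['VOD', 'VOD', 'VOD', 'BAT', 'BAT', 'BAT'],
    'year': [2017] * 6,
    'date': ['01-01', '01-02', '01-03'] * 2,
    'return': [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
    'col': [1, 8, 9, 1, 4, 3],
})

grouped = df.groupby(['stock', 'year'])

# numeric_only=True skips the string 'date' column
means = grouped.mean(numeric_only=True).add_prefix('mean_')
skews = grouped.skew(numeric_only=True).add_prefix('skew_')

# Both results share the (stock, year) index, so concat aligns them row by row
out = pd.concat([means, skews], axis=1).reset_index()
print(out)
```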

edited Feb 22, 2019 at 14:30

answered Feb 22, 2019 at 13:58

jezrael


4 Comments

FlyUFalcon Over a year ago

Thanks @jezrael. I also tried to apply concat to merge the two datasets. However, some columns end up in parentheses and I have to change the column names manually. Do you happen to know how to solve this?

jezrael Over a year ago

@FlyUFalcon - sure, so there are more columns like return?

FlyUFalcon Over a year ago

Yes. The columns of df2 changed to (stock,), (year,) and (return,), so I have to change and delete some columns. The dataset I showed is a simplified one; I have quite a few columns in my dataset. Thanks

FlyUFalcon Over a year ago

Thanks! Appreciated!
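Column names that print as (stock,), (year,) usually mean the columns are a one-level MultiIndex rather than plain labels, which is also what the merge error is complaining about. The question does not show how the columns were renamed, so the cause is a guess; the sketch below builds such columns explicitly on hypothetical data and flattens them with get_level_values(0) before merging:

```python
import pandas as pd

# Hypothetical stand-ins for df2 and df3 from the question
df2 = pd.DataFrame({'stock': ['VOD', 'BAT'], 'year': [2017, 2017],
                    'mean_return': [0.04, 0.073333]})
df3 = pd.DataFrame({'stock': ['VOD', 'BAT'], 'year': [2017, 2017],
                    'return_skewness': [0.0, 0.585583]})

# Simulate the problem: wrap the labels in a one-level MultiIndex
df2.columns = pd.MultiIndex.from_arrays([list(df2.columns)])
print(df2.columns.tolist())  # each label is now a 1-tuple

# Flatten back to plain labels, then merge on the key columns
df2.columns = df2.columns.get_level_values(0)
want = pd.merge(df2, df3, on=['stock', 'year'], how='outer')
print(want)
```

If your own frame has more than one column level, get_level_values(0) would discard the other levels, so joining the levels into one string may be the safer choice there.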

hollypeter · Answer · 2019-11-26 03:41:33Z

3

An easier way to bypass this issue:

df2.to_clipboard(index=False)
df2clip=pd.read_clipboard(sep='\t')

df3.to_clipboard(index=False)
df3clip=pd.read_clipboard(sep='\t')

Then merge the two DataFrames again:

pd.merge(df2clip,df3clip,on=['stock','year'],how='outer')

edited Nov 26, 2019 at 3:41

answered Nov 26, 2019 at 3:31

hollypeter
