I have a panel dataset as df

stock    year    date   return
VOD      2017    01-01   0.05
VOD      2017    01-02   0.03
VOD      2017    01-03   0.04
...      ...     ...     ....
BAT      2017    01-01   0.05
BAT      2017    01-02   0.07
BAT      2017    01-03   0.10

I use this code to get the mean and skewness of the return for each stock in each year.

df2=df.groupby(['stock','year']).mean().reset_index()
df3=df.groupby(['stock','year']).skew().reset_index()

df2 and df3 look fine.

df2 is like (after I change the column name)

stock    year   mean_return
 VOD     2017    0.09
 BAT     2017    0.14
 ...      ...    ...

df3 is like (after I change the column name)

stock    year   return_skewness
 VOD     2017    -0.34
 BAT     2017    -0.04
 ...      ...    ...

The problem is when I tried to merge df2 and df3 by using

want=pd.merge(df2,df3, on=['stock','year'],how='outer')

python gave me

'The column label 'stock' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.'

which confuses me a lot.

I can use want = pd.merge(df2,df3, left_index=True, right_index=True, how='outer') to merge df2 and df3, but after that I have to rename the columns, because the column names come out in parentheses.

Is there a convenient way to merge df2 and df3? Thanks

2 Answers

A better approach is to select the column to aggregate after groupby and pass agg a list of (new column name, aggregate function) tuples:

df3 = (df.groupby(['stock','year'])['return']
         .agg([('mean_return','mean'),('return_skewness','skew')])
         .reset_index())
print (df3)
  stock  year  mean_return  return_skewness
0   BAT  2017     0.073333         0.585583
1   VOD  2017     0.040000         0.000000
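The same result can also be written with keyword-style named aggregation, which I believe is available from pandas 0.25 onward. This is a minimal, self-contained sketch: the frame below is rebuilt from the six sample rows used in this answer, and the name summary is only illustrative.

```python
import pandas as pd

# Sample rows copied from the example data in this answer.
df = pd.DataFrame({
    "stock": ["VOD", "VOD", "VOD", "BAT", "BAT", "BAT"],
    "year": [2017] * 6,
    "date": ["01-01", "01-02", "01-03"] * 2,
    "return": [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
})

# Each keyword becomes an output column; its value names the aggregation.
summary = (
    df.groupby(["stock", "year"])["return"]
      .agg(mean_return="mean", return_skewness="skew")
      .reset_index()
)
print(summary)
```

Because the output columns are named directly, no renaming step is needed afterwards.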

Your solution can also be fixed: select the return column for aggregation, remove the early reset_index, rename each result, and combine them with concat at the end:

s2=df.groupby(['stock','year'])['return'].mean().rename('mean_return')
s3=df.groupby(['stock','year'])['return'].skew().rename('return_skewness')

df3 = pd.concat([s2, s3], axis=1).reset_index()
print (df3)
  stock  year  mean_return  return_skewness
0   BAT  2017     0.073333         0.585583
1   VOD  2017     0.040000         0.000000
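If a merge is preferred over concat, it should also work once both frames have flat, single-level column names. A small sketch under that assumption, reusing the sample rows from this answer (the frame construction is illustrative, not the asker's real data):

```python
import pandas as pd

# Sample rows copied from the example data in this answer.
df = pd.DataFrame({
    "stock": ["VOD", "VOD", "VOD", "BAT", "BAT", "BAT"],
    "year": [2017] * 6,
    "date": ["01-01", "01-02", "01-03"] * 2,
    "return": [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
})

grouped = df.groupby(["stock", "year"])["return"]

# Naming each Series before reset_index keeps the column labels plain strings.
df2 = grouped.mean().rename("mean_return").reset_index()
df3 = grouped.skew().rename("return_skewness").reset_index()

# With flat labels on both sides, merging on the key columns is unambiguous.
want = pd.merge(df2, df3, on=["stock", "year"], how="outer")
print(want)
```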

EDIT:

If you need to aggregate all numeric columns, remove the column selection after groupby, then flatten the resulting MultiIndex columns with map and join:

print (df)
  stock  year   date  return  col
0   VOD  2017  01-01    0.05    1
1   VOD  2017  01-02    0.03    8
2   VOD  2017  01-03    0.04    9
3   BAT  2017  01-01    0.05    1
4   BAT  2017  01-02    0.07    4
5   BAT  2017  01-03    0.10    3

df3 = df.groupby(['stock','year']).agg(['mean','skew'])
print (df3)
              return                 col          
                mean      skew      mean      skew
stock year                                        
BAT   2017  0.073333  0.585583  2.666667 -0.935220
VOD   2017  0.040000  0.000000  6.000000 -1.630059

df3.columns = df3.columns.map('_'.join)
df3 = df3.reset_index()
print (df3)
  stock  year  return_mean  return_skew  col_mean  col_skew
0   BAT  2017     0.073333     0.585583  2.666667 -0.935220
1   VOD  2017     0.040000     0.000000  6.000000 -1.630059
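On newer pandas versions, aggregating every column this way may raise an error when a text column such as date is present (I believe the default handling of non-numeric columns changed around pandas 2.0, so treat the exact version as an assumption). A hedged sketch that selects the numeric value columns explicitly before aggregating; value_cols and flat are illustrative names:

```python
import pandas as pd

# Sample rows copied from the example data in this answer.
df = pd.DataFrame({
    "stock": ["VOD", "VOD", "VOD", "BAT", "BAT", "BAT"],
    "year": [2017] * 6,
    "date": ["01-01", "01-02", "01-03"] * 2,
    "return": [0.05, 0.03, 0.04, 0.05, 0.07, 0.10],
    "col": [1, 8, 9, 1, 4, 3],
})

# Columns to aggregate, listed explicitly so the string column is skipped.
value_cols = ["return", "col"]

flat = df.groupby(["stock", "year"])[value_cols].agg(["mean", "skew"])

# Join the two column levels, e.g. ('return', 'mean') becomes 'return_mean'.
flat.columns = flat.columns.map("_".join)
flat = flat.reset_index()
print(flat)
```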

Your original approach can be adapted in the same way, using add_prefix instead of manual renaming:

df2=df.groupby(['stock','year']).mean().add_prefix('mean_')
df3=df.groupby(['stock','year']).skew().add_prefix('skew_')


df3 = pd.concat([df2, df3], axis=1).reset_index()
print (df3)
  stock  year  mean_return  mean_col  skew_return  skew_col
0   BAT  2017     0.073333  2.666667     0.585583 -0.935220
1   VOD  2017     0.040000  6.000000     0.000000 -1.630059

4 Comments

Thanks @jezrael. I also tried to use concat to merge the two datasets. However, some columns end up in parentheses and I have to change the column names manually. Do you happen to know how to solve this?
@FlyUFalcon - sure, so there are more columns like return?
Yes. The columns of df2 changed to (stock,), (year,) and (return,), so I have to rename and delete some columns. The dataset I showed is a simplified one; I have quite a few columns in my dataset. Thanks
Thanks! Appreciated!

An easier way to bypass this issue:

df2.to_clipboard(index=False)
df2clip=pd.read_clipboard(sep='\t')

df3.to_clipboard(index=False)
df3clip=pd.read_clipboard(sep='\t')

Then merge the two DataFrames again:

pd.merge(df2clip,df3clip,on=['stock','year'],how='outer')
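The clipboard round trip presumably helps because writing the frame out and reading it back turns tuple-style column labels into plain strings. Assuming that is the cause, the labels can be flattened in code instead, which avoids depending on a system clipboard. In this sketch flatten_columns is a hypothetical helper, and the two small frames are illustrative stand-ins for df2 and df3 with two-level column labels:

```python
import pandas as pd

def flatten_columns(frame):
    """Return a copy whose column labels are plain strings (hypothetical helper)."""
    out = frame.copy()
    out.columns = [
        # Join the non-empty parts of tuple labels; leave other labels as strings.
        "_".join(str(part) for part in col if str(part) != "")
        if isinstance(col, tuple) else str(col)
        for col in out.columns
    ]
    return out

# Illustrative stand-ins for df2 and df3 with MultiIndex column labels.
df2 = pd.DataFrame(
    [["VOD", 2017, 0.04], ["BAT", 2017, 0.073333]],
    columns=pd.MultiIndex.from_tuples(
        [("stock", ""), ("year", ""), ("return", "mean")]
    ),
)
df3 = pd.DataFrame(
    [["VOD", 2017, 0.0], ["BAT", 2017, 0.585583]],
    columns=pd.MultiIndex.from_tuples(
        [("stock", ""), ("year", ""), ("return", "skew")]
    ),
)

want = pd.merge(
    flatten_columns(df2), flatten_columns(df3),
    on=["stock", "year"], how="outer",
)
print(want)
```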
