I'm using pandas and NumPy on Python 3, with the following versions:
- Python 3.5.1 (via Anaconda 2.5.0), 64-bit
- pandas 0.19.1
- NumPy 1.11.2 (probably not relevant here)
Here is the minimal code producing the problem:
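import numpy as np
import pandas as pd

# 'a' becomes the index; 'i' and 'b' stay as ordinary columns
a = pd.DataFrame({'i': [1, 1, 1, 1, 1], 'a': [1, 2, 5, 6, 100], 'b': [2, 4, 10, np.nan, np.nan]})
a.set_index(keys='a', inplace=True)
# Group on the index level, then take a rolling mean of 'b' within each group
v = a.groupby(level=0).apply(lambda x: x.sort_values(by='i')['b'].rolling(2, min_periods=0).mean())
v.index.names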
This code is a simple groupby-apply, but I don't understand the outcome:
FrozenList(['a', 'a'])
For some reason, the index names of the result are ['a', 'a'], which seems like a questionable choice on pandas' part. I would have expected a simple ['a'].
Does anyone know why pandas duplicates the index level in the result?
Thanks in advance.
sort_values returns a new DataFrame, so its index is being concatenated with the existing groupby index. You could argue that it shouldn't do this, but apply normally expects a scalar value to be returned; since a Series or DataFrame is being returned, it looks like it is aligning and concatenating here.
a.groupby(): you should pass the parameter as_index=False.
You get [None, 'a'] for the index names here when you pass as_index=False.
I think the OP is asking why there are two levels of index here, as well as why the index name is duplicated.
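
To make the comments above concrete, here is a minimal sketch comparing the groupby options on the OP's data. It is an illustration under assumptions, not a statement of how pandas is meant to be used: the helper name rolling_mean_b is made up for this example, group_keys=True is spelled out explicitly because the default handling has varied between pandas versions, and the exact behaviour may differ on releases other than the one you run.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with 'a' as the index.
a = pd.DataFrame({'i': [1, 1, 1, 1, 1],
                  'a': [1, 2, 5, 6, 100],
                  'b': [2, 4, 10, np.nan, np.nan]}).set_index('a')


def rolling_mean_b(group):
    # Hypothetical helper: returns a Series that is still indexed by 'a'.
    return group.sort_values(by='i')['b'].rolling(2, min_periods=0).mean()


# group_keys=True prepends the group key as an outer level, on top of the
# 'a' index that each returned Series already carries.
with_keys = a.groupby(level=0, group_keys=True).apply(rolling_mean_b)
print(with_keys.index.names)

# group_keys=False concatenates the per-group results without adding the key level.
without_keys = a.groupby(level=0, group_keys=False).apply(rolling_mean_b)
print(without_keys.index.names)

# as_index=False (suggested in the comments) still yields two levels;
# only the outer level changes.
positional = a.groupby(level=0, as_index=False).apply(rolling_mean_b)
print(positional.index.names)

# Alternative: keep the default result and drop the redundant outer level.
flattened = with_keys.droplevel(0)
print(flattened.index.names)
```

If the goal is simply a result indexed by 'a', group_keys=False or dropping the outer level afterwards both seem reasonable; as_index=False alone does not remove the extra level.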