
API: should reindex on a level introduce NaNs for missing entries per label of other levels? #12319

@jorisvandenbossche


Suppose the following dataframe and reindex operation:

In [65]: df = pd.DataFrame(np.arange(12).reshape(4,3), columns=pd.MultiIndex.from_tuples([('A', 'a'), ('A', 'b'), ('B', 'a')]))

In [66]: df
Out[66]:
   A       B
   a   b   a
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

In [67]: df.reindex(columns=['a', 'b'], level=1)
Out[67]:
   A       B
   a   b   a
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

Should it instead give the following, i.e. add a `('B', 'b')` column filled with NaN?

In [67]: df.reindex(columns=['a', 'b'], level=1)
Out[67]:
   A       B
   a   b   a    b
0  0   1   2  NaN
1  3   4   5  NaN
2  6   7   8  NaN
3  9  10  11  NaN
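One way to get the proposed result with the existing API is to build the full target MultiIndex explicitly and reindex on it without `level`. This assumes the intended semantics is "every existing level-0 label combined with every requested level-1 label". It is only a sketch under that assumption, and `reindex_level_product` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def reindex_level_product(df, labels, level=1):
    """Hypothetical helper (not part of pandas), assuming two column levels:
    combine every existing label of the *other* level with every label in
    `labels`, then reindex on those full tuples; missing pairs become NaN."""
    other = 1 - level  # the level whose labels are kept as-is
    outer = df.columns.get_level_values(other).unique()  # first-seen order
    if level == 1:
        target = pd.MultiIndex.from_product([outer, labels])
    else:
        target = pd.MultiIndex.from_product([labels, outer])
    # Plain reindex on full tuples: absent tuples are added and filled with NaN.
    return df.reindex(columns=target)


df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    columns=pd.MultiIndex.from_tuples([("A", "a"), ("A", "b"), ("B", "a")]),
)
result = reindex_level_product(df, ["a", "b"], level=1)
print(result)
```

With `level=1` this yields the columns `(A, a)`, `(A, b)`, `(B, a)` and `(B, b)`, the last one all NaN, matching the output proposed above. The cost of this reading is that columns can be created for combinations that never existed in the original frame.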

I am not sure what the exact behaviour of the `level` keyword should be, but, e.g., in the following example it selects the requested columns for each label of the other level:

In [69]: df2 = pd.DataFrame(np.arange(18).reshape(3,6), columns=pd.MultiIndex.from_product([('A', 'B'), ('a', 'b', 'c')]))

In [70]: df2
Out[70]:
    A           B
    a   b   c   a   b   c
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17

In [71]: df2.reindex(columns=['a', 'c'], level=1)
Out[71]:
    A       B
    a   c   a   c
0   0   2   3   5
1   6   8   9  11
2  12  14  15  17
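For comparison, the outputs shown for both `df` and `df2` are consistent with treating `level=1` as a filter: keep the existing columns whose level-1 label is among the requested labels, and never add new ones. The sketch below implements that interpretation; it is not a description of pandas internals, and `filter_level` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def filter_level(df, labels, level=1):
    """Hypothetical helper (not part of pandas): keep existing columns whose
    label on `level` is in `labels`. No new columns are created, so no NaNs
    are introduced."""
    mask = df.columns.get_level_values(level).isin(labels)
    return df.loc[:, mask]


df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    columns=pd.MultiIndex.from_tuples([("A", "a"), ("A", "b"), ("B", "a")]),
)
df2 = pd.DataFrame(
    np.arange(18).reshape(3, 6),
    columns=pd.MultiIndex.from_product([("A", "B"), ("a", "b", "c")]),
)

print(filter_level(df, ["a", "b"]))   # all three existing columns kept
print(filter_level(df2, ["a", "c"]))  # (A, a), (A, c), (B, a), (B, c)
```

On `df2`, whose columns form a full product, this filter and a product-based reindex give the same columns. They only differ when some (outer, inner) pair is absent, as with `('B', 'b')` in `df`. Note that this filter keeps the original column order regardless of the order of `labels`.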

Metadata


Assignees

No one assigned

    Labels

    API Design, Indexing (Related to indexing on series/frames, not to indexes themselves), MultiIndex, Needs Discussion (Requires discussion from core team before further action)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
