Explode index level of DataFrame

Question

I have a data frame with a multi-index where one level has a value that is representative for all other values of that level. So for example (code sample below):

         D
A B   C   
x a   y  0
  b   y  1
  all z  2

Here all is a shorthand for representing all other values of that level so that data frame actually represents:

       D
A B C   
x a y  0
  b y  1
  a z  2
  b z  2

This is also the form which I would like to obtain. For each row containing all in that index level, that row is duplicated for each other value in the index level. If it was a column I could replace each occurrence of all with a list of other values and then use DataFrame.explode.

So I thought about resetting that index level, replacing all occurrences of all with a list of other values, then explode that column and finally set it back as an index:

level_values = sorted(set(df.index.unique('B')) - {'all'})
tmp = df.reset_index('B')
mask = df.index.get_level_values('B') == 'all'
col_index = list(tmp.columns).index('B')
for i in np.argwhere(mask).ravel():
    tmp.iat[i, col_index] = level_values
result = tmp.explode('B').set_index('B', append=True)

That however seems pretty inefficient and the code is not very clear. Also the index levels are in the wrong order now (my actual data frame has more than three index levels, so I can't use swaplevel to reorder it).

So I'm wondering if there's a more concise way of exploding these all values?

Code for generating the sample data frames:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[[0], [1], [2]],
    index=pd.MultiIndex.from_arrays(
        [['x', 'x', 'x'], ['a', 'b', 'all'], ['y', 'y', 'z']],
        names=['A', 'B', 'C']
    ),
    columns=['D']
)

expected = pd.DataFrame(
    data=[[0], [1], [2], [2]],
    index=pd.MultiIndex.from_arrays(
        [['x', 'x', 'x', 'x'], ['a', 'b', 'a', 'b'], ['y', 'y', 'z', 'z']],
        names=['A', 'B', 'C']
    ),
    columns=['D']
)

Andrej Kesely · Accepted Answer · 2020-05-20 19:10:53Z

1

def fn(x):
    l, rv = [], []
    for v in x:
        if v == 'all':
            rv.append(l[:])
            l = []
        else:
            l.append(v)
            rv.append(v)
    return rv


df2 = pd.DataFrame(zip(*df.index)).T.assign(D=df['D'].values)
df2 = df2.apply(fn).explode(1).rename(columns={0:'A', 1:'B', 2:'C'}).set_index(keys=['A', 'B', 'C'])

print(df2)

Prints:

       D
A B C   
x a y  0
  b y  1
  a z  2
  b z  2

answered May 20, 2020 at 19:10

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

C. Braun · Accepted Answer · 2020-10-27 17:10:07Z

0

I run into this frequently and have not been able to find a much better solution than resetting the index and using explode.

You can remove some of the complexity though by resetting the entire index and using map, which is quite efficient.

# Remember index order before resetting
idx_names = df.index.names
df = df.reset_index()

# Build a map to replace "all" with the other unique values
all_map = {"all": sorted(set(df["B"].unique()) - {'all'})}

# map gives NaNs for non-matched entries, but we can just fill those back in
df["B"] = df["B"].map(all_map).fillna(df["B"])

# After mapping, explode does the work for us and we can reset the original index
df.explode("B").set_index(idx_names)

answered Oct 27, 2020 at 17:10

C. Braun

5,2611 gold badge22 silver badges51 bronze badges

Comments

halfer · Accepted Answer · 2022-11-25 00:00:51Z

This is a very narrow scoped function, suitable possibly only for this particular question. It would be helpful if you shared more data, someone else might possibly also have something better:

def explode_index(df):

    #get index where 'all' exists
    for ent in df.index :
        if 'all' in ent:
            val = ent.index('all') 

    #get index name
    expr_var = df.index.get_level_values(1).name

    #create query expression
    expr = f"{expr_var}=='all'"

    #filter with the query expression
    expr_outcome = df.query(expr)

    #get values except 'all'
    others = df.index.get_level_values(expr_var).drop('all').tolist()

    #get the before and after :
    tail = list(expr_outcome.index[0])[val+1:]
    head = list(expr_outcome.index[0])[:val]

    #create combo  of head,others, tail
    from itertools import product
    add_index = list(product(head,others,tail))

    #create new index
    newindex = (df.index.drop(expr_outcome.index)
               .join(pd.Index(add_index, names=df.index.names),how='outer')
                )


    #create new df with reindex
    res = df.reindex(newindex).sort_index()

    #assign value to null rows :
    res.loc[add_index] = expr_outcome.iloc[0].item()
    res = res.astype('uint8')

    return res

explode_index(df)

           D
A   B   C   
x   a   y   0
        z   2
    b   y   1
        z   2

Collectives™ on Stack Overflow

Explode index level of DataFrame

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related