Reindex 2nd level in incomplete multi-level dataframe to be complete, inserting NANs on missing rows

Question

I need to reindex the 2nd level of a pandas dataframe, so that the 2nd level becomes a (complete) list 0,...,(N-1) for each 1st level index.

I tried Allan/Hayden's approach, but it only renumbers the rows that already exist.
What I want is for the missing rows to be inserted for each 1st level value, with NaN values.

Example:

import pandas as pd

df = pd.DataFrame({
  'first': ['one', 'one', 'one', 'two', 'two', 'three'],
  'second': [0, 1, 2, 0, 1, 1],
  'value': [1, 2, 3, 4, 5, 6]
})
print(df)

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

# Tried Allan/Hayden's approach, but it only renumbers existing rows; it doesn't add the missing ones
df['second'] = df.reset_index().groupby(['first']).cumcount()
print(df)
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       0      6

My desired result is:

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN  <-- INSERTED
6  three       0      6
7  three       1    NaN  <-- INSERTED
8  three       2    NaN  <-- INSERTED

Could you first create the data frame with all of the rows you need, then update it with the values you have?
– Pekka, Commented Aug 9, 2015 at 8:22
Are the indices in "second" always contiguous and starting from 0?
– chris-sc, Commented Aug 9, 2015 at 8:22
Missing words from title: you want to reindex the 2nd level in an incomplete multi-level dataframe to be complete, inserting NaNs on missing rows.
– smci, Commented Jul 16, 2022 at 19:59
Also, saying np.arange(N) is pretty obscure to non-numpy users; it is clearer to just say 0,...,(N-1).
– smci, Commented Jul 16, 2022 at 20:01
In general, don't use groupby() as a poor man's MultiIndex; use .set_index(['first', 'second']) wherever possible.
– smci, Commented Jul 16, 2022 at 20:05
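
Pekka's suggestion (build all the rows first, then fill in the known values) can be sketched roughly as follows. This is only an illustration: the names N, skeleton and full are made up for the example, and N = 3 is assumed to be known in advance.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

N = 3  # assumption: number of second-level slots wanted per group

# Build the complete skeleton: every 'first' value paired with 0..N-1.
skeleton = pd.MultiIndex.from_product(
    [df['first'].unique(), range(N)], names=['first', 'second']
).to_frame(index=False)

# Left-merge the known values onto the skeleton; rows with no match get NaN.
full = skeleton.merge(df, on=['first', 'second'], how='left')
print(full)
```

The left merge keeps every row of the skeleton, so the result has one row per (first, second) pair whether or not a value existed for it.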

Jianxun Li · Accepted Answer · 2015-08-09 09:08:02Z

5

I think you can first set the columns 'first' and 'second' as a multi-level index, and then reindex against the complete index.

import numpy as np
import pandas as pd

# your data
# ==========================
df = pd.DataFrame({
  'first': ['one', 'one', 'one', 'two', 'two', 'three'], 
  'second': [0, 1, 2, 0, 1, 1],
  'value': [1, 2, 3, 4, 5, 6]
})

df

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

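# processing
# ============================
# build the complete index: every 'first' value paired with second = 0, 1, 2
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second'])

# reindex against it; combinations missing from df become NaN rows
df.set_index(['first', 'second']).reindex(multi_index).reset_index()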

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN
6  three       0    NaN
7  three       1      6
8  three       2    NaN
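
Note that the output above keeps the 'three' row at second = 1, whereas the desired result in the question has it at second = 0. If that renumbering is wanted, one possible variation is to apply cumcount first and then reindex. The sketch below also derives the number of slots from the largest group instead of hard-coding 3; the names n, full_index and result are made up for the example, and taking "largest group size" as N is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

# Renumber 'second' within each group so it is contiguous from 0
# (this moves the 'three' row from second=1 to second=0).
df['second'] = df.groupby('first').cumcount()

# Assumption: N is the size of the largest group.
n = int(df.groupby('first').size().max())

# Complete index: every 'first' value paired with 0..n-1.
full_index = pd.MultiIndex.from_product(
    [df['first'].unique(), range(n)], names=['first', 'second']
)

# Missing (first, second) pairs become NaN rows.
result = df.set_index(['first', 'second']).reindex(full_index).reset_index()
print(result)
```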


Why is the .reset_index required after .reindex?
– Gulzar, Commented over a year ago
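
As far as I can tell, the .reset_index() is not needed for the reindexing itself; it only turns the 'first' and 'second' index levels back into ordinary columns so that the result has the same layout as the frame in the question. A small sketch of the difference (the names full_index, indexed and flat are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

full_index = pd.MultiIndex.from_product(
    [df['first'].unique(), range(3)], names=['first', 'second']
)

# Without reset_index: 'first' and 'second' live in the MultiIndex.
indexed = df.set_index(['first', 'second']).reindex(full_index)

# With reset_index: they become regular columns again, with a 0..8 row index.
flat = indexed.reset_index()

print(list(indexed.columns))  # ['value']
print(list(flat.columns))     # ['first', 'second', 'value']
```

If the MultiIndex layout is what you want to keep working with, the reset can simply be left off.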
