Pandas - insert rows where data is missing

Question

I have a dataset, here is an example:

import pandas as pd

df = pd.DataFrame({"Seconds_left":[5,10,15,25,30,35,5,10,15,30], "Team":["ATL","ATL","ATL","ATL","ATL","ATL","SAS","SAS","SAS","SAS"], "Fouls": [1,2,3,3,4,5,5,4,1,1]})


   Fouls  Seconds_left Team
0      1             5  ATL
1      2            10  ATL
2      3            15  ATL
3      3            25  ATL
4      4            30  ATL
5      5            35  ATL
6      5             5  SAS
7      4            10  SAS
8      1            15  SAS
9      1            30  SAS

Now I would like to insert rows where data in the Seconds_left column is missing:

Id Fouls Seconds_left   Team
0      1            5    ATL
1      2           10    ATL
2      3           15    ATL
3    NaN           20    ATL
4      3           25    ATL
5      4           30    ATL
6      5           35    ATL
7      5            5    SAS
8      4           10    SAS
9      1           15    SAS
10   NaN           20    SAS
11   NaN           25    SAS
12     1           30    SAS
13   NaN           35    SAS

I already tried reindexing, but it does not work because Seconds_left contains duplicate values.

Does anybody have an idea how to solve this?

Thanks!

I'm not sure I understand what you want to do here. Can you explain a bit with an input, your desired output, and the logic?
– Allen Qin, Commented Apr 28, 2017 at 21:58

user2285236 · Accepted Answer · 2017-04-28 22:03:47Z

5

Create a MultiIndex and reindex + reset_index:

idx = pd.MultiIndex.from_product([df['Team'].unique(), 
                                  np.arange(5, df['Seconds_left'].max()+1, 5)],
                                 names=['Team', 'Seconds_left'])

df.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()
Out: 
   Team  Seconds_left  Fouls
0   ATL             5    1.0
1   ATL            10    2.0
2   ATL            15    3.0
3   ATL            20    NaN
4   ATL            25    3.0
5   ATL            30    4.0
6   ATL            35    5.0
7   SAS             5    5.0
8   SAS            10    4.0
9   SAS            15    1.0
10  SAS            20    NaN
11  SAS            25    NaN
12  SAS            30    1.0
13  SAS            35    NaN
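
For reference, the steps above can be put together in one self-contained script. This is only a sketch, assuming the grid spacing is 5 seconds as in the question's sample data; the names `step` and `filled` are introduced here for illustration and are not part of the original answer. Reindexing introduces NaN, so Fouls becomes float; the optional last line converts it to pandas' nullable integer dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

step = 5  # assumed spacing of Seconds_left, taken from the sample data

# Every (Team, Seconds_left) pair that should exist in the result.
idx = pd.MultiIndex.from_product(
    [df["Team"].unique(), np.arange(step, df["Seconds_left"].max() + 1, step)],
    names=["Team", "Seconds_left"],
)

# The (Team, Seconds_left) pairs are unique, so reindex works here even
# though Seconds_left on its own contains duplicates.
filled = df.set_index(["Team", "Seconds_left"]).reindex(idx).reset_index()

# Optional: keep Fouls integer-valued, with missing entries shown as <NA>.
filled["Fouls"] = filled["Fouls"].astype("Int64")

print(filled)
```

Setting both columns as the index is what sidesteps the duplicate problem mentioned in the question.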


Scott Boston · Answer · 2017-04-28 22:18:08Z

An approach using groupby and merge:

df_left = pd.DataFrame({'Seconds_left':[5,10,15,20,25,30,35]})

df_out = df.groupby('Team', as_index=False).apply(lambda x: x.merge(df_left, how='right', on='Seconds_left'))

df_out['Team'] = df_out['Team'].ffill()  # fillna(method='ffill') is deprecated in recent pandas

df_out = df_out.reset_index(drop=True).sort_values(by=['Team','Seconds_left'])

print(df_out)

Output:

    Fouls  Seconds_left Team
0     1.0             5  ATL
1     2.0            10  ATL
2     3.0            15  ATL
6     NaN            20  ATL
3     3.0            25  ATL
4     4.0            30  ATL
5     5.0            35  ATL
7     5.0             5  SAS
8     4.0            10  SAS
9     1.0            15  SAS
11    NaN            20  SAS
12    NaN            25  SAS
10    1.0            30  SAS
13    NaN            35  SAS
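
A variant of the merge idea that avoids forward-filling Team is to build the full Team × Seconds_left grid first and left-merge the data onto it. This is not the code from the answer above, just a sketch under two assumptions: a 5-second grid as in the sample data, and pandas 1.2 or newer for `how="cross"`. The names `teams`, `seconds` and `grid` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

# All teams and all expected time points (5-second grid assumed).
teams = pd.DataFrame({"Team": df["Team"].unique()})
seconds = pd.DataFrame(
    {"Seconds_left": range(5, df["Seconds_left"].max() + 1, 5)}
)

# Cartesian product of teams and time points (how="cross" needs pandas >= 1.2).
grid = teams.merge(seconds, how="cross")

# Left-merge the observed rows onto the full grid; grid rows without a
# match in df get NaN in Fouls.
df_out = grid.merge(df, how="left", on=["Team", "Seconds_left"])

print(df_out)
```

Because Team comes from the grid rather than from a forward fill, this still works when a team's first time point is missing.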

Carl Zheng · Answer · 2017-04-28 21:20:33Z

-1

import pandas as pd
import numpy as np


df = pd.DataFrame(columns = ['a', 'b'])

df.loc[len(df)] = [1, np.nan]  # np.NaN was removed in NumPy 2.0; use np.nan


1 Comment

Jure Stabuc Over a year ago

Thanks but this only inserts one row with NaN. The problem is that my data is discontinuous and would like for both teams to have same data under Seconds_left.
