2

I have a dataset, here is an example:

import pandas as pd

df = pd.DataFrame({"Seconds_left":[5,10,15,25,30,35,5,10,15,30], "Team":["ATL","ATL","ATL","ATL","ATL","ATL","SAS","SAS","SAS","SAS"], "Fouls": [1,2,3,3,4,5,5,4,1,1]})


   Fouls  Seconds_left Team
0      1             5  ATL
1      2            10  ATL
2      3            15  ATL
3      3            25  ATL
4      4            30  ATL
5      5            35  ATL
6      5             5  SAS
7      4            10  SAS
8      1            15  SAS
9      1            30  SAS

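Now I would like to insert a row for every Seconds_left value that is missing for a team, so that each team has a row at every 5-second step from 5 to 35, with Fouls set to NaN in the inserted rows: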

Id Fouls Seconds_left   Team
0      1            5    ATL
1      2           10    ATL
2      3           15    ATL
3    NaN           20    ATL
4      3           25    ATL
5      4           30    ATL
6      5           35    ATL
7      5            5    SAS
8      4           10    SAS
9      1           15    SAS
10   NaN           20    SAS
11   NaN           25    SAS
12     1           30    SAS
13   NaN           35    SAS

I already tried reindexing on Seconds_left, but that does not work because the column contains duplicate values (the same times occur for both teams).
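
For illustration, here is a minimal sketch of the kind of attempt that fails. The name `wanted` and the choice to index on Seconds_left alone are assumptions about what the attempt looked like, and the exact error message depends on the pandas version:

```python
import pandas as pd

df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

# Hypothetical target grid: every 5-second step from 5 to 35.
wanted = list(range(5, 40, 5))

try:
    # Seconds_left alone is not unique (both teams share the same times),
    # so pandas refuses to reindex on it.
    df.set_index("Seconds_left").reindex(wanted)
    reindex_error = None
except ValueError as exc:
    reindex_error = exc

print(reindex_error)
```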

Does anybody have an idea how to solve this?

Thanks!

1
  • I'm not sure I understand what you want to do here. Can you explain a bit more, with an input, your desired output and the logic? Commented Apr 28, 2017 at 21:58

3 Answers

5

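Create a MultiIndex from the product of the teams and the full range of Seconds_left values, then reindex and reset_index (the index on Team and Seconds_left together is unique, so reindexing works):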

idx = pd.MultiIndex.from_product([df['Team'].unique(), 
                                  np.arange(5, df['Seconds_left'].max()+1, 5)],
                                 names=['Team', 'Seconds_left'])

df.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()
Out: 
   Team  Seconds_left  Fouls
0   ATL             5    1.0
1   ATL            10    2.0
2   ATL            15    3.0
3   ATL            20    NaN
4   ATL            25    3.0
5   ATL            30    4.0
6   ATL            35    5.0
7   SAS             5    5.0
8   SAS            10    4.0
9   SAS            15    1.0
10  SAS            20    NaN
11  SAS            25    NaN
12  SAS            30    1.0
13  SAS            35    NaN
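
The same idea can be wrapped in a small helper. This is only a sketch: the function name `fill_missing_seconds` and the `step` parameter are made up for illustration, and it assumes the grid starts at `step` and that each (Team, Seconds_left) pair occurs at most once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

def fill_missing_seconds(frame, step=5):
    """Return a frame with one row per (Team, Seconds_left) grid point."""
    # Full grid of times: step, 2*step, ... up to the largest observed value.
    seconds = np.arange(step, frame["Seconds_left"].max() + 1, step)
    idx = pd.MultiIndex.from_product(
        [frame["Team"].unique(), seconds],
        names=["Team", "Seconds_left"],
    )
    # (Team, Seconds_left) is unique, so reindexing works here;
    # grid points without data get NaN in Fouls.
    return frame.set_index(["Team", "Seconds_left"]).reindex(idx).reset_index()

result = fill_missing_seconds(df)
print(result)
```

Note that Fouls becomes a float column, because NaN cannot be stored in an integer column.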

1

An approach using groupby and merge:

df_left = pd.DataFrame({'Seconds_left':[5,10,15,20,25,30,35]})

df_out = df.groupby('Team', as_index=False).apply(lambda x: x.merge(df_left, how='right', on='Seconds_left'))

df_out['Team'] = df_out['Team'].ffill()

df_out = df_out.reset_index(drop=True).sort_values(by=['Team','Seconds_left'])

print(df_out)

Output:

    Fouls  Seconds_left Team
0     1.0             5  ATL
1     2.0            10  ATL
2     3.0            15  ATL
6     NaN            20  ATL
3     3.0            25  ATL
4     4.0            30  ATL
5     5.0            35  ATL
7     5.0             5  SAS
8     4.0            10  SAS
9     1.0            15  SAS
11    NaN            20  SAS
12    NaN            25  SAS
10    1.0            30  SAS
13    NaN            35  SAS
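
The forward fill only gives the right Team if the first row of each group actually has data. A variant that avoids both the groupby and the fill is to build the full Team × Seconds_left grid first and left-merge the data onto it. This is a sketch rather than a drop-in replacement: the names `teams`, `seconds` and `grid` are made up here, and `how="cross"` needs pandas 1.2 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

# One row per team and one row per wanted time step.
teams = df[["Team"]].drop_duplicates()
seconds = pd.DataFrame({"Seconds_left": list(range(5, 40, 5))})

# Cartesian product: every team paired with every time step.
grid = teams.merge(seconds, how="cross")

# The left merge keeps every grid row; combinations without data get NaN in Fouls.
df_out = grid.merge(df, how="left", on=["Team", "Seconds_left"])

print(df_out)
```

Because Team comes from the grid itself, it is never missing and no fill step is needed.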


-1
import pandas as pd
import numpy as np


df = pd.DataFrame(columns = ['a', 'b'])

df.loc[len(df)] = [1, np.nan]

1 Comment

Thanks, but this only inserts one row with NaN. The problem is that my data is discontinuous, and I would like both teams to have the same values under Seconds_left.
