2

**I'm new to python language and need help replacing/filling values in df2 based on df1 dataframe inputs:

Below is a description for the both Dataframes**:

<df1>

The table contains employee code and start & End time with entry type:

           Start        End entry_type
Code                                  
11111 2022-12-12 2023-02-20          M
11114 2023-02-05 2023-02-05          S
11113 2023-02-15 2023-02-15          S
11112 2023-02-18 2023-02-18          S
11114 2023-02-20 2023-02-20          V
11112 2023-02-21 2023-02-21          V
11112 2023-02-26 2023-02-26          V
11111 2023-03-06 2023-03-16          V
11114 2023-03-07 2023-03-12          T
11114 2023-03-10 2023-03-10          T
11111 2023-03-19 2023-03-19          T

df2

***** Constructing and building a df2 DataFrame from df1 **

df1.Start.min()

df1.End.max()

columns = pd.date_range(start=df1.Start.min(), end=df1.End.max(), freq='D')

index = df1.index.unique()

df2= pd.DataFrame(columns=columns, index=index)

print(df2.head())

    2022-12-12 2022-12-13 2022-12-14 2022-12-15 2022-12-16 2022-12-17  \
Code                                                                      
11111        NaN        NaN        NaN        NaN        NaN        NaN   
11114        NaN        NaN        NaN        NaN        NaN        NaN   
11113        NaN        NaN        NaN        NaN        NaN        NaN   
11112        NaN        NaN        NaN        NaN        NaN        NaN  

**The requirement is to fill df2 Dataframe with the designated entrytype based on the following columns value in table df1 (Start, End, entry_type) using a for loop or any alternative method **

not sure how to approach the task correctly.

2
  • No I'm trying to automate a task, and i'm new to coding, my back ground mostly excel.. Commented Apr 23, 2023 at 15:47
  • for 11114/2023-03-10 you have twice the entry_type M, what if you had two different entries? Commented Apr 23, 2023 at 16:19

2 Answers 2

2

IIUC, you can use pivot_table :

df2 = (df1.assign(date_rng= df1.apply(lambda x: 
             pd.date_range(start=x["Start"], end=x["End"]).tolist(), axis=1))
          .explode("date_rng").reset_index().pivot_table(
             index="code", columns="date_rng", values="entry_type", aggfunc="first")
          .rename_axis(index=None)
)

​ Output :

print(df2)

date_rng 2022-12-12 2022-12-13 2022-12-14  ... 2023-03-15 2023-03-16 2023-03-19
11111             M          M          M  ...          V          V          T
11112           NaN        NaN        NaN  ...        NaN        NaN        NaN
11113           NaN        NaN        NaN  ...        NaN        NaN        NaN
11114           NaN        NaN        NaN  ...        NaN        NaN        NaN

[4 rows x 85 columns]

Input used (df1) :

df1 = pd.DataFrame(
    {'Start': {11111: '2023-03-19',
      11114: '2023-03-10',
      11113: '2023-02-15',
      11112: '2023-02-26'},
      'End': {11111: '2023-03-19',
      11114: '2023-03-10',
      11113: '2023-02-15',
      11112: '2023-02-26'},
     'entry_type': {11111: 'T', 11114: 'T', 11113: 'S', 11112: 'V'}}
)
Sign up to request clarification or add additional context in comments.

5 Comments

Thank you very much for the fast reply. the code works but how can fill all the days in between with the same entry value. for example employee code 11111 has an entry of 'M' that starts on 2022-12-12 and ends on 2023-02-20.
I updated my answer, can you check it out ?
Many Thanks Timeless for your help, it's reflecting all days now that is in between.
If you need all the dates (98 instead of 85), you need to go for @mozway's answer or add .reindex(columns=pd.date_range(start=df1["Start"].min(), end=df1["End"].max(), freq="D")) as a last chain to df2.
Thanks Timeless for the last adjustment both codes are very helpful in solving the problem.
0

One option using repeat with the number of days between start and end to efficiently create all the dates, and pivot_table to reshape:

df1['Start'] = pd.to_datetime(df1['Start'])
df1['End'] = pd.to_datetime(df1['End'])

tmp = df1.reset_index()

out = (
 tmp.loc[tmp.index.repeat(df1['End'].sub(df1['Start']).dt.days.add(1))]
    .assign(date=lambda d: d['Start'].add(pd.to_timedelta(d.groupby(level=0).cumcount(), unit='D')))
    .pivot_table(index='Code', columns='date', values='entry_type', aggfunc='first')
    .reindex(columns=pd.date_range(start=df1['Start'].min(), end=df1['End'].max(), freq='D'))
)

Output:

      2022-12-12 2022-12-13 2022-12-14 2022-12-15 2022-12-16 2022-12-17  \
Code                                                                      
11111          M          M          M          M          M          M   
11112        NaN        NaN        NaN        NaN        NaN        NaN   
11113        NaN        NaN        NaN        NaN        NaN        NaN   
11114        NaN        NaN        NaN        NaN        NaN        NaN   

      2022-12-18 2022-12-19 2022-12-20 2022-12-21  ... 2023-03-10 2023-03-11  \
Code                                               ...                         
11111          M          M          M          M  ...          V          V   
11112        NaN        NaN        NaN        NaN  ...        NaN        NaN   
11113        NaN        NaN        NaN        NaN  ...        NaN        NaN   
11114        NaN        NaN        NaN        NaN  ...          T          T   

      2023-03-12 2023-03-13 2023-03-14 2023-03-15 2023-03-16 2023-03-17  \
Code                                                                      
11111          V          V          V          V          V        NaN   
11112        NaN        NaN        NaN        NaN        NaN        NaN   
11113        NaN        NaN        NaN        NaN        NaN        NaN   
11114          T        NaN        NaN        NaN        NaN        NaN   

      2023-03-18 2023-03-19  
Code                         
11111        NaN          T  
11112        NaN        NaN  
11113        NaN        NaN  
11114        NaN        NaN  

[4 rows x 98 columns]

1 Comment

Thanks mozway, this code covers the requirement as well reflecting the missing days which has no entries. this is bonus point for me. appreciate your help.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.