2

I have gone through Convert columns into rows with Pandas and Merge timestamp column in pandas , the goal is to first group data by ID and then convert start_time column into an entity in the process column

Given

  start_time   time   process  ID
    14:05      14:16  A       1
    14:05      14:34  B       1
    14:05      15:00  C       1
    14:05      15:10  D       1
    14:12      14:19  A       2
    14:12      14:54  B       2

Goal

time   process    ID
14:05  start_time 1    (Previously it was in separate column)
14:16  A          1
14:34  B          1
15:00  C          1
15:10  D          1
14:12  start_time 2
14:19  A          2
14:54  B          2


df.groupby('ID').melt(df.columns.difference(['start_time']), value_name='time')

Note:start_time value in each ID remains the same.

3
  • cound you elaborate your problem, please? Commented Jun 30, 2022 at 12:44
  • I am not sure what did you not get here ? Commented Jun 30, 2022 at 12:48
  • can you update your example with another ID for clarity? Commented Jun 30, 2022 at 12:51

2 Answers 2

1

You can treat your data as 2 separate DataFrames and recombine them like so:

# Extract start_times and clean up to match column names
start_times = (
    df[['start_time', 'ID']]
    .drop_duplicates()
    .rename(columns={'start_time': 'time'})
    .assign(process='start_time')
)

# combine data vertically
out = (
    pd.concat([start_times, df.drop(columns='start_time')])
    .sort_values(['ID', 'time'])
    .reset_index(drop=True)
)

print(out)
    time  ID     process
0  14:05   1  start_time
1  14:16   1           A
2  14:34   1           B
3  15:00   1           C
4  15:10   1           D
5  14:12   2  start_time
6  14:19   2           A
7  14:54   2           B
Sign up to request clarification or add additional context in comments.

Comments

1

You could use:

cols = df.columns.difference(['start_time', 'process']).to_list()

# identify first row per group
mask = ~df.duplicated('ID')
# melt first row per group
df2 = (df[mask]
       .drop(columns=['process', 'time'])
       .melt(cols, var_name='process', value_name='time')
      )
# concatenate with original dataframe and reorder
out = (pd.concat([df2, df])
         .sort_values(by='ID', kind='stable')
         [['time', 'process']+cols]
         #.reset_index(drop=True) # optional
       )

output:

    time     process  ID
0  14:05  start_time   1
0  14:16           A   1
1  14:34           B   1
2  15:00           C   1
3  15:10           D   1
1  14:12  start_time   2
4  14:19           A   2
5  14:54           B   2

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.