Grouping the data then converting timestamp column to row

Question

I have gone through "Convert columns into rows with Pandas" and "Merge timestamp column in pandas". My goal is to first group the data by ID and then turn the start_time column into an entry in the process column.

Given

  start_time   time   process  ID
    14:05      14:16  A       1
    14:05      14:34  B       1
    14:05      15:00  C       1
    14:05      15:10  D       1
    14:12      14:19  A       2
    14:12      14:54  B       2

Goal

time   process    ID
14:05  start_time 1    (previously in a separate column)
14:16  A          1
14:34  B          1
15:00  C          1
15:10  D          1
14:12  start_time 2
14:19  A          2
14:54  B          2


df.groupby('ID').melt(df.columns.difference(['start_time']), value_name='time')

Note: the start_time value is the same for every row within an ID.

Could you elaborate on your problem, please?

– Amey, Commented Jun 30, 2022 at 12:44
I am not sure what you did not get here?

– alex3465, Commented Jun 30, 2022 at 12:48
Can you update your example with another ID for clarity?

– mozway, Commented Jun 30, 2022 at 12:51

Cameron Riddell · Accepted Answer · 2022-06-30 13:31:46Z

1

You can treat your data as 2 separate DataFrames and recombine them like so:

import pandas as pd

# Extract one start_time per ID and rename it to match the 'time' column
start_times = (
    df[['start_time', 'ID']]
    .drop_duplicates()
    .rename(columns={'start_time': 'time'})
    .assign(process='start_time')
)

# combine data vertically
out = (
    pd.concat([start_times, df.drop(columns='start_time')])
    .sort_values(['ID', 'time'])
    .reset_index(drop=True)
)

print(out)
    time  ID     process
0  14:05   1  start_time
1  14:16   1           A
2  14:34   1           B
3  15:00   1           C
4  15:10   1           D
5  14:12   2  start_time
6  14:19   2           A
7  14:54   2           B

answered Jun 30, 2022 at 13:31


mozway · Accepted Answer · 2022-06-30 13:14:47Z

1

You could use:

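import pandas as pd

# identifier columns: everything except the columns being reshaped
# ('time' must be excluded too, since it is dropped before the melt)
cols = df.columns.difference(['start_time', 'process', 'time']).to_list()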

# identify first row per group
mask = ~df.duplicated('ID')
# melt first row per group
df2 = (df[mask]
       .drop(columns=['process', 'time'])
       .melt(cols, var_name='process', value_name='time')
      )
# concatenate with original dataframe and reorder
out = (pd.concat([df2, df])
         .sort_values(by='ID', kind='stable')
         [['time', 'process']+cols]
         #.reset_index(drop=True) # optional
       )

output:

    time     process  ID
0  14:05  start_time   1
0  14:16           A   1
1  14:34           B   1
2  15:00           C   1
3  15:10           D   1
1  14:12  start_time   2
4  14:19           A   2
5  14:54           B   2
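
To illustrate what the stable sort contributes, here is a small sketch on hypothetical data whose IDs are interleaved and whose rows are not in chronological order within an ID. The sample values are made up for illustration, and the sketch assumes `time` is excluded from the identifier columns so that `cols` is just ['ID'].

```python
import pandas as pd

# Hypothetical data: IDs are interleaved and times are not sorted within ID 1.
df = pd.DataFrame({
    'start_time': ['09:00', '09:00', '10:00', '09:00'],
    'time': ['09:30', '09:10', '10:05', '09:20'],
    'process': ['A', 'B', 'A', 'C'],
    'ID': [1, 1, 2, 1],
})

# Identifier columns: everything except the columns being reshaped.
cols = df.columns.difference(['start_time', 'process', 'time']).to_list()

# First row of each ID, melted so start_time becomes a 'process' entry.
df2 = (df[~df.duplicated('ID')]
       .drop(columns=['process', 'time'])
       .melt(cols, var_name='process', value_name='time'))

# The stable sort groups rows by ID but keeps their original relative order,
# so each start_time row (concatenated first) leads its group.
out = (pd.concat([df2, df])
         .sort_values(by='ID', kind='stable')
         [['time', 'process'] + cols]
         .reset_index(drop=True))

print(out)
```

Unlike sorting on ['ID', 'time'], this keeps the process rows in their original order (09:30, 09:10, 09:20 for ID 1 here), which matters if the input order is meaningful.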

edited Jun 30, 2022 at 13:14

answered Jun 30, 2022 at 13:09
