I have a dataframe like this:

start   stop    speaker_label   y
309.16  309.58         2       5
312.01  312.59         2       5
313.4   313.59         1       4
314.35  314.92         2       4
316.96  317.27         1       5
319.36  319.89         1       5
322.01  323.10         2       7

I want to transform this dataframe in a few ways:

  • Convert each row to represent 1 second. start and stop give the times (in seconds) at which an event begins and ends. I want to explode these intervals so that I get 1 row per second. When converting floats to int, I want to round down.
  • Create 2 new columns, y1 and y2, which come from a cross between the speaker_label and y columns. If a row has speaker_label 1 and y is 5, then y1 for that row is 5.
  • If there are seconds that do not fall within any start/stop range and therefore have no speaker_label or y data, I want their values to be NaN.

It should look like this:

time    y1  y2
309    NaN  5
310    NaN  5
311    NaN  5
312    NaN  5
313    4    NaN
314    NaN  4
315    NaN  4
316    5    NaN
317    5    NaN
318    5    NaN
319    5    NaN
320    NaN  NaN
321    NaN  NaN
322    NaN  7
323    NaN  7

Two events can fall within the same second (for example, one ends at 311.01 and the next starts at 311.99). If the speaker_label value changes between them (1 for 311.01 and 2 for 311.99), then the y value for speaker_label 1 goes to y1 and the y value for speaker_label 2 goes to y2. If the speaker_label value does not change, then assign the y value at 311.01 to second 311 and ignore the y value at 311.99.

  • Would there ever be overlap with two events within the same second? Or what if something ends at, say, 311.01 and the next one starts at 311.99? Commented Apr 7, 2021 at 20:29
  • Yes, that could happen. If the speaker_label value changes (1 for 311.01 and 2 for 311.99), then the speaker_label value for 1 will go to y1 and the speaker_label value for 2 will go to y2. If the speaker_label value does not change in this circumstance, then assign the y value at 311.01 to 311 and not consider the 311.99 y value. I added this circumstance to the OP. Commented Apr 7, 2021 at 20:34
  • Given the sample input, why isn't y2=NaN for 310/311/315 in the expected output? Is that related to the last paragraph? Commented Apr 7, 2021 at 21:13

1 Answer

As I understand the bullet requirements, you can explode, pivot, and reindex:

  1. explode() the start/stop intervals into time rows:
df['time'] = df.apply(lambda x: range(int(x.start), 1+int(x.stop)), axis=1)
df = df.explode('time').drop(columns=['start', 'stop']).set_index('time')

#       speaker_label  y
# time                  
# 309               2  5
# 312               2  5
# 313               1  4
# 314               2  4
# 316               1  5
# 317               1  5
# 319               1  5
# 322               2  7
# 323               2  7
  2. Pivot into y columns using pivot_table():
df = df.pivot_table(index='time', columns='speaker_label')

#        y     
#        1    2
# time
# 309    NaN  5.0
# 312    NaN  5.0
# 313    4.0  NaN
# 314    NaN  4.0
# 316    5.0  NaN
# 317    5.0  NaN
# 319    5.0  NaN
# 322    NaN  7.0
# 323    NaN  7.0
  3. reindex() the missing time steps:
df = df.reindex(range(df.index.min(), 1+df.index.max()))

#        y     
#        1    2
# time
# 309    NaN  5.0
# 310    NaN  NaN
# 311    NaN  NaN
# 312    NaN  5.0
# 313    4.0  NaN
# 314    NaN  4.0
# 315    NaN  NaN
# 316    5.0  NaN
# 317    5.0  NaN
# 318    NaN  NaN
# 319    5.0  NaN
# 320    NaN  NaN
# 321    NaN  NaN
# 322    NaN  7.0
# 323    NaN  7.0

Note that this doesn't match your expected output exactly, but it's how I interpreted the bullet requirements. This method puts NaN wherever neither speaker_label is active (here 310, 311, 315, 318, 320, and 321). I couldn't figure out why your expected output has values in some of those rows.
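
For reference, the three steps can be combined into one self-contained sketch that rebuilds the sample data. Two things here are my own assumptions rather than part of the steps above: passing aggfunc='first' to pivot_table() (the default is the mean), so that when the same speaker_label appears twice within one second the earlier y value is kept, and renaming the pivoted columns to y1/y2. The names long, wide, and result are just illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "start": [309.16, 312.01, 313.4, 314.35, 316.96, 319.36, 322.01],
    "stop": [309.58, 312.59, 313.59, 314.92, 317.27, 319.89, 323.10],
    "speaker_label": [2, 2, 1, 2, 1, 1, 2],
    "y": [5, 5, 4, 4, 5, 5, 7],
})

# 1. One row per whole second touched by each interval (int() truncates).
df["time"] = [list(range(int(a), int(b) + 1))
              for a, b in zip(df["start"], df["stop"])]
long = df.explode("time").drop(columns=["start", "stop"])
long["time"] = long["time"].astype(int)  # explode() leaves object dtype

# 2. One column per speaker_label. aggfunc="first" is an assumption:
#    it keeps the earliest y if a speaker repeats within the same second.
wide = long.pivot_table(index="time", columns="speaker_label",
                        values="y", aggfunc="first")
wide.columns = [f"y{label}" for label in wide.columns]

# 3. Fill in the seconds where no speaker is active with NaN rows.
result = wide.reindex(range(wide.index.min(), wide.index.max() + 1))
result = result.rename_axis("time")

print(result)
```

Passing values='y' also avoids the two-level column header shown in the pivot output above, which makes the rename to y1/y2 a one-liner.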
