I have a dataframe like this:
start stop speaker_label y
309.16 309.58 2 5
312.01 312.59 2 5
313.4 313.59 1 4
314.35 314.92 2 4
316.96 317.27 1 5
319.36 319.89 1 5
322.01 323.10 2 7
I want to transform this dataframe in a few ways:
- Convert each row to represent 1 second. `start` and `stop` represent the time (in seconds) that an event occurs. I want to explode this so that I get 1 row per second. When converting floats to int, I want to round down.
- I want to create 2 new columns, `y1` and `y2`, which come from a cross between the `speaker_label` and `y` columns. If a row has `speaker_label` 1 and `y` 5, then column `y1` for that row is 5.
- If there are rows of seconds that do not fall within a `start`-`stop` range and therefore have no `speaker_label` or `y` data, then I want the values to be NaN.
It should look like this:
time y1 y2
309 NaN 5
310 NaN 5
311 NaN 5
312 NaN 5
313 4 NaN
314 NaN 4
315 NaN 4
316 5 NaN
317 5 NaN
318 5 NaN
319 5 NaN
320 NaN NaN
321 NaN NaN
322 NaN 7
323 NaN 7
If the `speaker_label` value changes within the same second (1 at 311.01 and 2 at 311.99), then the `y` value for `speaker_label` 1 will go to `y1` and the `y` value for `speaker_label` 2 will go to `y2`. If the `speaker_label` value does not change in this circumstance, then assign the `y` value at 311.01 to 311 and ignore the 311.99 `y` value. I added this circumstance to the OP.
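To make the rules above concrete, here is a minimal sketch of one way they could be applied with pandas. The function name `per_second` is only illustrative, and the approach (floor, explode, pivot, reindex) is an assumption about what is wanted, not a confirmed solution. It follows the written rules literally, so any second that no event covers stays NaN in both columns. That means 310, 311, 315 and 318 come out as NaN here, which differs from the table above; the sketch does not try to guess a fill rule for those rows.

```python
import numpy as np
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "start": [309.16, 312.01, 313.4, 314.35, 316.96, 319.36, 322.01],
    "stop": [309.58, 312.59, 313.59, 314.92, 317.27, 319.89, 323.10],
    "speaker_label": [2, 2, 1, 2, 1, 1, 2],
    "y": [5, 5, 4, 4, 5, 5, 7],
})


def per_second(df):
    """Hypothetical helper: one row per second, y split into y1 / y2."""
    out = df.sort_values("start").copy()

    # Round both ends down, then list every whole second the event touches.
    first = np.floor(out["start"]).astype(int)
    last = np.floor(out["stop"]).astype(int)
    out["time"] = [list(range(a, b + 1)) for a, b in zip(first, last)]
    out = out.explode("time")
    out["time"] = out["time"].astype(int)

    # One column per speaker. "first" keeps the earliest event when the
    # same speaker has two events inside one second.
    wide = out.pivot_table(index="time", columns="speaker_label",
                           values="y", aggfunc="first")
    wide = wide.reindex(columns=[1, 2])
    wide.columns = ["y1", "y2"]

    # Add the seconds that no event covers; they are NaN in both columns.
    full = range(int(wide.index.min()), int(wide.index.max()) + 1)
    return wide.reindex(full).rename_axis("time").reset_index()


result = per_second(df)
print(result)
```

If the gaps are supposed to be filled in some way (as 310, 311, 315 and 318 are in the table), that would need an extra step once the intended rule is clear.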
`y2=NaN` for 310/311/315 in the expected output? Is that related to the last paragraph?