I have a DataFrame with two columns, audio filenames and their corresponding texts, that looks like this:
import pandas as pd

data = {'Audio_Filename': ['3e2bd3d1-b9fc-095728a4d05b',
                           '8248bf61-a66d-81f33aa7212d',
                           '81051730-8a18-6bf476d919a4'],
        'Text': ['On a trip to America, he saw people filling his noodles into paper cups.',
                 'When the young officers were told they were going to the front,',
                 'Yeah, unbelievable, I had not even thought of that.']}
df = pd.DataFrame(data, columns=['Audio_Filename', 'Text'])
Now I want to prefix every audio filename with a speaker ID (sp1, sp2, sp3, ...) followed by an underscore, according to this pattern:
sp2_3e2bd3d1-b9fc-095728a4d05b
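For contrast, if the prefix were the same for every row, no loop would be needed: adding a plain string to a string column is applied element-wise. A minimal sketch using the sample filenames from above, with a fixed `sp2_` chosen only for illustration:

```python
import pandas as pd

# Only the filename column matters here; values taken from the sample data.
df = pd.DataFrame({"Audio_Filename": ["3e2bd3d1-b9fc-095728a4d05b",
                                      "8248bf61-a66d-81f33aa7212d",
                                      "81051730-8a18-6bf476d919a4"]})

# A string plus a Series is evaluated element-wise,
# so one fixed prefix is added to every row at once.
df["Audio_Filename"] = "sp2_" + df["Audio_Filename"]
print(df["Audio_Filename"].tolist())
```

The difficulty described next is exactly that the prefix is not fixed, so each row needs its own lookup.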
My difficulty: the prefix (speaker ID) is not fixed; it differs from one audio filename to the next. So I zipped the list of audio filenames with the list of speaker IDs and used nested for loops to iterate over those pairs and over the DataFrame rows. This is my code:
zipped = list(zip(audio_filenames, speaker_ids))
for audio, speaker_id in zipped:
    for index, row in df.iterrows():
        audio_row = row['Audio_Filename']
        if audio == audio_row:
            df['Audio_Filename'] = f'{speaker_id}_' + audio_row
df.to_csv('/home/user/file.csv')
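In the loop above, `df['Audio_Filename'] = ...` assigns a single string to the whole column, so every match overwrites every row. Keeping the nested loops but writing only the matching cell with `df.at[index, ...]` gives one prefix per row. A minimal sketch, assuming `audio_filenames` and `speaker_ids` are parallel lists (the speaker IDs here are read off the expected output shown below):

```python
import pandas as pd

df = pd.DataFrame({"Audio_Filename": ["3e2bd3d1-b9fc-095728a4d05b",
                                      "8248bf61-a66d-81f33aa7212d",
                                      "81051730-8a18-6bf476d919a4"]})

# Assumed parallel lists: the i-th filename belongs to the i-th speaker ID.
audio_filenames = ["3e2bd3d1-b9fc-095728a4d05b",
                   "8248bf61-a66d-81f33aa7212d",
                   "81051730-8a18-6bf476d919a4"]
speaker_ids = ["sp2", "sp1", "sp3"]

for audio, speaker_id in zip(audio_filenames, speaker_ids):
    for index, row in df.iterrows():
        if row["Audio_Filename"] == audio:
            # Write only this row's cell; assigning to df["Audio_Filename"]
            # would replace the entire column with one string.
            df.at[index, "Audio_Filename"] = f"{speaker_id}_{audio}"

print(df["Audio_Filename"].tolist())
```

This works, but it scans the whole DataFrame once per speaker ID, which gets slow with many rows.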
I also tried `apply` with a lambda in the body of the if statement:
df['Audio_Filename'] = df['Audio_Filename'].apply(lambda x: '{}_{}'.format(speaker_id, audio_row))
But nothing has worked so far.
Can anyone give me a hint on how to do this?
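The `apply` attempt ignores its argument `x` and uses `audio_row` instead, so it also turns every row into the same string. If the lambda looks up each filename's own speaker ID, no loops are needed at all. A minimal sketch, assuming the same parallel lists `audio_filenames` and `speaker_ids`; the dictionary name `speaker_by_file` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"Audio_Filename": ["3e2bd3d1-b9fc-095728a4d05b",
                                      "8248bf61-a66d-81f33aa7212d",
                                      "81051730-8a18-6bf476d919a4"]})

# Assumed parallel lists (speaker IDs taken from the expected output).
audio_filenames = ["3e2bd3d1-b9fc-095728a4d05b",
                   "8248bf61-a66d-81f33aa7212d",
                   "81051730-8a18-6bf476d919a4"]
speaker_ids = ["sp2", "sp1", "sp3"]

# Lookup table built from the zipped pairs: filename -> speaker ID.
speaker_by_file = dict(zip(audio_filenames, speaker_ids))

# The lambda uses its own argument x, so each row gets its own prefix;
# filenames without a speaker ID are left unchanged.
df["Audio_Filename"] = df["Audio_Filename"].apply(
    lambda x: f"{speaker_by_file[x]}_{x}" if x in speaker_by_file else x
)
print(df["Audio_Filename"].tolist())
```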
The resulting DataFrame should look like this:
Audio_Filename                  Text
sp2_3e2bd3d1-b9fc-095728a4d05b  On a trip to America, he saw people filling hi...
sp1_8248bf61-a66d-81f33aa7212d  When the young officers were told they were go...
sp3_81051730-8a18-6bf476d919a4  Yeah, unbelievable, I had not even thought of ...
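Another way to produce this table, sketched under the same assumption of parallel `audio_filenames`/`speaker_ids` lists, is to put the pairs into a second DataFrame and left-merge it onto the texts. In `DataFrame.merge`, `validate="many_to_one"` raises an error if a filename appears twice in the speaker table, and `indicator=True` adds a `_merge` column that marks rows without a speaker ID as `"left_only"`. The helper column name `Speaker_ID` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Audio_Filename": ["3e2bd3d1-b9fc-095728a4d05b",
                       "8248bf61-a66d-81f33aa7212d",
                       "81051730-8a18-6bf476d919a4"],
    "Text": ["On a trip to America, he saw people filling his noodles into paper cups.",
             "When the young officers were told they were going to the front,",
             "Yeah, unbelievable, I had not even thought of that."],
})

# Assumed parallel lists (speaker IDs taken from the expected output).
audio_filenames = ["3e2bd3d1-b9fc-095728a4d05b",
                   "8248bf61-a66d-81f33aa7212d",
                   "81051730-8a18-6bf476d919a4"]
speaker_ids = ["sp2", "sp1", "sp3"]

# One row per (filename, speaker ID) pair.
speakers = pd.DataFrame({"Audio_Filename": audio_filenames, "Speaker_ID": speaker_ids})

# A left merge keeps every text row in its original order.
merged = df.merge(speakers, on="Audio_Filename", how="left",
                  validate="many_to_one", indicator=True)

# Prefix only rows that found a speaker ID; "left_only" rows stay as they are.
matched = merged["_merge"] == "both"
merged.loc[matched, "Audio_Filename"] = (
    merged.loc[matched, "Speaker_ID"] + "_" + merged.loc[matched, "Audio_Filename"]
)
missing = merged.loc[~matched, "Audio_Filename"].tolist()

result = merged.drop(columns=["Speaker_ID", "_merge"])
print(result)
print("No speaker ID for:", missing)
```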
(Of course, the real DataFrame contains many more audio filenames and corresponding texts.)
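With many rows, the nested loops become the bottleneck. `Series.map` with a dictionary replaces them with one lookup per row, and filenames missing from the dictionary come back as NaN, which makes gaps in the speaker list easy to spot. A minimal sketch under the same parallel-list assumption; `speaker_by_file` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({"Audio_Filename": ["3e2bd3d1-b9fc-095728a4d05b",
                                      "8248bf61-a66d-81f33aa7212d",
                                      "81051730-8a18-6bf476d919a4"]})

# Assumed parallel lists (speaker IDs taken from the expected output).
audio_filenames = ["3e2bd3d1-b9fc-095728a4d05b",
                   "8248bf61-a66d-81f33aa7212d",
                   "81051730-8a18-6bf476d919a4"]
speaker_ids = ["sp2", "sp1", "sp3"]

speaker_by_file = dict(zip(audio_filenames, speaker_ids))

# Speaker ID for every row; NaN where the filename is not in the dictionary.
speakers = df["Audio_Filename"].map(speaker_by_file)
unmatched = df.loc[speakers.isna(), "Audio_Filename"].tolist()

# Prefix only the rows that have a speaker ID (aligned on the index).
has_speaker = speakers.notna()
df.loc[has_speaker, "Audio_Filename"] = (
    speakers[has_speaker] + "_" + df.loc[has_speaker, "Audio_Filename"]
)
print(df["Audio_Filename"].tolist())
print("No speaker ID for:", unmatched)
```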
I appreciate any help, thank you!