I have a dataframe with 2 columns containing audio filenames and corresponding texts, looking like this:

import pandas as pd

data = {'Audio_Filename': ['3e2bd3d1-b9fc-095728a4d05b', 
                           '8248bf61-a66d-81f33aa7212d', 
                           '81051730-8a18-6bf476d919a4'],
        'Text': ['On a trip to America, he saw people filling his noodles into paper cups.', 
                 'When the young officers were told they were going to the front,', 
                 'Yeah, unbelievable, I had not even thought of that.']}
df = pd.DataFrame(data, columns = ['Audio_Filename', 'Text'])

Now I want to add a string prefix (the speaker ID: sp1, sp2, sp3) with an underscore _ to all audio filename strings according to this pattern: sp2_3e2bd3d1-b9fc-095728a4d05b.

My difficulty: The prefix/speaker ID is not fixed but varies depending on the audio filenames. Because of this, I have zipped the audio filenames and the speaker IDs and iterated over those and the audio filename rows via for-loops. This is my code:

zipped = list(zip(audio_filenames, speaker_ids))

for audio, speaker_id in zipped:
    for index, row in df.iterrows():
        audio_row = row['Audio_Filename']
        if audio == audio_row:
            df['Audio_Filename'] = f'{speaker_id}_' + audio_row
            df.to_csv('/home/user/file.csv')

I also tried apply with lambda after the if statement:

df['Audio_Filename'] = df['Audio_Filename'].apply(lambda x: '{}_{}'.format(speaker_id, audio_row))

But nothing works so far.

Can anyone please give me a hint on how to do this?

The resulting dataframe should look like this:

Audio_Filename  Text
sp2_3e2bd3d1-b9fc-095728a4d05b  On a trip to America, he saw people filling hi...
sp1_8248bf61-a66d-81f33aa7212d  When the young officers were told they were go...
sp3_81051730-8a18-6bf476d919a4  Yeah, unbelievable, I had not even thought of ...

(Of course, I have many more audio filenames and corresponding texts in the dataframe.)

I appreciate any help, thank you!

  • What are the criteria for assigning the speaker ID? Will the prefixes be added in a pattern like sp2_, sp1_, sp3_, sp2_, sp1_, ...? Commented Jul 7, 2021 at 8:54
  • The criterion is that the audio file is spoken by a specific speaker, so the audio filename has to match a certain speaker ID. Commented Jul 7, 2021 at 8:58

1 Answer

If you have the lists audio_filenames and speaker_ids, you can build a dictionary from them and apply it with the Series.map function. For example:

audio_filenames = [
    "3e2bd3d1-b9fc-095728a4d05b",
    "8248bf61-a66d-81f33aa7212d",
    "81051730-8a18-6bf476d919a4",
]
speaker_ids = ["sp2", "sp1", "sp3"]


mapper = {k: "{}_{}".format(v, k) for k, v in zip(audio_filenames, speaker_ids)}
df["Audio_Filename"] = df["Audio_Filename"].map(mapper)

print(df)

Prints:

                   Audio_Filename                                                                      Text
0  sp2_3e2bd3d1-b9fc-095728a4d05b  On a trip to America, he saw people filling his noodles into paper cups.
1  sp1_8248bf61-a66d-81f33aa7212d           When the young officers were told they were going to the front,
2  sp3_81051730-8a18-6bf476d919a4                       Yeah, unbelievable, I had not even thought of that.
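Note that Series.map with a dictionary returns NaN for every value that is not a key in the dictionary. If some filenames in the dataframe might have no speaker ID, one option is to fall back to the original value with fillna. The following is only a sketch under that assumption; the third filename, "unknown-file", is a made-up value used for illustration:

```python
import pandas as pd

# "unknown-file" is a hypothetical filename with no speaker ID assigned.
df = pd.DataFrame({
    "Audio_Filename": [
        "3e2bd3d1-b9fc-095728a4d05b",
        "8248bf61-a66d-81f33aa7212d",
        "unknown-file",
    ],
    "Text": ["first text", "second text", "third text"],
})
audio_filenames = ["3e2bd3d1-b9fc-095728a4d05b", "8248bf61-a66d-81f33aa7212d"]
speaker_ids = ["sp2", "sp1"]

# Same mapper as above: filename -> "<speaker_id>_<filename>"
mapper = {k: f"{v}_{k}" for k, v in zip(audio_filenames, speaker_ids)}

# map() yields NaN for filenames missing from the mapper;
# fillna() restores the original filename in those rows.
df["Audio_Filename"] = df["Audio_Filename"].map(mapper).fillna(df["Audio_Filename"])

print(df)
```

Whether unmatched rows should be kept unchanged, dropped, or reported as an error depends on your data, so treat the fallback as one possible choice.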

4 Comments

First, thank you kindly for your answer! The code is still running, I'll let you know asap if this worked for me :-)
Mhh, unfortunately, this gives me back NaNs for all 'Audio_Filename' values.
@MareikeP Can you please edit your question and put there sample of audio_filenames and speaker_ids lists?
Ah, now it works! The NaNs occurred due to a lack of attention on my side; I had to adjust something else. Thank you very much for your solution and help! :-)
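For readers who also get NaNs for every row: the comments do not say what the cause was here, but a common one is that the values in the column do not match the entries in audio_filenames exactly. A minimal sketch for checking this, assuming (hypothetically) that the column values carry stray whitespace or a ".wav" extension:

```python
import pandas as pd

# Hypothetical column values that differ slightly from the lookup list.
df = pd.DataFrame({
    "Audio_Filename": [
        " 3e2bd3d1-b9fc-095728a4d05b",
        "8248bf61-a66d-81f33aa7212d.wav",
    ],
})
audio_filenames = ["3e2bd3d1-b9fc-095728a4d05b", "8248bf61-a66d-81f33aa7212d"]

# List the column values that have no exact match in audio_filenames.
unmatched = df.loc[~df["Audio_Filename"].isin(audio_filenames), "Audio_Filename"]
print(unmatched.tolist())

# Normalize: strip surrounding whitespace and a trailing ".wav" (assumed extension).
cleaned = (
    df["Audio_Filename"]
    .str.strip()
    .str.replace(r"\.wav$", "", regex=True)
)
print(cleaned.isin(audio_filenames).all())
```

If unmatched is non-empty, printing a few of its values next to the entries of audio_filenames usually shows what needs to be normalized before calling map.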
