How to add varying prefixes to strings in a pandas column

Question

I have a dataframe with 2 columns containing audio filenames and corresponding texts, looking like this:

import pandas as pd

data = {'Audio_Filename': ['3e2bd3d1-b9fc-095728a4d05b',
                           '8248bf61-a66d-81f33aa7212d',
                           '81051730-8a18-6bf476d919a4'],
        'Text': ['On a trip to America, he saw people filling his noodles into paper cups.',
                 'When the young officers were told they were going to the front,',
                 'Yeah, unbelievable, I had not even thought of that.']}
df = pd.DataFrame(data, columns=['Audio_Filename', 'Text'])

Now I want to add a string prefix (the speaker ID: sp1, sp2, sp3) with an underscore _ to all audio filename strings according to this pattern: sp2_3e2bd3d1-b9fc-095728a4d05b.

My difficulty: The prefix/speaker ID is not fixed but varies depending on the audio filenames. Because of this, I have zipped the audio filenames and the speaker IDs and iterated over those and the audio filename rows via for-loops. This is my code:

zipped = list(zip(audio_filenames, speaker_ids))

for audio, speaker_id in zipped:
    for index, row in df.iterrows():
        audio_row = row['Audio_Filename']
        if audio == audio_row:
            df['Audio_Filename'] = f'{speaker_id}_' + audio_row
            df.to_csv('/home/user/file.csv')

I also tried apply with lambda after the if statement:

df['Audio_Filename'] = df['Audio_Filename'].apply(lambda x: '{}_{}'.format(speaker_id, audio_row))

But nothing works so far.

Can anyone please give me a hint on how to do this?

The resulting dataframe should look like this:

Audio_Filename  Text
sp2_3e2bd3d1-b9fc-095728a4d05b  On a trip to America, he saw people filling hi...
sp1_8248bf61-a66d-81f33aa7212d  When the young officers were told they were go...
sp3_81051730-8a18-6bf476d919a4  Yeah, unbelievable, I had not even thought of ...

(Of course, I have many more audio filenames and corresponding texts in the dataframe.)

I appreciate any help, thank you!

What are the criteria for assigning the speaker IDs? Are they going to follow a pattern like sp2_, sp1_, sp3_, sp2_, sp1_, ...?
– Anurag Dabas, Commented Jul 7, 2021 at 8:54
The criterion is that the audio file is spoken by a specific speaker, so the audio filename has to match a certain speaker ID.
– MareikeP, Commented Jul 7, 2021 at 8:58

Andrej Kesely · Accepted Answer · 2021-07-07 08:53:17Z

If you have the audio_filenames and speaker_ids lists, you can use the Series.map function. For example:

audio_filenames = [
    "3e2bd3d1-b9fc-095728a4d05b",
    "8248bf61-a66d-81f33aa7212d",
    "81051730-8a18-6bf476d919a4",
]
speaker_ids = ["sp2", "sp1", "sp3"]


mapper = {k: "{}_{}".format(v, k) for k, v in zip(audio_filenames, speaker_ids)}
df["Audio_Filename"] = df["Audio_Filename"].map(mapper)

print(df)

Prints:

                   Audio_Filename                                                                      Text
0  sp2_3e2bd3d1-b9fc-095728a4d05b  On a trip to America, he saw people filling his noodles into paper cups.
1  sp1_8248bf61-a66d-81f33aa7212d           When the young officers were told they were going to the front,
2  sp3_81051730-8a18-6bf476d919a4                       Yeah, unbelievable, I had not even thought of that.
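As a variation on the same idea, the lookup and the string formatting can be split into two steps: first map each filename to its speaker ID, then concatenate element-wise. This is only a sketch under the same assumptions as above (two parallel lists, audio_filenames and speaker_ids); the name speaker_of and the shortened texts are placeholders, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "Audio_Filename": [
        "3e2bd3d1-b9fc-095728a4d05b",
        "8248bf61-a66d-81f33aa7212d",
        "81051730-8a18-6bf476d919a4",
    ],
    # Placeholder texts, shortened for the example
    "Text": ["first text", "second text", "third text"],
})

audio_filenames = [
    "3e2bd3d1-b9fc-095728a4d05b",
    "8248bf61-a66d-81f33aa7212d",
    "81051730-8a18-6bf476d919a4",
]
speaker_ids = ["sp2", "sp1", "sp3"]

# Hypothetical helper dict: filename -> speaker ID
speaker_of = dict(zip(audio_filenames, speaker_ids))

# Look up the speaker ID for every row, then build "<speaker>_<filename>"
speakers = df["Audio_Filename"].map(speaker_of)
df["Audio_Filename"] = speakers + "_" + df["Audio_Filename"]

print(df["Audio_Filename"].tolist())
```

Keeping the speaker IDs in their own Series can also be handy if they are needed later as a separate column.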

4 Comments

MareikeP Over a year ago

First, thank you kindly for your answer! The code is still running, I'll let you know asap if this worked for me :-)

MareikeP Over a year ago

Mhh, unfortunately, this gives me back NaNs for all 'Audio_Filename' values.

Andrej Kesely Over a year ago

@MareikeP Can you please edit your question and add a sample of the audio_filenames and speaker_ids lists?

MareikeP Over a year ago

Ah, now it works! The NaNs occurred due to a lack of attention on my side; I had to adjust something else. Thank you very much for your solution and help! :-)
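For anyone who runs into the same NaNs: Series.map with a dict returns NaN for every value that is not a key of the dict, so any filename in the dataframe that does not exactly match an entry in the mapper ends up as NaN. The thread does not say what the actual cause was in this case, so the following is only a sketch of how such rows could be found and kept unchanged; the filename "unknown-file" is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Audio_Filename": ["3e2bd3d1-b9fc-095728a4d05b", "unknown-file"]}
)

# The mapper only knows the first filename ("unknown-file" is hypothetical)
mapper = {"3e2bd3d1-b9fc-095728a4d05b": "sp2_3e2bd3d1-b9fc-095728a4d05b"}

mapped = df["Audio_Filename"].map(mapper)

# Filenames that had no match in the mapper (these became NaN)
unmatched = df.loc[mapped.isna(), "Audio_Filename"].tolist()
print(unmatched)

# Keep the original filename wherever no prefix was found
df["Audio_Filename"] = mapped.fillna(df["Audio_Filename"])
print(df)
```

Inspecting the unmatched list first usually shows whether the mismatch comes from missing entries or from small differences such as whitespace or file extensions.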