I would like to use a lambda function on a whole dataframe column with a conditional to:

  • remove the 3rd character
  • replace the 4th & 3rd last characters with a single dash '-'
  • only when a value starts with 'AB'

Such that 'AB123456789' becomes 'AB2345-89'

import pandas as pd

df = pd.DataFrame({'key': ['123456789', 'AB123456789', 'CD123456789', '987654321'],
                   'adj_key': ['123456789', 'AB123456789', 'CD123456789', '987654321']})

# Desired (pseudocode):
# df['adj_key'] = df['adj_key'].apply(lambda x: <delete 3rd character & replace 4th & 3rd last characters with a single dash> if <x begins with 'AB'> else x)

result:

      key            adj_key   
0     123456789      123456789    
1     AB123456789    AB2345-89 
2     CD123456789    CD123456789
3     987654321      987654321

Cheers

2 Comments
  • Why must it be a lambda function? Have you tried anything at all, just a regular function? What exactly is giving you trouble? Commented Nov 11, 2020 at 8:59
  • Rather than creating your own functions, use the str accessor available in Pandas. It is just more convenient. Commented Nov 11, 2020 at 9:06

2 Answers

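This only makes sense when every string starting with 'AB' has length >= 7; for shorter strings the slices no longer line up with the characters described in the question:

df['adj_key'] = df['key'].apply(lambda x: x[:2] + x[3:-4] + '-' + x[-2:] if x.startswith('AB') else x)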

Output:

           key      adj_key
0    123456789    123456789
1  AB123456789    AB2345-89
2  CD123456789  CD123456789
3    987654321    987654321
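
As a hedged sketch of the same slicing with an explicit length guard, the lambda can be written as a regular function. The helper name `adjust_key`, its `min_len` parameter, and the extra 'AB12' test row are illustrative assumptions, not part of the original answer; short values are simply left unchanged.

```python
import pandas as pd

def adjust_key(value, min_len=7):
    """Drop the 3rd character and replace the 4th- and 3rd-last characters with '-'.

    Only applied to strings that start with 'AB' and have at least min_len
    characters (an assumed guard); anything else is returned unchanged.
    """
    if isinstance(value, str) and value.startswith('AB') and len(value) >= min_len:
        return value[:2] + value[3:-4] + '-' + value[-2:]
    return value

# 'AB12' is an added row to show what happens to a value that is too short
df = pd.DataFrame({'key': ['123456789', 'AB123456789', 'CD123456789', '987654321', 'AB12']})
df['adj_key'] = df['key'].apply(adjust_key)
print(df)
```

Whether short 'AB' values should be passed through, rejected, or handled differently depends on the data, so treat the guard as one possible choice.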

You can definitely do it with a lambda function. However, you can also slice the column values with the .str accessor and concatenate the pieces back together. This approach works on the whole column at once and rewrites only the rows that meet the three conditions you specified. As in the other answer, it only gives a sensible result when the 'AB' strings have length 7 or more.

Here's how I did it:

import pandas as pd

c = ['key', 'adj_key']
d = [['123456789', '123456789'],
     ['AB123456789', 'AB2345-89'],
     ['CD123456789', 'CD123456789'],
     ['987654321', '987654321']]

df = pd.DataFrame(d, columns=c)
print(df)

# Start from a copy of 'key', then overwrite only the rows that begin with 'AB'
df['adjkey'] = df['key']
df.loc[df['key'].str[:2] == 'AB', 'adjkey'] = df['key'].str[:2] + df['key'].str[3:-4] + '-' + df['key'].str[-2:]

print(df)

The output of this is:

Original dataframe:

           key      adj_key
0    123456789    123456789
1  AB123456789    AB2345-89
2  CD123456789  CD123456789
3    987654321    987654321

New dataframe:

           key      adj_key       adjkey
0    123456789    123456789    123456789
1  AB123456789    AB2345-89    AB2345-89
2  CD123456789  CD123456789  CD123456789
3    987654321    987654321    987654321
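
A possible variation on the same idea, sketched under the assumption that short 'AB' values should be left alone: build the boolean mask once (including a length check) and let `Series.where` choose between the original and the rewritten value. The names `needs_fix` and `rewritten` and the extra 'AB12' row are illustrative only.

```python
import pandas as pd

# 'AB12' is an added row to show what happens to a value that is too short
df = pd.DataFrame({'key': ['123456789', 'AB123456789', 'CD123456789', '987654321', 'AB12']})

key = df['key']
# Rows to rewrite: start with 'AB' and are long enough for the slices (assumed guard of 7)
needs_fix = key.str.startswith('AB') & (key.str.len() >= 7)
rewritten = key.str[:2] + key.str[3:-4] + '-' + key.str[-2:]

# Series.where keeps the original value where the condition is True,
# so the mask is inverted to keep every row that does NOT need fixing
df['adjkey'] = key.where(~needs_fix, rewritten)
print(df)
```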

5 Comments

Actually, you do need to worry about the string length regardless of the approach, because of how the slicing works. For example, how would you replace the 3rd- and 4th-last characters if they don't exist? I've tested it, and both approaches behave identically regardless of the string length (they always produce an output), but for short strings the output doesn't make sense or achieve the user's goal (see, for example, the output for 'AB12').
Agreed. Length does matter. I edited my answer to say the length has to be 7 or greater to get the correct format.
apply with a lambda is costlier and slower than string concatenation, so I prefer to keep apply as an alternative option. It does the trick, though.
Sure, but not always; it depends on the task. For this one, I personally think the lambda is a good option. I was curious about benchmarking, so I ran df = pd.concat([df for _ in range(10**5)]).reset_index(drop=True), i.e., repeated the dataframe 100k times, giving 400k rows, and then applied both approaches for comparison. It took ~120 ms with apply/lambda and ~440 ms with the .str accessor. Although the lambda approach was faster here, both were fast, and in many cases the .str accessor is surely the better choice.
Thank you for doing the benchmark; I was going to try this tonight. I agree that accessing the df column several times reduces the speed and also uses more memory.
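
For anyone who wants to repeat the comparison, here is a minimal timing sketch using `timeit`. It builds the 400k rows by repeating a Python list rather than with `pd.concat`, and the function names `with_apply` and `with_str_accessor` are made up for this example. Timings vary by machine and pandas version, so no numbers are claimed here.

```python
import timeit
import pandas as pd

base = ['123456789', 'AB123456789', 'CD123456789', '987654321']
# 4 values repeated 10**5 times -> 400k rows, the size used in the comment above
df = pd.DataFrame({'key': base * 10**5})

def with_apply(frame):
    # Row-by-row Python function applied to the column
    return frame['key'].apply(
        lambda x: x[:2] + x[3:-4] + '-' + x[-2:] if x.startswith('AB') else x)

def with_str_accessor(frame):
    # Column-wise slicing via the .str accessor, assigned only to 'AB' rows
    key = frame['key']
    out = key.copy()
    is_ab = key.str[:2] == 'AB'
    out.loc[is_ab] = key.str[:2] + key.str[3:-4] + '-' + key.str[-2:]
    return out

# Sanity check: both approaches must produce the same values
assert with_apply(df).tolist() == with_str_accessor(df).tolist()

for fn in (with_apply, with_str_accessor):
    # Best of 3 single runs, reported in milliseconds
    seconds = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f'{fn.__name__}: {seconds * 1000:.0f} ms')
```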
