4

I have a dataframe as below:

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L08_YY', 'AA_L800_XX', 'AA_L0008_CC']})
df

    col1
0   AA_L8_ZZ
1   AA_L08_YY
2   AA_L800_XX
3   AA_L0008_CC

I want to remove all 0's that come directly after the character 'L'. My expected output:

    col1
0   AA_L8_ZZ
1   AA_L8_YY
2   AA_L800_XX
3   AA_L8_CC

2 Answers

3
In [114]: import pandas as pd
     ...: import numpy as np
     ...: df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L08_YY', 'AA_L800_XX', 'AA_L0008_CC']})
     ...: df
Out[114]:
          col1
0     AA_L8_ZZ
1    AA_L08_YY
2   AA_L800_XX
3  AA_L0008_CC

In [115]: df.col1.str.replace("L([0]*)", "L", regex=True)
Out[115]:
0      AA_L8_ZZ
1      AA_L8_YY
2    AA_L800_XX
3      AA_L8_CC
Name: col1, dtype: object
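
Worth noting, as a minimal sketch: in pandas 2.0 and later, str.replace treats the pattern as a literal string unless regex=True is passed, so it is safest to pass the flag explicitly. The variant below does so and uses the slightly simpler pattern L0+, which is my simplification and not the pattern from the session above:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['AA_L8_ZZ', 'AA_L08_YY', 'AA_L800_XX', 'AA_L0008_CC']})

# regex=True is explicit because recent pandas versions default to literal matching.
# "L0+" matches an L followed by one or more zeros and rewrites it as a bare "L".
result = df['col1'].str.replace(r"L0+", "L", regex=True)
print(result.tolist())
```

Zeros that appear later in the number, as in AA_L800_XX, are not touched because they do not directly follow the L.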

1

Pandas' str.replace suffices for this. The pattern below matches one or more 0s immediately preceded by L and replaces them with an empty string:

df.col1.str.replace(r"(?<=L)0+", "", regex=True)

0      AA_L8_ZZ
1      AA_L8_YY
2    AA_L800_XX
3      AA_L8_CC
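
To see what the lookbehind actually matches, here is a small sketch using the standard re module on the same four strings (the variable names are only for illustration). The lookbehind checks for the L without consuming it, so only the zeros themselves are part of each match:

```python
import re

samples = ['AA_L8_ZZ', 'AA_L08_YY', 'AA_L800_XX', 'AA_L0008_CC']

# (?<=L) asserts that the previous character is L; 0+ then matches the run of zeros.
pattern = re.compile(r"(?<=L)0+")

# Collect the matched substrings for each sample string.
matches = [pattern.findall(s) for s in samples]
print(matches)
```

The zeros in AA_L800_XX produce no match because the character before them is 8, not L. Keep in mind that the pattern applies to every L in the string, not just the first one.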

If you need more speed, you could drop down to plain Python with a list comprehension:

import re
df["cleaned"] = [re.sub(r"(?<=L)0+", "", entry) for entry in df.col1]
df
          col1     cleaned
0     AA_L8_ZZ    AA_L8_ZZ
1    AA_L08_YY    AA_L8_YY
2   AA_L800_XX  AA_L800_XX
3  AA_L0008_CC    AA_L8_CC
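
One caveat with the list comprehension: re.sub raises a TypeError on non-string entries such as None or NaN, whereas the .str accessor passes missing values through. A possible sketch that guards against this is below; strip_zeros_after_l is a hypothetical helper name, and the list with a None entry is made-up sample data:

```python
import re

# Compile once and reuse the pattern object for every entry.
ZEROS_AFTER_L = re.compile(r"(?<=L)0+")

def strip_zeros_after_l(entry):
    """Hypothetical helper: drop zeros directly after 'L'; return non-strings unchanged."""
    if not isinstance(entry, str):
        return entry
    return ZEROS_AFTER_L.sub("", entry)

# Sample data with a missing value, assumed for illustration only.
values = ['AA_L8_ZZ', 'AA_L08_YY', None, 'AA_L0008_CC']
cleaned = [strip_zeros_after_l(v) for v in values]
print(cleaned)
```

The same helper could be used in the comprehension over df.col1 if the column may contain missing values.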
