I have a list of suffixes, say suffixes = ['inc','co','ltd'], that I want to remove from a column in a Pandas dataframe. I have been doing this: df['name'] = df['name'].str.replace('|'.join(suffixes), '').

This works, but I do NOT want to remove the suffix if what remains is numeric. For example, if the name is 123 inc, I don't want to strip the 'inc'. Is there a way to add this condition to the code?

  • So for example "Apple inc" turns into "Apple" but "123 inc" remains "123 inc"? Commented Jul 2, 2020 at 13:31
  • Yes, that's correct. Commented Jul 2, 2020 at 13:34

2 Answers

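Use a regex with a negative lookbehind, (?<!\d), so that the suffix is removed only when the character before the preceding space is not a digit.

Example: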

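import pandas as pd

suffixes = ['inc','co','ltd']

df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc co", "123 co"]})
# (?<!\d) blocks the match when a digit comes right before the space; \b keeps the suffix a whole word
df['Col_2'] = df['Col'].str.replace(r"(?<!\d) \b(" + '|'.join(suffixes) + r")\b", '', regex=True)
print(df)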

Output:

       Col    Col_2
0  Abc inc      Abc
1  123 inc  123 inc
2   Abc co      Abc
3   123 co   123 co
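
The same lookbehind idea can be wrapped in a small helper. This is only a sketch under a few assumptions that are not in the answer above: the helper name strip_suffixes is hypothetical, the suffix is assumed to appear only at the end of the name, and suffixes are escaped with re.escape in case they contain regex metacharacters. Like the original pattern, it only checks the character right before the suffix, so a name such as "Abc123 inc" would also be left untouched.

```python
import re
import pandas as pd

def strip_suffixes(names, suffixes):
    """Hypothetical helper: drop a trailing suffix unless a digit precedes it."""
    # Escape each suffix so characters like "." are matched literally.
    alternation = "|".join(re.escape(s) for s in suffixes)
    # (?<![\d\s]) rejects a match that starts right after a digit, and also
    # stops the match from starting in the middle of a run of whitespace.
    # $ restricts the match to the end of the name.
    pattern = rf"(?<![\d\s])\s+(?:{alternation})$"
    return names.str.replace(pattern, "", regex=True)

df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc co", "123 co", "Income co"]})
df["Col_2"] = strip_suffixes(df["Col"], ["inc", "co", "ltd"])
print(df)
```

On this sample frame, 123 inc and 123 co stay unchanged, while Income co becomes Income because the suffix is only matched as a separate trailing word.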

Try prepending ^[^0-9]+ to each suffix. It is a regex that means "one or more non-digit characters, starting at the beginning of the string". The code would look like this:

non_numeric_regex = r"^[^0-9]+"
suffixes = ['inc','co','ltd']
regex_w_suffixes = [non_numeric_regex + suf for suf in suffixes]
df['name'] = df['name'].str.replace('|'.join(regex_w_suffixes), '', regex=True)
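
One thing to watch with a prefixed pattern like this: the leading non-digit text becomes part of the match, so it is replaced together with the suffix. A possible alternative, sketched here as an assumption and not as this answer's method, is to strip the suffix first and then restore the original name wherever the remainder is digits only. The column name clean and the variable names are made up for illustration, and the suffix is assumed to sit at the end of the name.

```python
import re
import pandas as pd

suffixes = ['inc', 'co', 'ltd']
df = pd.DataFrame({'name': ['Apple inc', '123 inc', 'ayylmao co', '987 co']})

# The prefixed pattern matches the leading non-digit text as well as the
# suffix, so the whole name is replaced and an empty string is left.
whole_match = re.sub(r"^[^0-9]+inc", "", "Apple inc")

# Step 1: strip a trailing suffix (preceded by whitespace) from every name.
pattern = r"\s+(?:" + "|".join(re.escape(s) for s in suffixes) + r")$"
stripped = df['name'].str.replace(pattern, '', regex=True)

# Step 2: where the remainder consists only of digits, keep the original name.
df['clean'] = stripped.mask(stripped.str.isdigit(), df['name'])
print(df)
```

This two-step version tests the whole remainder, which is closer to "do not strip if what remains is numeric" than checking a single neighbouring character.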

3 Comments

I copy-pasted this and it still removes the text from the numbers.
Can you share a small example of your data and the expected result? Otherwise it is hard to refine a regex.
I simply used this to test it. df = pd.DataFrame({'name':['Apple inc','123 inc','ayylmao co','987 co']})
