Pandas - Replace substrings from a column if not numeric

Question

I have a list of suffixes I want to remove, say suffixes = ['inc','co','ltd']. I want to remove these from a column in a Pandas dataframe, and I have been doing this: df['name'] = df['name'].str.replace('|'.join(suffixes), '', regex=True).

This works, but I do NOT want to remove the suffix if what remains is numeric. For example, if the name is 123 inc, I don't want to strip the 'inc'. Is there a way to add this condition in the code?

So for example "Apple inc" turns into "Apple" but "123 inc" remains "123 inc"?
– Celius Stingher, Commented Jul 2, 2020 at 13:31

Rakesh · Accepted Answer · 2020-07-02 13:47:21Z

2

Use a regex with a negative lookbehind: (?<!\d) rejects the match when the character just before the space is a digit.

Example:

import pandas as pd

suffixes = ['inc','co','ltd']

df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc co", "123 co"]})
# Remove " <suffix>" as a whole word, unless a digit comes right before the space
df['Col_2'] = df['Col'].str.replace(r"(?<!\d) \b(" + '|'.join(suffixes) + r")\b", '', regex=True)
print(df)

Output:

       Col    Col_2
0  Abc inc      Abc
1  123 inc  123 inc
2   Abc co      Abc
3   123 co   123 co
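
Note that the lookbehind only inspects the single character before the space, so a name such as "Abc123 inc" would also keep its suffix. If the goal is to keep the suffix only when the entire remainder is numeric, one possible sketch is shown here. It assumes the suffix always sits at the end of the string, and `strip_suffix` is a hypothetical helper name, not a pandas function.

```python
import re
import pandas as pd

suffixes = ['inc', 'co', 'ltd']

# Whitespace followed by one suffix at the very end of the string.
# re.escape guards against suffixes that contain regex metacharacters.
suffix_re = re.compile(r"\s+(?:" + "|".join(map(re.escape, suffixes)) + r")$")

def strip_suffix(name):
    """Hypothetical helper: drop a trailing suffix unless only digits would remain."""
    remainder = suffix_re.sub("", name)
    # Keep the original name when stripping would leave a purely numeric string.
    return name if remainder.strip().isdigit() else remainder

df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc123 co", "123 co"]})
df["Col_2"] = df["Col"].map(strip_suffix)
print(df)
```

With this variant "Abc123 co" becomes "Abc123", while "123 inc" and "123 co" are left untouched.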

answered Jul 2, 2020 at 13:47

ibarrond · Accepted Answer · 2020-07-02 13:43:18Z

1

Try prefixing each suffix with ^[^0-9]+. This regex means "one or more non-digit characters, starting from the beginning of the string". The code would look like this:

non_numeric_regex = r"^[^0-9]+"
suffixes = ['inc','co','ltd']
regex_w_suffixes = [non_numeric_regex + suf for suf in suffixes]
df['name'] = df['name'].str.replace('|'.join(regex_w_suffixes), '', regex=True)

edited Jul 2, 2020 at 13:43

answered Jul 2, 2020 at 13:34

3 Comments

Celius Stingher Over a year ago

I copy-pasted this and it still removes the text from the numbers

ibarrond Over a year ago

Can you share a small example of your data and the expected result? Otherwise it is hard to refine a regex.

Celius Stingher Over a year ago

I simply used this to test it. df = pd.DataFrame({'name':['Apple inc','123 inc','ayylmao co','987 co']})
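
The pattern from this answer can be checked against that test frame directly. The sketch below suggests one thing to watch for: because `[^0-9]+` is part of the match, the name itself is consumed together with the suffix. The second pattern is an assumption on my part, not the answerer's code: it captures the non-digit name and substitutes it back. The column names `as_posted` and `variant` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Apple inc', '123 inc', 'ayylmao co', '987 co']})
suffixes = ['inc', 'co', 'ltd']

# Pattern as posted: the leading non-digit run belongs to the match,
# so it is replaced along with the suffix.
as_posted = '|'.join(r"^[^0-9]+" + suf for suf in suffixes)
df['as_posted'] = df['name'].str.replace(as_posted, '', regex=True)

# Variant (an assumption): capture the non-digit name, require the suffix
# at the end of the string, and put the captured name back via \1.
variant = r"^([^0-9]+?)\s+(?:" + '|'.join(suffixes) + r")$"
df['variant'] = df['name'].str.replace(variant, r"\1", regex=True)
print(df)
```

Names that start with a digit never match either pattern, since both are anchored at the start of the string and require a non-digit first character.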
