If column contains substring from list, create new column with removed substring from list

Question

I'm trying to create a simplified name column. I have a brand name column and a list of strings, as shown below. If a brand name contains any string from the list, the simplified column should hold the brand name with the matched string removed. Brand names that contain none of the strings should be carried over to the simplified column unchanged.

l = ['co', 'ltd', 'company']

df:

Brand
Nike
Adidas co
Apple company
Intel
Google ltd
Walmart co
Burger King

Desired df:

Brand            Simplified
Nike             Nike
Adidas co        Adidas
Apple company    Apple
Intel            Intel
Google ltd       Google
Walmart co       Walmart
Burger King      Burger King

Thanks in advance! Any help is appreciated!!

Have you tried any code to solve the problem? What problems are you running into?
– Prashanth Mariswamy, Commented Jul 29, 2020 at 3:02
The solution seems to be to use the "re" module to remove the required strings.
– Prashanth Mariswamy, Commented Jul 29, 2020 at 3:04

AdibP · Accepted Answer · 2020-07-29 03:20:51Z

1

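How about this? It removes the substrings and then strips the whitespace left behind:

list_substring = ['ltd', 'company', 'co']  # 'company' must come before 'co' so the longer match is tried first
# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal by default
df['Simplified'] = df['Brand'].str.replace('|'.join(list_substring), '', regex=True).str.strip()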

edited Jul 29, 2020 at 3:20

answered Jul 29, 2020 at 3:07

AdibP


2 Comments

AdibP Over a year ago

@bigbounty Yes, the order of the list matters for my answer, so 'company' should be placed before 'co'.

AdibP Over a year ago

@MrNobody33 to remove the whitespace before the substring
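
The ordering concern raised in these comments can be sidestepped by sorting the list before building the pattern. The following is only a sketch under that assumption, not part of the original answer: it sorts the substrings longest first and escapes them with re.escape (both additions are mine), so the list can be written in any order.

```python
import re
import pandas as pd

df = pd.DataFrame({"Brand": ["Nike", "Adidas co", "Apple company", "Intel",
                             "Google ltd", "Walmart co", "Burger King"]})
l = ["co", "ltd", "company"]  # deliberately not ordered longest-first

# Longest first, so 'company' is tried before its prefix 'co';
# re.escape protects against regex metacharacters in the list entries.
pattern = "|".join(re.escape(s) for s in sorted(l, key=len, reverse=True))

df["Simplified"] = df["Brand"].str.replace(pattern, "", regex=True).str.strip()
print(df)
```

Note that this is still plain substring matching: a brand such as "Costco" would lose its trailing "co" and become "Cost". If that matters for your data, a word-boundary pattern is probably the safer choice.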

bigbounty · Accepted Answer · 2020-07-29 03:17:29Z

0

In [28]: df
Out[28]:
           Brand
0           Nike
1      Adidas co
2  Apple company
3          Intel
4     Google ltd
5     Walmart co
6    Burger King

In [30]: df["Simplified"] = df.Brand.apply(lambda x: " ".join(x.split()[:-1]) if x.split()[-1] in l else x)

In [31]: df
Out[31]:
           Brand   Simplified
0           Nike         Nike
1      Adidas co       Adidas
2  Apple company        Apple
3          Intel        Intel
4     Google ltd       Google
5     Walmart co      Walmart
6    Burger King  Burger King

edited Jul 29, 2020 at 3:17

answered Jul 29, 2020 at 3:11

bigbounty


Rakesh · Accepted Answer · 2020-07-29 04:28:27Z

0

Using str.replace

Ex:

import pandas as pd

l = ['co', 'ltd', 'company']
df = pd.DataFrame({'Brand': ['Nike', 'Adidas co', 'Apple company', 'Intel', 'Google ltd', 'Walmart co', 'Burger King']})
# \b word boundaries keep 'co' from matching inside longer words; regex=True is required in pandas 2.0+
df['Simplified'] = df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b", "", regex=True).str.strip()
# or, to remove the word only at the END of the string:
# df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b$", "", regex=True).str.strip()
print(df)

Output:

           Brand   Simplified
0           Nike         Nike
1      Adidas co       Adidas
2  Apple company        Apple
3          Intel        Intel
4     Google ltd       Google
5     Walmart co      Walmart
6    Burger King  Burger King
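
If the brand names vary in capitalisation (for example "Ltd" versus "ltd"), the same word-boundary idea can be made case-insensitive. This is a sketch under assumptions of my own: the sample data below is hypothetical, and the pattern also swallows the whitespace in front of the suffix so that no separate strip is needed.

```python
import re
import pandas as pd

l = ["co", "ltd", "company"]
# Hypothetical data with mixed capitalisation (not from the question)
df = pd.DataFrame({"Brand": ["Google Ltd", "Adidas CO", "Coca Cola", "Company Store"]})

# Optional whitespace, then one of the listed words as a whole word,
# anchored to the end of the string.
pattern = r"\s*\b(?:" + "|".join(map(re.escape, l)) + r")\b$"

df["Simplified"] = df["Brand"].str.replace(pattern, "", flags=re.IGNORECASE, regex=True)
print(df["Simplified"].tolist())
# ['Google', 'Adidas', 'Coca Cola', 'Company Store']
```

Anchoring with $ leaves "Company Store" untouched, since the listed word is not at the end of the name.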

answered Jul 29, 2020 at 4:28

Rakesh


Sam S. · Accepted Answer · 2020-07-29 04:45:05Z

0

import pandas as pd

df = pd.DataFrame({"Brand": ["Nike", "Adidas co", "Apple company", "Google ltd", "Burger King"]})

list_items = ['ltd', 'company', 'co']  # order does not matter here: whole words are compared
# Split each brand into words, drop the words found in list_items, and join the rest
df['Simplified'] = df['Brand'].str.split().apply(lambda words: ' '.join(w for w in words if w not in list_items))
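
To see how this word-based approach behaves on less tidy names, here is a small sketch. The helper name drop_listed_words and the sample brands are hypothetical, chosen only to illustrate the edge cases.

```python
import pandas as pd

list_items = {"ltd", "company", "co"}  # a set gives fast membership tests

def drop_listed_words(name: str) -> str:
    """Hypothetical helper: remove whole words that appear in list_items."""
    return " ".join(w for w in name.split() if w not in list_items)

# Hypothetical sample names, not from the question
brands = pd.Series(["Costco", "Adidas co", "co op bank", "Google Ltd"])
print(brands.map(drop_listed_words).tolist())
# ['Costco', 'Adidas', 'op bank', 'Google Ltd']
```

Whole-word comparison leaves "Costco" intact, but it removes a listed word wherever it appears in the name, and it is case-sensitive, so "Ltd" is kept.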

answered Jul 29, 2020 at 4:45

Sam S.
