I'm trying to create a simplified name column. I have a brand name column and a list of strings, shown below. If a brand name contains any string from the list, the simplified column should hold the brand name with the matched string removed. Brand names that do not contain any string from the list should be carried over to the simplified column unchanged.

l = ['co', 'ltd', 'company']

df:

Brand
Nike
Adidas co
Apple company
Intel
Google ltd
Walmart co
Burger King

Desired df:

Brand          Simplified
Nike           Nike
Adidas co      Adidas
Apple company  Apple
Intel          Intel
Google ltd     Google
Walmart co     Walmart
Burger King    Burger King

Thanks in advance! Any help is appreciated!!

  • Have you tried some initial code to solve the problem? What problems are you running into? Commented Jul 29, 2020 at 3:02
  • The solution seems to be using the "re" module to remove the required strings. Commented Jul 29, 2020 at 3:04

4 Answers

1

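How about using this to remove the substrings and any leftover whitespace?

list_substring = ['ltd', 'company', 'co']  # 'company' must come before 'co' so the longer match is tried first
# regex=True is needed on pandas >= 2.0, where str.replace treats the pattern literally by default
df['Simplified'] = df['Brand'].str.replace('|'.join(list_substring), '', regex=True).str.strip()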

2 Comments

@bigbounty Yes, I think the order of the list matters for my answer, so 'company' should be placed before 'co'.
@MrNobody33 It is there to remove the whitespace left before the substring.
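
The ordering concern from the comments can be illustrated with a small sketch. It assumes the sample data from the question; the names `suffixes`, `ordered`, `pattern`, and `naive` are made up for illustration. Sorting the list longest-first is one way to avoid relying on a hand-maintained order, not necessarily what the answer's author had in mind.

```python
import re
import pandas as pd

suffixes = ['co', 'ltd', 'company']  # same strings as the question's list
df = pd.DataFrame({'Brand': ['Nike', 'Adidas co', 'Apple company', 'Intel',
                             'Google ltd', 'Walmart co', 'Burger King']})

# In the original order, 'co' is tried first and eats the start of 'company'
naive = df['Brand'].str.replace('|'.join(suffixes), '', regex=True).str.strip()

# Sort longest-first so 'company' is tried before its prefix 'co';
# re.escape guards against regex metacharacters in the list entries
ordered = sorted(suffixes, key=len, reverse=True)
pattern = '|'.join(re.escape(s) for s in ordered)

df['Simplified'] = df['Brand'].str.replace(pattern, '', regex=True).str.strip()
print(df)
```

With the unsorted list, 'Apple company' becomes 'Apple mpany'; with the sorted list it becomes 'Apple'.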
0
In [28]: df
Out[28]:
           Brand
0           Nike
1      Adidas co
2  Apple company
3          Intel
4     Google ltd
5     Walmart co
6    Burger King

In [30]: df["Simplified"] = df.Brand.apply(lambda x: " ".join(x.split()[:-1]) if x.split()[-1] in l else x)

In [31]: df
Out[31]:
           Brand   Simplified
0           Nike         Nike
1      Adidas co       Adidas
2  Apple company        Apple
3          Intel        Intel
4     Google ltd       Google
5     Walmart co      Walmart
6    Burger King  Burger King

0

Using str.replace

Ex:

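import pandas as pd

l = ['co', 'ltd', 'company']
df = pd.DataFrame({'Brand': ['Nike', 'Adidas co', 'Apple company', 'Intel', 'Google ltd', 'Walmart co', 'Burger King']})
# \b word boundaries keep 'co' from matching inside longer words; regex=True is required on pandas >= 2.0
df['Simplified'] = df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b", "", regex=True).str.strip()
# Or, to remove a match only at the end of the string:
# df['Simplified'] = df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b$", "", regex=True).str.strip()
print(df)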

Output:

           Brand   Simplified
0           Nike         Nike
1      Adidas co       Adidas
2  Apple company        Apple
3          Intel        Intel
4     Google ltd       Google
5     Walmart co      Walmart
6    Burger King  Burger King
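
To see what the word boundaries and the end-of-string anchor buy you, here is a rough sketch comparing three patterns. The brands 'Costco' and 'Scott ltd' are made-up test inputs that do not appear in the question, and the variable names `plain`, `bounded`, and `end_only` are illustrative only.

```python
import re
import pandas as pd

l = ['co', 'ltd', 'company']
# Hypothetical inputs chosen because 'co' appears inside a longer word
brands = pd.Series(['Costco', 'Adidas co', 'Scott ltd'])
alternation = '|'.join(re.escape(s) for s in l)

# No boundaries: 'co' is removed wherever it appears, even inside a word
plain = brands.str.replace(alternation, '', regex=True).str.strip()

# Word boundaries: only whole words from the list are removed
bounded = brands.str.replace(r'\b(?:' + alternation + r')\b', '', regex=True).str.strip()

# End anchor: only a trailing whole word (and the whitespace before it) is removed
end_only = brands.str.replace(r'\s*\b(?:' + alternation + r')$', '', regex=True)

print(pd.DataFrame({'Brand': brands, 'plain': plain,
                    'bounded': bounded, 'end_only': end_only}))
```

The unbounded pattern turns 'Costco' into 'Cost' and 'Scott ltd' into 'Stt', while both bounded variants leave 'Costco' and 'Scott' intact.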

0
import pandas as pd

df = pd.DataFrame({"Brand": ["Nike", "Adidas co", "Apple company", "Google ltd", "Burger King"]})

list_items = ['ltd', 'company', 'co']  # order does not matter here: whole words are compared
# Split each brand into words, drop the words found in list_items, and join the rest back together
df['Simplified'] = [' '.join(w) for w in df['Brand'].str.split().apply(lambda x: [i for i in x if i not in list_items])]
