How to find a string match in df col based on list of strings?

Question

I have a list of 1000 company names and a DataFrame of all previous transactions for the year. For every row that matches a name in the list, I would like to set the value True in a new column (df['Covered']).

I am not sure why I keep getting the errors below. I looked at these related questions, but no luck so far:

Match string to list of defined strings

Pandas extract rows from df where df['col'] values match df2['col'] values

Code Example: when I set regex=False

Customer_List = ['3M', 'Cargill', "Chili's", ---]

df['Covered'] = df[df['End Customer Name'].str.contains('|'.join(Customer_List),case=False, na=False, regex=False)]

ValueError: Wrong number of items passed 32, placement implies 1

Code Example: when I set regex=True

error: bad character range H-D at position 177825

 ~/opt/anaconda3/lib/python3.7/sre_parse.py in parse(str, flags, pattern)
    928 
    929     try:
--> 930         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    931     except Verbose:
    932         # the VERBOSE flag was switched on inside the pattern.  to be

~/opt/anaconda3/lib/python3.7/sre_parse.py in _parse_sub(source, state, verbose, nested)
    424     while True:
    425         itemsappend(_parse(source, state, verbose, nested + 1,
--> 426                            not nested and not items))
    427         if not sourcematch("|"):
    428             break

Is it possible to post the output of df.sample().to_dict()? That would help to recreate/test the problem.
– instinct246, Commented Feb 24, 2020 at 17:23
df['End Customer Name'] has 100k+ rows of names, while Customer_List is a list of 1000 company names. Does that help?
– pandas, Commented Feb 24, 2020 at 17:24
Why are you saying 'regex=False'? You are creating a regular expression by joining your terms with the 'bar' symbol, which means OR in regex.
– Scott Boston, Commented Feb 24, 2020 at 17:24
Thanks Scott, I didn't know if I needed a literal string or a regex. Do you think it has to do with having a special character?
– pandas, Commented Feb 24, 2020 at 17:32

TaxpayersMoney · Accepted Answer · 2020-02-24 17:43:38Z


How about:

mask = df['End Customer Name'].isin(Customer_List)
df['covered'] = 0
df.loc[mask, 'covered'] = 1
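
To illustrate what this exact-match approach does, here is a minimal sketch on made-up sample data (the rows and the shortened Customer_List are hypothetical, not taken from the question). Note that isin only flags rows whose value equals a list entry exactly, so a name such as 'Cargill Inc' is not matched by 'Cargill'.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
Customer_List = ['3M', 'Cargill', "Chili's"]
df = pd.DataFrame(
    {'End Customer Name': ['3M', 'Cargill Inc', "Chili's", 'Acme']}
)

# isin tests exact membership, so 'Cargill Inc' is NOT flagged.
mask = df['End Customer Name'].isin(Customer_List)
df['covered'] = 0
df.loc[mask, 'covered'] = 1

print(df)
```

If substring matches are needed, a contains-based pattern is probably the better fit.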


1 Comment

pandas Over a year ago

Thanks TaxpayersMoney, but in many rows a Customer_List entry is only a substring of the 'End Customer Name' value, which is why I was using contains. Example: End Customer Name values: Apple Inc, Apple Incorporation, Apple Inc.; Customer_List: ["Apple Inc"]

pandas · Accepted Answer · 2020-02-25 01:20:15Z


Thanks everyone. The problem was that my Customer_List contains special characters, so I needed to apply re.escape to each entry (via map) before joining the entries into the pattern.

This linked question helped me: Python regex bad character range.
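
A minimal sketch of that fix, assuming made-up sample data (the rows and the 'AT&T [H-D]' entry are hypothetical, chosen so that the unescaped pattern would raise a "bad character range" error). It also assigns the boolean result of str.contains directly to the new column rather than df[mask], which appears to be what triggered the ValueError in the question.

```python
import re
import pandas as pd

# Hypothetical sample data; one name contains regex metacharacters on purpose.
Customer_List = ['3M', 'Cargill', "Chili's", 'AT&T [H-D]', 'Apple Inc']
df = pd.DataFrame(
    {'End Customer Name': ['Apple Incorporation', 'apple inc.',
                           'AT&T [H-D] Corp', 'Acme', None]}
)

# Escape each name so characters such as [ ] - . are matched literally,
# then join the escaped names with | to build one alternation pattern.
pattern = '|'.join(map(re.escape, Customer_List))

# str.contains returns a boolean Series aligned with df, so it can be
# assigned to the column directly. na=False treats missing names as no match.
df['Covered'] = df['End Customer Name'].str.contains(
    pattern, case=False, na=False, regex=True
)

print(df)
```

Because the pattern is a regular expression, regex=True (the default) is what makes the | alternation work; with regex=False the whole joined string would be searched for as one literal.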
