
I have a list of 1,000 company names and a DataFrame (df) of all previous transactions for the year. For every row whose customer name matches a company in the list, I would like to set the value True in a new column, df['Covered'].

I am not sure why I keep getting the errors below. I looked through the following questions, but had no luck so far:

Match string to list of defined strings

Pandas extract rows from df where df['col'] values match df2['col'] values

Code Example: when I set regex=False

Customer_List = ['3M', 'Cargill', "Chili's", ---]

df['Covered'] = df[df['End Customer Name'].str.contains('|'.join(Customer_List),case=False, na=False, regex=False)]

ValueError: Wrong number of items passed 32, placement implies 1

Code Example: when I set regex=True

error: bad character range H-D at position 177825

 ~/opt/anaconda3/lib/python3.7/sre_parse.py in parse(str, flags, pattern)
    928 
    929     try:
--> 930         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    931     except Verbose:
    932         # the VERBOSE flag was switched on inside the pattern.  to be

~/opt/anaconda3/lib/python3.7/sre_parse.py in _parse_sub(source, state, verbose, nested)
    424     while True:
    425         itemsappend(_parse(source, state, verbose, nested + 1,
--> 426                            not nested and not items))
    427         if not sourcematch("|"):
    428             break
  • Are you able to add some sample data? Commented Feb 24, 2020 at 17:18
  • Is it possible to post the output of df.sample().to_dict()? That would help to recreate and test the problem. Commented Feb 24, 2020 at 17:23
  • df['End Customer Name'] are 100k+ rows of names while Customer_List is a list of 1000 company names, does that help? Commented Feb 24, 2020 at 17:24
  • Why are you saying 'regex=False'? You are creating a regular expression by joining your terms with the 'bar' symbol, which means OR in regex. Commented Feb 24, 2020 at 17:24
  • Thanks Scott, I didn't know if I needed a literal string or Regex. Do you think it has to do with having a special character? Commented Feb 24, 2020 at 17:32

2 Answers


How about:

mask = df['End Customer Name'].isin(Customer_List)
df['covered'] = 0
df.loc[mask, 'covered'] = 1

1 Comment

Thanks TaxpayersMoney, but there are many rows in which the Customer_List is a substring in the 'End Customer Name' string, which is why I was using contains. Example: End Customer Name -Apple Inc, Apple Incorporation, Apple Inc. Customer List ["Apple Inc"]
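To illustrate the difference described in this comment, here is a minimal sketch. The sample rows are hypothetical, built from the "Apple Inc" example above, and a single literal name is used so that regex=False is valid.

```python
import pandas as pd

# Hypothetical sample data based on the "Apple Inc" example in the comment.
df = pd.DataFrame(
    {"End Customer Name": ["Apple Inc", "Apple Incorporation", "Apple Inc.", "Cargill"]}
)
Customer_List = ["Apple Inc"]

# isin() only matches cells whose whole value equals a list entry.
exact = df["End Customer Name"].isin(Customer_List)

# str.contains() with regex=False looks for one literal substring.
substring = df["End Customer Name"].str.contains(
    "Apple Inc", case=False, na=False, regex=False
)

print(exact.tolist())      # [True, False, False, False]
print(substring.tolist())  # [True, True, True, False]
```

With isin only the exact "Apple Inc" row is flagged, while contains also catches "Apple Incorporation" and "Apple Inc.".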

Thanks everyone. The problem was that my Customer_List contains special regex characters, so I needed to escape each name with map(re.escape, Customer_List) before joining them.

This question helped me: Python regex bad character range.
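For anyone landing here later, a minimal sketch of that fix might look like the following. The sample rows and the "[H-D]" list entry are made up to reproduce a bad character range; the column names follow the question. It also assigns the boolean mask itself to the new column, because assigning df[mask] (a whole DataFrame) to a single column is what raises the "Wrong number of items passed" ValueError.

```python
import re
import pandas as pd

# Hypothetical sample data; the real names come from the asker's df.
df = pd.DataFrame(
    {"End Customer Name": ["3M Company", "Store [H-D] West", "Acme Corp", None]}
)
# "[H-D]" is an invented entry that breaks an unescaped regex.
Customer_List = ["3M", "Cargill", "Chili's", "[H-D]"]

# Escape each name so regex metacharacters are treated literally,
# then join with "|" (regex OR).
pattern = "|".join(map(re.escape, Customer_List))

# Assign the boolean Series directly, not df[mask].
df["Covered"] = df["End Customer Name"].str.contains(
    pattern, case=False, na=False, regex=True
)

print(df["Covered"].tolist())  # [True, True, False, False]
```

Note that regex=True is required here: with regex=False the whole joined string, including the "|" characters, would be searched for as one literal substring.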
