I would like to find matching strings in a path column and use np.select to create a new column whose labels depend on the matches I found.

This is what I have written:

import numpy as np
conditions = [a["properties_path"].str.contains('blog'),
              a["properties_path"].str.contains('credit-card-readers/|machines|poss|team|transaction_fees'),
              a["properties_path"].str.contains('signup|sign-up|create-account|continue|checkout'),
              a["properties_path"].str.contains('complete'),
              a["properties_path"].isin(['/za/', '/']),
              a["properties_path"].str.contains('promo')]
choices = ["blog", "info_pages", "signup", "completed", "home_page", "promo"]
a["page_type"] = np.select(conditions, choices, default=np.nan)

However, when I run this code, I get this error message:

ValueError: invalid entry 0 in condlist: should be boolean ndarray

Here is a sample of my data:

3124465                                       /blog/ts-st...
3124466                                       /card-machines
3124467                                       /card-machines
3124468                                       /card-machines
3124469                               /promo/our-gift-to-you
3124470                                   /create-account/v1
3124471                                          /za/signup/
3124472                                   /create-account/v1
3124473                                             /sign-up
3124474                                                 /za/
3124475                                        /sign-up/cart
3124476                                           /checkout/
3124477                                            /complete
3124478                                       /card-machines
3124479                                            /continue
3124480                             /blog/article/get-car...
3124481                             /blog/article/get-car...
3124482                                          /za/signup/
3124483                                 /credit-card-readers
3124484                                              /signup
3124485                                 /credit-card-readers
3124486                                   /create-account/v1
3124487                                 /credit-card-readers
3124488                                   /point-of-sale-app
3124489                                   /create-account/v1
3124490                                   /point-of-sale-app
3124491                                 /credit-card-readers

2 Answers

The .str methods operate on object columns, and such columns can contain non-string values (for example NaN). For those rows str.contains returns NaN instead of False, so the resulting Series has object dtype rather than bool. np.select then complains because every entry of condlist must be a boolean array.

Luckily, str.contains has a parameter to handle this: na=False, which fills those rows with False:

a["properties_path"].str.contains('blog', na=False)
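Applied to the question's code, that means passing na=False to every str.contains call. The sketch below is a minimal illustration, not the asker's real data: it assumes a hypothetical DataFrame a built from a few of the sample paths plus one missing value (None), tests the home page with isin (== compares the whole string literally, so '|' is not treated as "or"), and uses a hypothetical default label "other" in place of np.nan so that all labels share one string dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a few paths from the question plus one missing value
a = pd.DataFrame({"properties_path": [
    "/card-machines", "/promo/our-gift-to-you", "/za/signup/",
    "/za/", "/complete", None,
]})

path = a["properties_path"]
conditions = [
    path.str.contains('blog', na=False),
    path.str.contains('credit-card-readers/|machines|poss|team|transaction_fees', na=False),
    path.str.contains('signup|sign-up|create-account|continue|checkout', na=False),
    path.str.contains('complete', na=False),
    path.isin(['/za/', '/']),  # exact matches; == would not treat '|' as "or"
    path.str.contains('promo', na=False),
]
choices = ["blog", "info_pages", "signup", "completed", "home_page", "promo"]

# Every condition is now a bool Series, so np.select accepts them.
# "other" is a hypothetical default label for rows that match nothing.
a["page_type"] = np.select(conditions, choices, default="other")
print(a)
```

np.select picks the first condition that is True for each row, so the order of conditions matters when a path matches more than one pattern.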

Alternatively, you could change your conditions to:

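a["properties_path"].str.contains('blog') == True  # NaN == True evaluates to False
# or
a["properties_path"].str.contains('blog').fillna(False).astype(bool)  # astype(bool) guards against fillna leaving an object dtype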
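One way to check that these variants agree is to run them on a small hypothetical Series containing a number and a missing value, then compare dtype and values. In this sketch the fillna variant also calls .astype(bool) as a safeguard, since recent pandas releases deprecate the silent downcasting that fillna performs on object columns; without it the result could stay object dtype, which np.select rejects.

```python
import pandas as pd

# Hypothetical column with a non-string entry and a missing value
s = pd.Series(["/blog/article", 1, "/card-machines", None])

raw = s.str.contains('blog')                                 # object dtype: NaN for non-strings
with_na = s.str.contains('blog', na=False)                   # option 1: na=False
compared = s.str.contains('blog') == True                    # option 2: NaN == True is False
filled = s.str.contains('blog').fillna(False).astype(bool)   # option 3: fill, then force bool

for name, cond in [("na=False", with_na), ("== True", compared), ("fillna", filled)]:
    print(name, cond.dtype, cond.tolist())
```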

Sample

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 'foo', 'bar']})
conds = df.a.str.contains('f')
#0      NaN
#1     True
#2    False
#Name: a, dtype: object

np.select([conds], ['XX'])
#ValueError: invalid entry 0 in condlist: should be boolean ndarray

conds = df.a.str.contains('f', na=False)
#0    False
#1     True
#2    False
#Name: a, dtype: bool

np.select([conds], ['XX'])
#array(['0', 'XX', '0'], dtype='<U11')
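To find out which rows trigger the problem in your own column, you could look at the Python type of each entry. This sketch uses a hypothetical Series rather than the question's data; any row that is not a str is one where str.contains (without na=False) yields NaN.

```python
import pandas as pd

# Hypothetical column mixing strings with a number and a missing value
s = pd.Series(["/blog/article", 1, "/card-machines", None])

# Count which Python types occur in the column
print(s.map(type).value_counts())

# Rows holding anything other than a str are the ones str.contains turns into NaN
bad = s[~s.map(lambda x: isinstance(x, str))]
print(bad)
```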

Your data seem to contain NaN values, so the conditions contain NaN as well, and np.select rejects them because they are not boolean. To fix this, you can fill the missing values first:

s = a["properties_path"].fillna('')

and replace a['properties_path'] in each condition with s.
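A minimal sketch of this approach, assuming a hypothetical three-row DataFrame and a hypothetical default label "other". Note that fillna('') only replaces missing values; a non-string entry such as a number would still produce NaN, in which case na=False from the other answer is the safer fix.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing path, standing in for a["properties_path"]
a = pd.DataFrame({"properties_path": ["/blog/article", None, "/sign-up/cart"]})

s = a["properties_path"].fillna('')  # missing paths become empty strings
conditions = [
    s.str.contains('blog'),
    s.str.contains('signup|sign-up|create-account|continue|checkout'),
]
choices = ["blog", "signup"]

print([c.dtype for c in conditions])  # both bool, since no NaN remains
a["page_type"] = np.select(conditions, choices, default="other")
```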
