5

What I'm trying to do is:

options = ['abc', 'def']
df[any(df['a'].str.startswith(start) for start in options)]

I want to filter the DataFrame so that only rows whose value in column 'a' starts with one of the given options remain.

The following code works for a single prefix, but I need it to work with several prefixes:

start = 'abc'
df[df['a'].str.startswith(start)]

The error message is

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have read the question "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()", but I still don't understand how to apply it here.

1
  • Show us your data set, please! Commented Aug 29, 2018 at 19:14

3 Answers

6

You can pass a tuple of options to startswith:

import pandas as pd

df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', '5abc1', '9def', 'defabcb']})
options = ['abc', 'def']
# str.startswith accepts a tuple of prefixes; a list must be converted first
df[df.a.str.startswith(tuple(options))]

You get

    a
0   abcd
1   def5
5   defabcb
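
If column 'a' can contain missing values, the mask may hold missing entries instead of booleans, and boolean indexing can then raise an error. A minimal sketch, assuming a made-up frame with one None entry (the data is hypothetical, not from the question); the na parameter of str.startswith sets the result for those positions:

```python
import pandas as pd

# Hypothetical data: same idea as above, plus one missing value
df = pd.DataFrame({'a': ['abcd', None, 'xabc', 'def5']})
options = ['abc', 'def']

# na=False treats missing entries as "does not start with any prefix"
mask = df['a'].str.startswith(tuple(options), na=False)
filtered = df[mask]
print(filtered)
```

Only the rows holding 'abcd' and 'def5' remain; the row with the missing value is dropped along with the non-matching one.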

2 Comments

sorry, something else didn't go well with your solution, so at the end I did it another way - that's why I unmarked it. I'll add my answer as well.
Found out the reason why I did something else and it's not the limitation of your solution, so I accept it. Thank you!
2

You can try this:

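import numpy as np
mask = np.array([df['a'].str.startswith(start) for start in options]).any(axis=0)

This creates a boolean Series for each prefix in options and stacks them into a 2-D array with one row per prefix. Applying any along axis=0 then reduces across the prefixes, leaving one boolean per DataFrame row, so the result can be used as df[mask].

You were getting the error because the built-in any calls bool() on each element it is given, and each element here is a whole Series. As the error message says, the truth value of a Series is ambiguous, so you need an array-aware any instead.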
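
The difference between the built-in any and an array-aware any can be seen in a minimal sketch, assuming the sample frame from the first answer. The names masks, error_text and result are introduced here only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', '5abc1', '9def', 'defabcb']})
options = ['abc', 'def']

# One boolean Series per prefix
masks = [df['a'].str.startswith(start) for start in options]

# The built-in any() calls bool() on the first Series, which raises
try:
    any(masks)
except ValueError as exc:
    error_text = str(exc)

# Stacked array has shape (number of prefixes, number of rows);
# axis=0 reduces across prefixes and keeps one boolean per row
mask = np.array(masks).any(axis=0)
result = df[mask]
print(result)
```

The built-in call fails before it ever combines anything, while the NumPy reduction works element by element and yields a mask of the same length as the DataFrame.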

4 Comments

Thanks for the explanation! But doesn't Series's any return one of the matching items instead of bool result?
Do you mean Series.any()? It returns True if any of Series' elements evaluates to True and False otherwise.
yeah, I got confused because functions have the same name and behavior is slightly different... though think of any([...]) as a function that gets any of the True values in the array, it's the same. Thanks!
Yes, exactly. What you rather needed in turn is applying any across multiple Series row-wise. Luckily, the simpler and more plausible solution exists with passing a tuple to startswith (suggested by Vaishali).
0

One more solution:

# extract all possible values for 'a' column
all_a_values = df['a'].unique()
# filter 'a' column values by my criteria
accepted_a_values = [x for x in all_a_values if any([str(x).startswith(prefix) for prefix in options])]
# apply filter
df = df[df['a'].isin(accepted_a_values)]

I took it from the question "remove rows and ValueError Arrays were different lengths".

The solution provided by @Vaishali is the simplest and most logical, but I also needed the accepted_a_values list to iterate through. This was not mentioned in the question, so I marked her answer as correct.
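
If the list of matching values is needed as well, it can also be derived from the tuple-based mask. A sketch, assuming the sample data from the accepted answer with one duplicate 'abcd' row added for illustration; the name accepted_a_values is reused from the code above:

```python
import pandas as pd

# Sample data from the accepted answer, plus a duplicate 'abcd' (an assumption)
df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', '5abc1', '9def', 'defabcb', 'abcd']})
options = ['abc', 'def']

mask = df['a'].str.startswith(tuple(options))

# Unique values of 'a' that start with one of the prefixes,
# in order of first appearance
accepted_a_values = df.loc[mask, 'a'].unique().tolist()

# The filtered frame keeps duplicates; the list does not
filtered = df[mask]
print(accepted_a_values)
```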
