1

I have a DataFrame object, with multiple columns: business_id, categories, type_of_business...

I have managed to create a smaller DataFrame with only business_id and categories by column indexing on the original DataFrame object.

categories is a list of strings for each business_id. Example: ['Restaurant', 'food', 'bakery'].

One of the categories is Restaurants. How would I retrieve only those business ids where the word Restaurants is in the categories list?

Pseudocode:

for row in smaller_DataFrame:
    if 'Restaurants' in row['categories']:
        add this business_id to some dictionary.

I am interested in how I would incorporate the if condition in a DataFrame object.

Thanks in advance.

2 Answers

4

Selecting rows according to a boolean condition is called boolean indexing (or masking) in the documentation.

df[df['categories'].isin(['Restaurant', 'food', 'bakery'])]
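For reference, here is a minimal sketch of what that mask does. It assumes a hypothetical frame in which each categories cell holds a single string rather than a list; the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: one category string per row (an assumption,
# not the asker's real data).
df = pd.DataFrame({
    "business_id": ["b1", "b2", "b3"],
    "categories": ["Restaurant", "shopping", "bakery"],
})

# isin() returns a boolean Series: True where the cell equals one of the values.
mask = df["categories"].isin(["Restaurant", "food", "bakery"])

# Indexing the frame with the mask keeps only the rows where it is True.
matches = df[mask]
```

Note that isin compares each cell as a whole value, so this form only fits a column of scalar strings.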

As an aside, I see you've been downvoted. It's better if you include a few sample rows of your DataFrame and an example of your desired result.

To make it case insensitive, put .str.lower() before .isin, and make the list of categories all lowercase.
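A short sketch of the case-insensitive variant, again assuming a hypothetical column of single strings (the sample rows and the name wanted are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with inconsistent capitalisation.
df = pd.DataFrame({
    "business_id": ["b1", "b2", "b3"],
    "categories": ["RESTAURANT", "Shopping", "Bakery"],
})

# The lookup list is written in lowercase to match the lowercased column.
wanted = ["restaurant", "food", "bakery"]

# Lowercase every cell first, then test membership in the lowercase list.
mask_ci = df["categories"].str.lower().isin(wanted)
matches_ci = df[mask_ci]
```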

3 Comments

I think that his categories column contains lists, not single values.
Yup, whatever it is he will have an answer :)
No, categories is a list of strings: each row contains a list of category strings. I think @DanAllan misunderstood me. Sorry for the confusion; I will be more descriptive in the future.
2

You can do it with map:

df[df.categories.map(lambda cats: 'Restaurants' in cats)]
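As a rough end-to-end sketch of this approach, assuming each categories cell is a Python list of strings as described in the question (the sample rows and the names has_restaurants and restaurant_ids are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: each row holds a list of category strings.
df = pd.DataFrame({
    "business_id": ["b1", "b2", "b3"],
    "categories": [
        ["Restaurants", "food", "bakery"],
        ["shopping"],
        ["Restaurants", "bars"],
    ],
})

# map() applies the membership test to each list and yields a boolean Series.
has_restaurants = df["categories"].map(lambda cats: "Restaurants" in cats)

# Use the mask to pull out just the matching business ids.
restaurant_ids = df.loc[has_restaurants, "business_id"].tolist()
```

The same mask can be passed to df[...] to keep the full matching rows instead of only the ids.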

1 Comment

I had used this lambda expression form before but it completely slipped my mind. Thanks for your help Viktor.
