I would like to extract/filter the rows of a DataFrame whose values contain any of the strings in a list. I am trying to use query, since it is usually fantastic for this job and very elegant in code. I have tried:

my_list = ['red', 'blue', 'green', 'yellow']

df_new = df.query("`User Color` in @my_list")

I am looking for something that works like in, but matches when one of the strings is contained in the cell rather than equal to the whole cell.

My DataFrame df looks roughly like this:

name      id    User Color    Age 
Luis      876   blue, green   35
Charles   12    blue, brown   34
Luna      654   black         24
Anna      987   brown         19
Silvana   31    red, black    26
Juliet    55    red           20

The output I expect is:

name      id    User Color    Age 
Luis      876   blue, green   35
Charles   12    blue, brown   34
Silvana   31    red, black    26
Juliet    55    red           20
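A minimal sketch that rebuilds the sample data (assuming the column names shown above, with User Color holding comma-separated strings) shows why the original query does not work: in compares each whole cell against the list, so only a cell that equals a list item exactly, such as red, is kept.

```python
import pandas as pd

# Sample data reconstructed from the table above
df = pd.DataFrame({
    "name": ["Luis", "Charles", "Luna", "Anna", "Silvana", "Juliet"],
    "id": [876, 12, 654, 987, 31, 55],
    "User Color": ["blue, green", "blue, brown", "black", "brown", "red, black", "red"],
    "Age": [35, 34, 24, 19, 26, 20],
})
my_list = ["red", "blue", "green", "yellow"]

# `in` inside query tests whole-cell equality (like Series.isin), not containment
exact = df.query("`User Color` in @my_list")
print(exact["name"].tolist())  # ['Juliet']
```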
Comments
  • df_subset = df[df['User Color'].map(lambda val: any(x in my_list for x in val.split(',')))] this should do the trick. Commented Dec 3, 2020 at 20:14
  • ^^^ val.split(',') :) Commented Dec 3, 2020 at 20:17
  • Thanks @DavidErickson for pointing that out. Commented Dec 3, 2020 at 20:19

3 Answers


You need to split the values in each row and check if any of those values are present in your selected list.

This can be done with Series.map:

df_subset = df[df['User Color'].map(lambda val: any(x in my_list for x in val.split(',')))]

Because this is an exact string comparison, depending on your data you may want to strip and lowercase the split values (splitting 'blue, green' on ',' leaves a leading space on ' green').
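A small sketch comparing the two variants on hypothetical cells (the 'brown, green' cell is an illustrative addition, not from the question): without cleanup only the first color in each cell can ever match, because every later piece keeps its leading space.

```python
import pandas as pd

my_list = ["red", "blue", "green", "yellow"]
# Hypothetical cells; the second one matches only through its second color
colors = pd.Series(["blue, green", "brown, green", "black"])

def matches_raw(val):
    # No cleanup: ' green' (leading space) never equals 'green'
    return any(x in my_list for x in val.split(","))

def matches_clean(val):
    # Strip whitespace and lowercase each piece before comparing
    return any(x.strip().lower() in my_list for x in val.split(","))

raw = colors.map(matches_raw).tolist()
clean = colors.map(matches_clean).tolist()
print(raw)    # [True, False, False]
print(clean)  # [True, True, False]
```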

The same logic written as a named function, with the cleanup applied:

def filter_color(val):
    # True if any comma-separated color in the cell is in my_list
    for x in val.split(','):
        if x.lower().strip() in my_list:
            return True
    return False

df_subset = df[df['User Color'].map(filter_color)]
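If you would rather stay with vectorized pandas operations than call a Python function per row, one possible alternative (a different technique from the answer above, not part of it) is to split each cell into a list, explode it into one color per row, clean and test the pieces with isin, then collapse back to one boolean per original row. This sketch assumes the DataFrame index is unique.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Luis", "Charles", "Luna", "Anna", "Silvana", "Juliet"],
    "User Color": ["blue, green", "blue, brown", "black", "brown", "red, black", "red"],
})
my_list = ["red", "blue", "green", "yellow"]

# One row per individual color; exploded rows keep their original index label
colors = df["User Color"].str.split(",").explode().str.strip().str.lower()

# True for an original row if any of its colors is in my_list
mask = colors.isin(my_list).groupby(level=0).any()

df_subset = df[mask]
print(df_subset["name"].tolist())  # ['Luis', 'Charles', 'Silvana', 'Juliet']
```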

1 Comment

Thank you so much! This is also a wonderful answer, even if it is not precisely using the query method. I tried it too and it works. I really appreciate it.

Building on @DavidErickson's solution, using the query method:

df.query("`User Color`.str.contains('|'.join(@my_list))")

    name    id  User Color  Age
0   Luis    876 blue, green 35
1   Charles 12  blue, brown 34
4   Silvana 31  red, black  26
5   Juliet  55  red         20
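Depending on your pandas version and whether numexpr is installed, a method call such as .str.contains inside query may need engine='python'. A hedged sketch that passes the engine explicitly and builds the pattern outside the query string (the variable name pattern is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Luis", "Charles", "Luna", "Anna", "Silvana", "Juliet"],
    "User Color": ["blue, green", "blue, brown", "black", "brown", "red, black", "red"],
})
my_list = ["red", "blue", "green", "yellow"]

# Regex alternation: matches if any list item appears anywhere in the cell
pattern = "|".join(my_list)

# engine="python" evaluates the .str accessor call with plain pandas
result = df.query("`User Color`.str.contains(@pattern)", engine="python")
print(result["name"].tolist())  # ['Luis', 'Charles', 'Silvana', 'Juliet']
```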

Instead of splitting the DataFrame column, you could do the inverse: join the list into a single regex pattern and pass it to str.contains. NOTE: this is less robust, because it matches substrings rather than whole values (for example, 'red' would also match 'darkred'):

df[df['User Color'].str.contains('|'.join(my_list))]

Out[1]: 
      name   id   User Color  Age
0     Luis  876  blue, green   35
1  Charles   12  blue, brown   34
4  Silvana   31   red, black   26
5   Juliet   55          red   20
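To make the str.contains approach stricter, one option is to wrap the alternation in regex word boundaries (\b) and escape each item with re.escape, so that a list item only matches as a whole word. The 'darkred' cell below is hypothetical, added to show a partial match being rejected. Note that \b still treats a hyphen as a boundary, so 'red' would match 'red-orange'.

```python
import re
import pandas as pd

my_list = ["red", "blue", "green", "yellow"]
# Hypothetical cells; 'darkred' contains 'red' only as a substring
colors = pd.Series(["blue, green", "darkred", "red, black", "brown"])

loose = "|".join(my_list)
# Non-capturing group (?:...) avoids pandas' "match groups" UserWarning
strict = r"\b(?:" + "|".join(map(re.escape, my_list)) + r")\b"

loose_hits = colors.str.contains(loose).tolist()
strict_hits = colors.str.contains(strict).tolist()
print(loose_hits)   # [True, True, True, False]
print(strict_hits)  # [True, False, True, False]
```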
