I have a list of values that I would like to use to select rows in a dataframe. The trick is that I would like to select any row in which at least one cell contains one of the list values as a substring. Example:

index    color    shape
 1       blue     star
 2       red      square
 3       yellow   circle

My list would be

list_vals = ['sq', 'blu']

I would like to select the rows

index    color   shape
1        blue    star
2        red     square
  • Could you explain how you selected index 3? It seems that row does not contain the required search terms in list_vals. Commented Apr 17, 2019 at 16:14
  • I think he meant index 1 in this case. Commented Apr 17, 2019 at 16:16
  • Yes, sorry, I was not paying enough attention. I am interested in selecting index 1 and 2. I have made edits to the original post. Commented Apr 17, 2019 at 16:25
  • @MorganGladden, no problem. Thanks for your edit! Commented Apr 17, 2019 at 16:28

4 Answers

Use DataFrame.stack to convert to a Series, then use Series.str.contains to find the strings you're interested in. We'll use '|'.join to create a regex 'OR' pattern combining all items from list_vals.

For reference, this regex pattern looks like 'sq|blu' in this case.

Next, use Series.unstack to get back to the original shape, then DataFrame.any over axis 1 to create the boolean index we'll use to return the desired rows.

df[df.stack().str.contains('|'.join(list_vals)).unstack().any(axis=1)]

[out]

  index color   shape
0     1  blue    star
1     2   red  square
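
The steps above can be sketched end to end as follows. This is a minimal example, not a definitive implementation: the DataFrame is rebuilt from the table in the question with 'index' as the row label (an assumption), and the re.escape call is an addition that is not part of the one-liner, included so that search terms containing regex metacharacters are matched literally.

```python
import re
import pandas as pd

# Sample data rebuilt from the question; using 'index' as the row label is an assumption.
df = pd.DataFrame(
    {"color": ["blue", "red", "yellow"], "shape": ["star", "square", "circle"]},
    index=pd.Index([1, 2, 3], name="index"),
)
list_vals = ["sq", "blu"]

# Escape each term so characters such as '.' or '+' are matched literally (an addition).
pattern = "|".join(re.escape(v) for v in list_vals)

stacked = df.stack()                   # one long Series, one entry per cell
hits = stacked.str.contains(pattern)   # True where a cell matches the pattern
mask = hits.unstack().any(axis=1)      # back to the original shape, then one flag per row
result = df[mask]
print(result)
```

Splitting the one-liner into named steps makes it easier to inspect the intermediate Series if the mask is not what you expect.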

Alternatively, join the list with a pipe and apply str.contains() to each column of the DataFrame:

df[df.apply(lambda x: x.str.contains('|'.join(list_vals))).any(axis=1)]

       color   shape
index              
1      blue    star
2       red  square
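
The column-wise check relies on every column supporting the .str accessor. As a hedged sketch, assuming 'index' is an ordinary integer column (as in some of the outputs on this page), one option is to cast everything to str first. Note that this also makes the numbers themselves searchable, which may or may not be what you want.

```python
import pandas as pd

# Assumption: 'index' is a regular integer column rather than the row label.
df = pd.DataFrame({
    "index": [1, 2, 3],
    "color": ["blue", "red", "yellow"],
    "shape": ["star", "square", "circle"],
})
list_vals = ["sq", "blu"]
pattern = "|".join(list_vals)

# Cast every column to str so that .str.contains is valid for each column.
as_text = df.astype(str)
mask = as_text.apply(lambda col: col.str.contains(pattern)).any(axis=1)

# The mask is applied to the original frame, so the original dtypes are kept.
filtered = df[mask]
print(filtered)
```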

This checks only whether a value in the shape column starts with one of the search terms:

df[df['shape'].apply(lambda x: any(x.startswith(s) for s in list_vals))]

Output

  color   shape
1   red  square


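Here is one approach, which checks each search term against a specific column:

# list_vals = ['sq', 'blu'], so 'blu' is matched against color and 'sq' against shape
df_filtered = df[
    df['color'].str.contains(list_vals[1])
    | df['shape'].str.contains(list_vals[0])
]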

print(df_filtered)
   index color   shape
0      1  blue    star
1      2   red  square

EDIT

Another approach is based on this SO post (which contains the full explanation of this method)

  • The only changes I made were (1) to join your search list into a single search string and (2) to return the DataFrame row index of the filtered results, which is then used to slice the original DataFrame.

def find_subtext(df, txt):
    # Boolean frame: True where a cell contains the pattern
    contains = df.stack().str.contains(txt).unstack()
    # Index labels of the rows with at least one match
    return contains[contains.any(axis=1)].index

df_filtered = find_subtext(df, '|'.join(list_vals))

# The function returns index labels, so select by label
print(df.loc[df_filtered, :])
   index color   shape
0      1  blue    star
1      2   red  square
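
Since find_subtext returns index labels rather than positions, a small sketch with a non-default index may help show how the result is meant to be used. The labels 10, 20, 30 are hypothetical and chosen only for illustration; with the default 0..n-1 index, labels and positions happen to coincide.

```python
import pandas as pd

def find_subtext(df, txt):
    # Boolean frame: True where a cell contains the pattern
    contains = df.stack().str.contains(txt).unstack()
    # Index labels of the rows with at least one match
    return contains[contains.any(axis=1)].index

df = pd.DataFrame(
    {"color": ["blue", "red", "yellow"], "shape": ["star", "square", "circle"]},
    index=[10, 20, 30],  # hypothetical labels, not positions
)
list_vals = ["sq", "blu"]

labels = find_subtext(df, "|".join(list_vals))
selected = df.loc[labels]  # label-based selection matches what the function returns
print(selected)
```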
