
Based on this question, I succeeded in using the query method:

import pandas as pd
df = pd.DataFrame({"genre":[['comedy', 'sci-fi'], ['action', 'romance', 'comedy'], ['documentary'], ['crime','horror'], []]})
df.query("(genre.str.contains('comedy', na=False, regex=False))", engine="python")

Now I'd like a query that returns the rows where genre is an empty list. I tried:

df.query("~(genre.str.contains('\\w*', na=False, regex=True))", engine="python")

and many other variations, without any success…


5 Answers

3

You can use ==, but when the right-hand side is a list, query converts the comparison to .isin, so wrap the empty list in a second pair of brackets:

>>> df.query('genre == [[]]')
  genre
4    []

1 Comment

Indeed, equivalent to df[df['genre'].isin([[]])].
3

You can check the .str.len():

df.query('genre.str.len() == 0')

Output:

  genre
4    []
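
Since the goal is to mix this with other filters, here is a minimal sketch that combines the length check with the contains filter from the question in one query string. It assumes the same df as in the question and keeps engine="python" as used there; the name result is only illustrative. Like the rest of this answer, it relies on .str working on list elements, which pandas does not officially support.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({"genre": [["comedy", "sci-fi"],
                             ["action", "romance", "comedy"],
                             ["documentary"],
                             ["crime", "horror"],
                             []]})

# Rows whose genre list is empty OR contains 'comedy'
result = df.query(
    "genre.str.len() == 0 or "
    "genre.str.contains('comedy', na=False, regex=False)",
    engine="python",
)
print(result)
```

Swapping "or" for "and", or prefixing a condition with "not", lets you build other combinations from the same pieces.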


2

You can use pandas explode, which turns an empty list into a single NaN row, check which rows are NaN, and then use groupby(level=0).any() to collapse the result back onto the original index.

m = df['genre'].explode().isna().groupby(level=0).any()
print(df[m])

Another example using your condition:

m = (df['genre'].explode()
     .str.contains('\\w*', na=False, regex=True)
     .groupby(level=0).any()
    )
print(df[~m])

Result:

genre
4   []
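
To make the first approach easier to follow, here is a small sketch that keeps the intermediate steps in separate variables. It assumes the same df as in the question; the names exploded and mask are only illustrative. One caveat worth stating: a list that itself contains a missing value (for example [None]) would also be flagged by this mask.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({"genre": [["comedy", "sci-fi"],
                             ["action", "romance", "comedy"],
                             ["documentary"],
                             ["crime", "horror"],
                             []]})

# One row per list element; the original index label is repeated,
# and the empty list at index 4 becomes a single NaN row
exploded = df["genre"].explode()

# True for every original row that produced at least one NaN element
mask = exploded.isna().groupby(level=0).any()

print(df[mask])
```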

2 Comments

Thanks, but as mentioned, I'd like a query, so I can play with various filters.
You can filter using the same code. I will edit the answer to use the same condition.
2

IIUC, you want to return the rows where the list in the genre column is empty. You can do:

print(df)

                       genre
0           [comedy, sci-fi]
1  [action, romance, comedy]
2              [documentary]
3            [crime, horror]
4                         []

out = df.loc[df['genre'].str.len() == 0]
print(out)

  genre
4    []

6 Comments

What kind of sorcery is this? Is there any documentation why str.len() (using the .str accessor) would treat the series' elements as lists as opposed to strings?
BTW, you can skip .loc: print(df[df.genre.str.len() == 0])
On second thought, it makes sense if pandas simply calls len on each object, since both str and list implement __len__. If that is the case, this answer should be taken with a grain of salt, as it may break in a future version of pandas.
@DeepSpace See String methods: "... methods that make it easy to operate on each element of the array." Notice how it doesn't say "each string". Then later: "Warning: Generally speaking, the .str accessor is intended to work only on strings. With very few exceptions, other uses are not supported, and may be disabled at a later point."
Thanks, but as mentioned, I'd like a query, so I can play with various filters.
2

You can use apply to filter the dataframe:

# option 1
df[df["genre"].apply(len) == 0]

# option 2
df[df["genre"].apply(lambda x: len(x) == 0)]

You can do something similar using query, by referencing the built-in function len with a local variable and @:

# Bind the built-in to a module-level variable so that @len can be resolved inside query
len = len
df.query("genre.apply(@len) == 0")

All print:

  genre
4    []
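
If rebinding len feels awkward, a variation on the same idea is to reference a small named helper with @ instead. This is only a sketch under the same assumptions as above: df is the frame from the question, and is_empty is a hypothetical helper introduced here for illustration, not part of pandas.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({"genre": [["comedy", "sci-fi"],
                             ["action", "romance", "comedy"],
                             ["documentary"],
                             ["crime", "horror"],
                             []]})


def is_empty(values):
    """Hypothetical helper: True when the list has no elements."""
    return len(values) == 0


# @is_empty refers to the Python function defined above
empty_rows = df.query("genre.apply(@is_empty)", engine="python")
print(empty_rows)
```

Because the helper is plain Python, it does not depend on the .str accessor accepting non-string elements.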

2 Comments

Thanks, but as mentioned, I'd like a query, so I can play with various filters.
@NBur I updated the answer with a possible solution using query.
