Pandas: query empty list

Question

Based on this question, I succeeded in using the query method:

import pandas as pd
df = pd.DataFrame({"genre":[['comedy', 'sci-fi'], ['action', 'romance', 'comedy'], ['documentary'], ['crime','horror'], []]})
df.query("(genre.str.contains('comedy', na=False, regex=False))", engine="python")

Now I'd like a query that returns the rows where genre is an empty list. I tried

df.query("~(genre.str.contains('\\w*', na=False, regex=True))", engine="python")

and many other variations, without any success…
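A minimal sketch of why this attempt seems to match every row, assuming current pandas behavior for object columns. With regex=True, each list is handed to a regex search, which raises TypeError on a non-string, and pandas appears to substitute the na value (False) for that element. The mask is therefore all False, and ~ turns it into all True.

```python
import pandas as pd

df = pd.DataFrame({"genre": [['comedy', 'sci-fi'], ['action', 'romance', 'comedy'],
                             ['documentary'], ['crime', 'horror'], []]})

# regex=True runs a regex search on every element; a list is not a string, so
# (as far as we can tell) pandas falls back to the na value, False, for each row.
mask = df["genre"].str.contains('\\w*', na=False, regex=True)
print(mask.tolist())

# Negating an all-False mask keeps every row, not just the empty list.
print(len(df[~mask]))
```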

In the meantime, I have successfully tried df.query("genre.astype('str') == '[]'"), but I find converting to string an ugly hack… – NBur, Commented Jul 11, 2024 at 15:27
Related: "How to check if an element is an empty list in pandas?" and "filter out column values containing empty list" – wjandrea, Commented Jul 11, 2024 at 16:40

wjandrea · Accepted Answer · 2024-07-11 16:55:08Z

3

You can use ==, but when the right-hand side is a list, query converts the comparison to .isin, so double up the brackets:

>>> df.query('genre == [[]]')
  genre
4    []
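Because the goal is to mix this with other filters, here is a minimal sketch that combines the doubled-bracket test with a second, purely illustrative condition (lists with more than two genres) in one query string, and checks the plain query against the equivalent explicit .isin([[]]) mask.

```python
import pandas as pd

df = pd.DataFrame({"genre": [['comedy', 'sci-fi'], ['action', 'romance', 'comedy'],
                             ['documentary'], ['crime', 'horror'], []]})

# `genre == [[]]` becomes `genre.isin([[]])`: the outer list holds the
# candidate values, and the inner [] is the single value being matched.
empty = df.query("genre == [[]]")
same = df[df["genre"].isin([[]])]

# The empty-list test combined with an illustrative second condition.
combined = df.query("genre == [[]] or genre.str.len() > 2")

print(empty.index.tolist(), same.index.tolist(), combined.index.tolist())
```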

answered Jul 11, 2024 at 16:55

wjandrea

1 Comment

mozway Over a year ago

Indeed, equivalent to df[df['genre'].isin([[]])].

wjandrea · Accepted Answer · 2024-07-11 16:39:39Z

3

You can check .str.len():

df.query('genre.str.len() == 0')

Output:

  genre
4    []
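A minimal sketch showing what .str.len() returns for this column (the number of items in each list) and how it can sit next to the question's own str.contains filter in a single query; engine="python" is passed only to mirror the question.

```python
import pandas as pd

df = pd.DataFrame({"genre": [['comedy', 'sci-fi'], ['action', 'romance', 'comedy'],
                             ['documentary'], ['crime', 'horror'], []]})

# For list elements, .str.len() gives the number of items in each list.
lengths = df["genre"].str.len()
print(lengths.tolist())  # [2, 3, 1, 2, 0]

# Empty lists OR lists containing 'comedy', in one query string.
out = df.query(
    "genre.str.len() == 0 or genre.str.contains('comedy', na=False, regex=False)",
    engine="python",
)
print(out.index.tolist())  # [0, 1, 4]
```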

edited Jul 11, 2024 at 16:39

wjandrea

answered Jul 11, 2024 at 16:34

PaulS

Triky · Accepted Answer · 2024-07-11 16:35:20Z

2

You can use pandas explode (an empty list becomes a single NaN row), check which exploded values are NaN, and then use groupby(level=0).any() to collapse the result back to the original index.

m = df['genre'].explode().isna().groupby(level=0).any()
print(df[m])

Another example using your condition:

m = (df['genre'].explode()
     .str.contains('\\w*', na=False, regex=True)
     .groupby(level=0).any()
    )
print(df[~m])

Result:

genre
4   []
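To see what the mask is built from, a minimal sketch that prints the intermediate exploded Series. The last lines use a hypothetical extra Series to show one caveat: a list that contains only None also explodes to a missing value, so this approach cannot tell it apart from an empty list.

```python
import pandas as pd

df = pd.DataFrame({"genre": [['comedy', 'sci-fi'], ['action', 'romance', 'comedy'],
                             ['documentary'], ['crime', 'horror'], []]})

# explode() emits one row per list element and repeats the original index;
# an empty list becomes a single NaN row.
exploded = df["genre"].explode()
print(exploded.index.tolist())  # [0, 0, 1, 1, 1, 2, 3, 3, 4]

# Collapse back to one boolean per original row.
m = exploded.isna().groupby(level=0).any()
print(m.tolist())  # [False, False, False, False, True]

# Caveat (hypothetical data): [None] also explodes to a missing value.
s = pd.Series([[], [None]])
caveat = s.explode().isna().groupby(level=0).any()
print(caveat.tolist())  # [True, True]
```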

edited Jul 11, 2024 at 16:35

answered Jul 11, 2024 at 15:44

Triky

2 Comments

NBur Over a year ago

Thanks, but as mentioned, I'd like a query, to play with various filters.

Triky Over a year ago

You can filter using the same code. I will edit the answer to use the same condition.

wjandrea · Accepted Answer · 2024-07-11 16:37:27Z

2

IIUC, you want to return the rows where the list in the genre column is empty. You can do:

print(df)

                       genre
0           [comedy, sci-fi]
1  [action, romance, comedy]
2              [documentary]
3            [crime, horror]
4                         []

out = df.loc[df['genre'].str.len() == 0]
print(out)

  genre
4    []
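One practical difference between .str.len() and calling len directly shows up with missing values. A minimal sketch, using a hypothetical Series that also holds None: .str.len() returns NaN for the missing entry (so == 0 is simply False there), whereas Series.map(len) raises TypeError.

```python
import pandas as pd

# Hypothetical data: one non-empty list, one empty list, one missing value.
s = pd.Series([['comedy'], [], None])

# .str.len() skips missing values and returns NaN for them.
lengths = s.str.len()
print(lengths.tolist())          # [1.0, 0.0, nan]
print((lengths == 0).tolist())   # [False, True, False]

# Calling the built-in len on every element fails on None.
try:
    s.map(len)
    map_failed = False
except TypeError as exc:
    map_failed = True
    print("map(len) raised TypeError:", exc)
```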

edited Jul 11, 2024 at 16:37

wjandrea

answered Jul 11, 2024 at 15:19

iBeMeltin

6 Comments

DeepSpace Over a year ago

What kind of sorcery is this? Is there any documentation on why str.len() (using the .str accessor) would treat the series' elements as lists as opposed to strings?

DeepSpace Over a year ago

BTW, you can skip .loc: print(df[df.genre.str.len() == 0])

DeepSpace Over a year ago

On second thought, it makes sense if pandas blindly calls len on each object, since both str and list implement __len__. If this is the case, then this answer should be taken with a grain of salt, as it may or may not break in a future version of pandas.

wjandrea Over a year ago

@DeepSpace See String methods: "... methods that make it easy to operate on each element of the array." Notice how it doesn't say "each string". Then later: "Warning: Generally speaking, the .str accessor is intended to work only on strings. With very few exceptions, other uses are not supported, and may be disabled at a later point."

NBur Over a year ago

Thanks, but as mentioned, I'd like a query, to play with various filters.

e-motta · Accepted Answer · 2024-07-11 16:46:06Z

2

You can use apply to filter the dataframe:

# option 1
df[df["genre"].apply(len) == 0]

# option 2
df[df["genre"].apply(lambda x: len(x) == 0)]

You can do something similar using query, by referencing the built-in function len with a local variable and @:

len = len
df.query("genre.apply(@len) == 0")

All three return:

  genre
4    []
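The len = len line works at module level, but the same line inside a function body would raise UnboundLocalError, because the assignment makes len a local name before the right-hand side is evaluated. A minimal sketch that binds the built-in to a differently named local (length and empty_genre_rows are hypothetical names) so @ can still reference it from inside a function:

```python
import pandas as pd

df = pd.DataFrame({"genre": [['comedy', 'sci-fi'], ['action', 'romance', 'comedy'],
                             ['documentary'], ['crime', 'horror'], []]})


def empty_genre_rows(frame):
    # `len = len` here would raise UnboundLocalError, so use another name.
    length = len
    # @length is resolved from this function's local variables.
    return frame.query("genre.apply(@length) == 0")


print(empty_genre_rows(df).index.tolist())  # [4]
```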

edited Jul 11, 2024 at 16:46

answered Jul 11, 2024 at 16:02

e-motta

2 Comments

NBur Over a year ago

Thanks, but as mentioned, I'd like a query, to play with various filters.

e-motta Over a year ago

@NBur I updated the answer with a possible solution using query.
