pandas: filter out column values containing empty list

Question

I have the following data frame my_df:

col_A    col_B
---------------
John     []
Mary     ['A','B','C']
Ann      ['B','C']

I want to delete the rows where col_B holds an empty list, i.e. I want the new data frame to be:

col_A    col_B
---------------
Mary     ['A','B','C']
Ann      ['B','C']

Below is what I did:

my_df[ len(my_df['col_B']) >0 ]

But I got the following errors:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: True

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-27-75da0b0af6a1> in <module>()
----> 1 records_df_pair_count[ len(records_df_pair_count['stable_seq']) >0 ]

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in get(self, item, fastpath)
   3539 
   3540             if not isnull(item):
-> 3541                 loc = self.items.get_loc(item)
   3542             else:
   3543                 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: True

Any idea what I did wrong here? Thanks!

Andrew L · Accepted Answer · 2017-03-22 23:56 (edited by wjandrea, 2024-07-11 16:45:03Z)

15

Another way to do this:

my_df[my_df['col_B'].apply(len) > 0]
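For anyone who wants to try this end to end, here is a minimal sketch. It assumes the frame from the question is rebuilt by hand; the question does not show how my_df was constructed, so that part is a guess.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# apply(len) calls len() on each list, giving one length per row.
lengths = my_df["col_B"].apply(len)

# Comparing the Series to 0 yields a boolean mask with one value per row,
# which is what DataFrame indexing needs in order to select rows.
filtered = my_df[lengths > 0]
print(filtered)
```

The key point is that the comparison happens on a Series of per-row lengths, not on the length of the column itself.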

edited Jul 11, 2024 at 16:45

wjandrea


answered Mar 22, 2017 at 23:56

Andrew L



2 Comments

wjandrea Over a year ago

lambda x: len(x) can be simplified to just len. This is called eta-reduction.

wjandrea Over a year ago

This will fail on NaNs IIRC. Use MaxU's answer instead.
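A small sketch of the NaN case mentioned in this comment. The Series below is made up for illustration (the question's data has no missing values), so treat it as an assumption rather than the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical column that mixes lists with a missing value.
s = pd.Series([[], ["A", "B"], np.nan])

# apply(len) ends up calling len() on a float NaN, which raises TypeError.
try:
    s.apply(len)
    apply_failed = False
except TypeError:
    apply_failed = True

# .str.len() returns NaN for the missing entry instead of raising.
# NaN > 0 is False, so that row is dropped along with the empty list.
mask = s.str.len() > 0
print(apply_failed, mask.tolist())
```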

MaxU - stand with Ukraine · Accepted Answer · 2017-03-22 23:49:33Z

12

You can use Series.str.len() method:

my_df[my_df['col_B'].str.len() > 0]

answered Mar 22, 2017 at 23:49

MaxU - stand with Ukraine


1 Comment

wjandrea Over a year ago

@Sherman A list is not a string. Maybe the .str is confusing you; it's not technically limited to strings; it can work on any object-dtype column.
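To illustrate the point about .str not being limited to strings, here is a rough sketch. The mixed Series is invented for the demonstration, and the frame is rebuilt by hand from the question's table.

```python
import pandas as pd

# Hypothetical object-dtype Series holding a list, a tuple and a string.
mixed = pd.Series([[], (1, 2), "abc"])

# .str.len() reports the length of each element, whatever its type.
mixed_lengths = mixed.str.len().tolist()
print(mixed_lengths)

# Applied to the question's frame (assumed construction), the same call
# produces the per-row mask needed to drop the empty lists.
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})
filtered = my_df[my_df["col_B"].str.len() > 0]
print(filtered)
```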

B. Shieh · Accepted Answer · 2017-03-23 01:04:12Z

3

You already got a couple of answers that fix the problem, but I thought I'd chime in with an explanation of why your attempt doesn't work.

This gives a pandas series:

my_df['col_B']

So this gives the length of the series:

len(my_df['col_B'])

Since you have a non-empty series, this evaluates to True:

len(my_df['col_B']) >0

And this:

my_df[ len(my_df['col_B']) >0 ]

evaluates to:

my_df[True]

And clearly my_df does not have a column labeled True. Hence the KeyError: True.
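The steps above can be checked with a short sketch. It assumes the frame is rebuilt from the question's table, and it uses .str.len() from one of the other answers for the working contrast.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# len() of a Series is its number of rows: one int, not one value per row.
n_rows = len(my_df["col_B"])

# So the comparison produces a single Python bool, not a boolean mask.
key = n_rows > 0

# Indexing with that scalar makes pandas look for a column labeled True.
try:
    my_df[key]
    raised = False
except KeyError:
    raised = True

# An element-wise length gives a mask with one entry per row instead.
mask = my_df["col_B"].str.len() > 0
print(n_rows, key, raised, mask.tolist())
```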

answered Mar 23, 2017 at 1:04

B. Shieh



dimid · Accepted Answer · 2023-08-16 13:12 (edited by wjandrea, 2024-07-11 16:42:50Z)

0

Andrew's great answer can be further simplified:

df[df.col.apply(len) > 0]

edited Jul 11, 2024 at 16:42

wjandrea


answered Aug 16, 2023 at 13:12

dimid


1 Comment

wjandrea Over a year ago

Yup, this is called eta-reduction. There's no change in functionality so I went ahead and edited Andrew's answer.
