11

I have the following data frame my_df:

col_A    col_B
---------------
John     []
Mary     ['A','B','C']
Ann      ['B','C']

I want to delete the rows where col_B holds an empty list, i.e. I want the new data frame to be:

col_A    col_B
---------------
Mary     ['A','B','C']
Ann      ['B','C']
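To reproduce this, the sample frame can be rebuilt by hand. This is a minimal sketch that assumes pandas is installed; the values are copied from the table above.

```python
import pandas as pd

# Rebuild the example frame from the question; col_B holds Python lists
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

print(my_df)
```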

Below is what I did:

my_df[ len(my_df['col_B']) >0 ]

But I got the following errors:


KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: True

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-27-75da0b0af6a1> in <module>()
----> 1 records_df_pair_count[ len(records_df_pair_count['stable_seq']) >0 ]

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in get(self, item, fastpath)
   3539 
   3540             if not isnull(item):
-> 3541                 loc = self.items.get_loc(item)
   3542             else:
   3543                 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: True

Any idea what I did wrong here? Thanks!

4 Answers

15

Another way to do this:

my_df[my_df['col_B'].apply(len) > 0]
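A minimal sketch of this approach on the question's sample data (the frame is rebuilt by hand here). apply(len) calls the built-in len() on each list, which gives a per-row integer Series that can be turned into a boolean mask.

```python
import pandas as pd

# Sample frame mirroring the question
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# One length per row: 0, 3, 2
lengths = my_df["col_B"].apply(len)

# Boolean mask is True where the list is non-empty
result = my_df[lengths > 0]
print(result)
```

The original index labels (1 and 2) are kept; call reset_index(drop=True) afterwards if a fresh 0-based index is wanted.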

2 Comments

lambda x: len(x) can be simplified to just len. This is called eta-reduction.
This will fail on NaN values (len() raises a TypeError on a float NaN), so use MaxU's answer instead.
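A small sketch of the NaN case, using a hypothetical extra row ("Bob") whose col_B is missing. apply(len) hands the float NaN to len() and raises, while .str.len() yields NaN for that row, and NaN > 0 is False, so the row is simply dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the "Bob" row has NaN instead of a list
df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann", "Bob"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"], np.nan],
})

# apply(len) calls len(nan), which raises TypeError
try:
    df[df["col_B"].apply(len) > 0]
    apply_failed = False
except TypeError:
    apply_failed = True

# .str.len() gives NaN for the missing entry; the comparison NaN > 0 is False
result = df[df["col_B"].str.len() > 0]
print(apply_failed)
print(result)
```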
12

You can use the Series.str.len() method:

my_df[my_df['col_B'].str.len() > 0]
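A minimal sketch on the question's sample data (rebuilt by hand). .str.len() computes the length of each element, which for a list is its item count.

```python
import pandas as pd

my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# Element-wise length of each list in col_B
lengths = my_df["col_B"].str.len()

# Keep only rows with a non-empty list
result = my_df[lengths > 0]
print(result)
```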

1 Comment

@Sherman A list is not a string, but maybe the .str is what's confusing you: it's not technically limited to strings, and .str.len() also works on object-dtype columns that hold lists or other sequences.
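A quick sketch of that point, using a made-up object-dtype Series that mixes strings, a list and a tuple. Despite the accessor's name, .str.len() returns the length of each element.

```python
import pandas as pd

# Made-up Series mixing strings, a list and a tuple (object dtype)
s = pd.Series(["dog", "", ["A", "B", "C"], ("x", "y")])

# len() of each element, whatever its sequence type
print(s.str.len().tolist())
```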
3

You already got a couple of answers that correct the problem, but I thought I'd chime in with an explanation of why yours doesn't work.

This gives a pandas series:

my_df['col_B']

So this gives the length of the series itself (its number of rows), not the length of each list:

len(my_df['col_B'])

Since the series is non-empty, this evaluates to a single True rather than a per-row mask:

len(my_df['col_B']) >0

And this:

my_df[ len(my_df['col_B']) >0 ]

evaluates to:

my_df[True]

my_df[True] is treated as a lookup of a column labeled True, and my_df has no such column. Hence the KeyError.
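The chain of evaluation above can be checked step by step. This is a minimal sketch on the sample frame (rebuilt by hand); the variable names n, mask and raised are just for illustration.

```python
import pandas as pd

my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

n = len(my_df["col_B"])   # length of the whole Series: 3 rows
mask = n > 0              # a single Python bool, not a per-row mask

try:
    my_df[mask]           # same as my_df[True]: looked up as a column label
    raised = False
except KeyError:
    raised = True

print(n, mask, raised)
```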


0

Andrew's great answer can be further simplified:

df[df.col.apply(len) > 0]

1 Comment

Yup, this is called eta-reduction. There's no change in functionality so I went ahead and edited Andrew's answer.
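A small sketch of the eta-reduction point, on a hypothetical frame with a list column named col as in the snippet above. Wrapping len in a lambda and passing len directly select the same rows.

```python
import pandas as pd

# Hypothetical frame with a list column named "col"
df = pd.DataFrame({"col": [[], ["A", "B", "C"], ["B", "C"]]})

# Original form: len wrapped in a lambda
with_lambda = df[df.col.apply(lambda x: len(x)) > 0]

# Eta-reduced form: pass len itself
with_len = df[df.col.apply(len) > 0]

print(with_lambda.equals(with_len))
```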
