
Suppose I have a DataFrame, d, which has a column whose values are Python lists.

>>> import pandas as pd
>>> d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a','b'])
>>> print(d)

     a      b
0  foo  [bar]
1  biz     []

Now, I want to filter out the rows whose lists are empty.

I have tried various versions, but no luck so far:

Trying to check it as a 'truthy' value:

>>> d[d['b']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1314, in _convert_to_indexer
    indexer = check = labels.get_indexer(objarr)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 3259, in get_indexer
    indexer = self._engine.get_indexer(target._ndarray_values)
  File "pandas/_libs/index.pyx", line 301, in pandas._libs.index.IndexEngine.get_indexer
  File "pandas/_libs/hashtable_class_helper.pxi", line 1544, in pandas._libs.hashtable.PyObjectHashTable.lookup
TypeError: unhashable type: 'list'
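The error above comes from how d[...] interprets its argument: a boolean Series filters rows, but any other list-like is treated as a collection of column labels, and a list cell cannot be hashed as a label. A minimal sketch of the two behaviours, reusing the same d (written for Python 3 and a recent pandas, which is an assumption):

```python
import pandas as pd

d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a', 'b'])

# A list of labels passed to d[...] selects columns by name.
cols = d[['a']]

# A boolean Series aligned with the index selects rows instead.
mask = pd.Series([True, False], index=d.index)
rows = d[mask]

print(list(cols.columns))  # ['a']
print(list(rows.index))    # [0]
```

So the goal is to build a boolean mask from column b, one entry per row.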

Trying an explicit length check. It seems len() is applied to the Series as a whole (giving the number of rows), not to each value, so the comparison collapses to a single True:

>>> d[ len(d['b']) > 0 ]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: True
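Here len(d['b']) is the number of rows (2), so len(d['b']) > 0 is the single Python value True, and d[True] then looks for a column named True. A sketch of applying len() to each cell instead, using Series.map (one of several ways to do this; the answers below use others):

```python
import pandas as pd

d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a', 'b'])

whole = len(d['b'])         # length of the Series itself: 2 rows
print(whole > 0)            # True, a single bool rather than a mask

lengths = d['b'].map(len)   # len() called on each list cell
print(lengths.tolist())     # [1, 0]

filtered = d[lengths > 0]   # boolean mask with one entry per row
print(filtered)
```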

Comparing to an empty list directly, just as we might compare to an empty string (which, by the way, does work if the column holds strings rather than lists):

>>> d[ d['b'] == [] ]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/ops.py", line 1283, in wrapper
    res = na_op(values, other)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/ops.py", line 1143, in na_op
    result = _comp_method_OBJECT_ARRAY(op, x, y)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/ops.py", line 1120, in _comp_method_OBJECT_ARRAY
    result = libops.vec_compare(x, y, op)
  File "pandas/_libs/ops.pyx", line 128, in pandas._libs.ops.vec_compare
ValueError: Arrays were different lengths: 2 vs 0
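pandas reads a list on the right-hand side of == as an array of values to compare element by element, so it must have the same length as the Series (2 here, versus 0 for []). A minimal sketch that instead performs the comparison inside a function applied to each cell; Series.map is an assumed choice, and apply would work similarly:

```python
import pandas as pd

d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a', 'b'])

# Each cell is compared with [] on its own, so pandas never sees a list
# on the right-hand side of the comparison.
not_empty = d['b'].map(lambda x: x != [])
print(not_empty.tolist())   # [True, False]

print(d[not_empty])
```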

3 Answers


Use the string accessor, .str, to get the length of each list in the Series:

d[d.b.str.len()>0]

Output:

     a      b
0  foo  [bar]
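A self-contained version of this answer, as a sketch assuming a recent pandas. It relies on .str.len() calling len() on each element, which is why it works on list cells even though the accessor is named for strings:

```python
import pandas as pd

d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a', 'b'])

# .str.len() applies len() per cell: 1 for ['bar'], 0 for [].
lengths = d['b'].str.len()

# Keep only rows whose list has at least one element.
result = d[lengths > 0]
print(result)
```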

Empty lists evaluate to False, so d.all(1) (DataFrame.all along axis 1) is False for any row containing one. This will not work if a row contains other falsy values such as 0 or '' (unless you want to drop those rows as well).

d[d.all(1)]

    a      b
0  foo  [bar]
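To illustrate the caveat about other falsy values, here is a sketch that applies Python's built-in all() to each row. This mirrors the truthiness rule described above but is not a drop-in replacement for DataFrame.all, whose handling of missing values may differ; the extra column n is hypothetical:

```python
import pandas as pd

# 'n' is a hypothetical extra column; its 0 in the first row is falsy.
d = pd.DataFrame([['foo', ['bar'], 0], ['biz', [], 1]], columns=['a', 'b', 'n'])

# Built-in all() per row: True only if every cell in the row is truthy.
keep = d.apply(lambda row: all(row), axis=1)
print(keep.tolist())   # [False, False]

# Both rows are dropped, including 'foo', whose list is not empty.
print(len(d[keep]))    # 0
```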

If you only want to filter on column b, cast it with astype(bool):

d[d.b.astype(bool)]

     a      b
0  foo  [bar]
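Casting an object column to bool applies Python truth testing to each cell, so [] becomes False and ['bar'] becomes True. A sketch comparing this with an explicit per-element bool() via Series.map (the map variant is an added alternative, not part of the original answer):

```python
import pandas as pd

d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a', 'b'])

# Truthiness of each list cell, via a dtype cast.
mask_astype = d['b'].astype(bool)

# The same mask, calling bool() on each cell explicitly.
mask_map = d['b'].map(bool)

print(mask_astype.tolist())  # [True, False]
print(d[mask_map])
```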


Scott's answer is better, but for others' reference, another option is to store tuples rather than lists and compare against an empty tuple directly.

d[d['b'] != ()]

Which gives:

     a       b
0  foo  (bar,)

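This doesn't work with lists: pandas treats a list on the right-hand side as an array of values to compare element by element, hence the length-mismatch error at the end of the question.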
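If the data already holds lists, a sketch of this answer converts them first with Series.map(tuple) (an assumed conversion step; the answer itself does not say how the tuples were produced). A tuple on the right-hand side of != is compared against each cell as a single value:

```python
import pandas as pd

d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a', 'b'])

# Convert each list cell to a tuple: ['bar'] -> ('bar',), [] -> ().
d['b'] = d['b'].map(tuple)

# Compare every cell with the empty tuple and keep the non-empty ones.
result = d[d['b'] != ()]
print(result)
```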
