Python numpy.nan and logical functions: wrong results

Question

I get some surprising results when trying to evaluate logical expressions on data that might contain nan values (as defined in numpy).

I would like to understand why these results arise and how to implement the logic correctly.

What I don't understand is why these expressions evaluate to the value they do:

>>> from numpy import nan

>>> nan and True
True
# this is wrong: I would expect it to evaluate to nan

>>> True and nan
nan
# OK

>>> nan and False
False
# OK: regardless of the value of the first element,
# the expression should evaluate to False

>>> False and nan
False
# OK

Similarly for or:

>>> True or nan
True
# OK

>>> nan or True
nan
# wrong: the expression should be True

>>> False or nan
nan
# OK

>>> nan or False
nan
# OK

How can I implement (in an efficient way) the correct boolean functions, handling also nan values?

On a side note, what you're wanting doesn't make much sense with the way numpy currently works. NaN is a purely floating-point value. Boolean arrays can't hold NaNs. Therefore, having a logical comparison return NaN would break essentially everything. To get around that, a special np.na (different from np.nan) value was introduced, and has been temporarily removed. It does what you're wanting: github.com/numpy/numpy.org/blob/master/NA-overview.rst
– Joe Kington, Commented Jun 24, 2013 at 12:37
See "Why do 'Not a Number' values equal True when cast as boolean in Python/Numpy?"
– Janne Karila, Commented Jun 24, 2013 at 13:44
@JoeKington thanks for the comment. It is good to know. Unfortunately, in this case I have to use results from a third-party module that returns nan values, so I don't have much choice.
– lucacerone, Commented Jun 24, 2013 at 14:20
This is just totally counterintuitive and leads to unexpected results... What a nuisance
– citynorman, Commented Jan 27, 2017 at 22:48
Fwiw in my case I fudged it with df['value'].shift(-1).fillna(100)<0
– citynorman, Commented Jan 27, 2017 at 22:55

ev-br · Accepted Answer · 2013-06-24 12:49:59Z

5

You can use predicates from the numpy namespace:

>>> import numpy as np
>>> np.logical_and(True, np.nan), np.logical_and(False, np.nan)
(True, False)
>>> np.logical_and(np.nan, True), np.logical_and(np.nan, False)
(True, False)
>>>
>>> np.logical_or(True, np.nan), np.logical_or(False, np.nan)
(True, True)
>>> np.logical_or(np.nan, True), np.logical_or(np.nan, False)
(True, True)

EDIT: The built-in boolean operators are slightly different. From the docs: x and y is equivalent to "if x is false, then x, else y". So if the first argument is falsy, and returns that argument itself (not its boolean equivalent, as it were). Therefore:

>>> (None and True) is None
True
>>> [] and True
[]
>>> [] and False
[]
>>>

etc
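The rule quoted from the docs can be written out as plain functions. This is only a sketch: py_and and py_or are hypothetical names, not anything from Python or numpy. Because they are ordinary functions, both arguments are evaluated before the call, so they reproduce which operand comes back but not the short-circuiting.

```python
import math


def py_and(x, y):
    # "if x is false, then x, else y"
    return y if x else x


def py_or(x, y):
    # "if x is false, then y, else x"
    return x if x else y


nan = float("nan")  # a truthy float, behaving like numpy.nan here

print(py_and(nan, True))   # True  (nan is truthy, so y is returned)
print(py_and(True, nan))   # nan
print(py_or(nan, True))    # nan   (nan is truthy, so x is returned)
print(py_or(False, nan))   # nan
print(py_and([], True))    # []
print(math.isnan(py_or(nan, False)))  # True
```

This reproduces every result in the question: the operators never produce a new boolean, they hand back one of their operands.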

edited Jun 24, 2013 at 12:49

answered Jun 24, 2013 at 11:46

ev-br


7 Comments

ev-br Over a year ago

on what grounds do you expect it to?

lucacerone Over a year ago

because the "and" requires both values to be true.. if one of them is unknown you simply can't decide the value.. and the result is unknown as well..

ev-br Over a year ago

np.bool(np.nan) evaluates to True. From that point on, it's all consistent.

ev-br Over a year ago

if you want to have a type with three values, true, false and 'don't know', have a look at boost::tribool: boost.org/doc/libs/1_53_0/doc/html/tribool.html

lucacerone Over a year ago

Isn't Boost a C++ library? In any case it is quite easy: and can be redefined as a min function, using keys -1, 0, 1 for 0, nan and 1 respectively. Using the same keys, or is implemented as max.
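A minimal sketch of that min/max idea with numpy, assuming inputs are float-like values where nan means "unknown", 0 means false and anything else means true. The names and3, or3, _to_keys and _from_keys are made up for illustration and are not numpy functions.

```python
import numpy as np


def _to_keys(a):
    """Map false -> -1, nan -> 0, true -> 1."""
    a = np.asarray(a, dtype=float)
    return np.where(np.isnan(a), 0, np.where(a != 0, 1, -1))


def _from_keys(k):
    """Map -1 -> 0.0, 0 -> nan, 1 -> 1.0."""
    return np.where(k == 0, np.nan, np.where(k > 0, 1.0, 0.0))


def and3(a, b):
    # Three-valued "and": the minimum of the two keys
    return _from_keys(np.minimum(_to_keys(a), _to_keys(b)))


def or3(a, b):
    # Three-valued "or": the maximum of the two keys
    return _from_keys(np.maximum(_to_keys(a), _to_keys(b)))


a = np.array([np.nan, np.nan, 1.0, 0.0])
b = np.array([1.0, 0.0, np.nan, np.nan])

print(and3(a, b))  # element-wise: nan, 0, nan, 0
print(or3(a, b))   # element-wise: 1, nan, 1, nan
```

The result is a float array, because a boolean array cannot hold nan. Everything is done with vectorized numpy calls, so it should stay reasonably efficient on large arrays.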


Nakamura · Accepted Answer · 2013-06-24 11:07:25Z

0

Both and and or evaluate their left operand first, and evaluate the right operand only when the left one does not already decide the result. For and, the second expression is skipped if the first is false; for or, it is skipped if the first is true.

E.g., when evaluating the expression 2>2 and 3==3, Python first checks whether the first expression 2>2 is true. Since it is False, there is no need to check the second expression, and the result is False. If instead the expression is 2==2 and 3==3, then since the first expression 2==2 is True, the second expression must also be evaluated, and since it is also True, we get True as the output.

In nan and True, nan is truthy (bool(nan) is True), so Python evaluates the second expression and returns its value. So here you get True as the output. The same logic applied to True and nan gives nan as the output.

For or, looking at the first expression is enough when it is truthy, hence True or nan returns True. For the same reason, nan or True returns nan.
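To see which operands actually get evaluated, here is a small sketch that records every evaluation. The helpers operand and run are hypothetical names introduced only for this illustration.

```python
calls = []


def operand(name, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(name)
    return value


def run(expr):
    """Evaluate a zero-argument callable and report which operands ran."""
    calls.clear()
    result = expr()
    return result, list(calls)


nan = float("nan")  # truthy, like numpy.nan

print(run(lambda: operand("left", False) and operand("right", nan)))
# (False, ['left'])           right side skipped
print(run(lambda: operand("left", nan) and operand("right", True)))
# (True, ['left', 'right'])   right side evaluated and returned
print(run(lambda: operand("left", nan) or operand("right", True)))
# (nan, ['left'])             right side skipped, nan returned
print(run(lambda: operand("left", False) or operand("right", nan)))
# (nan, ['left', 'right'])    right side evaluated and returned
```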

answered Jun 24, 2013 at 11:07

Nakamura


5 Comments

lucacerone Over a year ago

..this explains the results that I would expect to get.. e.g. nan or True should return True (if nan is treated as True), and not nan..

Nakamura Over a year ago

since nan is truthy, Python will return nan itself (not True). E.g., "2 or True" will return 2 (since 2 is truthy) and likewise "0 or 3" will return 3 (since 0 is considered False). "2 and 3" will return 3. "2 and True" will return True

ev-br Over a year ago

both or and and short-circuit: docs.python.org/2/library/…

lucacerone Over a year ago

@Nakamura both nan == True and nan is True evaluate to False though.. so do nan == False and nan is False. nan is neither False nor True, that's why I think the behaviour is wrong.

Nakamura Over a year ago

First, the output of a Python expression containing boolean operators like and, or need not be a boolean (True or False), and this is nicely mentioned in the link pointed to by @Zhenya. E.g., the output of the expression [] or 2 will be 2. Secondly, numpy.nan refers to "Not A Number" (docs.scipy.org/doc/numpy/reference/generated/numpy.isnan.html) and thus it is not equal to either boolean True or boolean False. From the first point, we can infer that the output of the Python expression nan and True will be True, whereas the output of nan and 2 will be 2.
