I get some surprising results when trying to evaluate
logical expressions on data that might contain nan values (as defined in numpy).
I would like to understand why these results arise and how to handle them correctly.
What I don't understand is why these expressions evaluate to the values they do:
from numpy import nan
nan and True
>>> True
# this is wrong: I would expect it to evaluate to nan
True and nan
>>> nan
# OK
nan and False
>>> False
# OK: regardless of the value of the first element,
# the expression should evaluate to False
False and nan
>>> False
#ok
Similarly for or:
True or nan
>>> True #OK
nan or True
>>> nan # wrong: the expression should be True
False or nan
>>> nan #OK
nan or False
>>> nan #OK
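For reference, these results are plain Python behaviour rather than anything numpy-specific: nan is a truthy float, and the "and" / "or" operators short-circuit and return one of their operands unchanged instead of a strict boolean. A minimal check (no numpy required):

nan = float("nan")

# nan is a float and is truthy:
print(bool(nan))        # True

# "x and y" returns y if x is truthy, otherwise x;
# "x or y" returns x if x is truthy, otherwise y.
# No coercion to True/False happens, which explains every result above:
print(nan and True)     # True (nan is truthy, so the second operand is returned)
print(True and nan)     # nan  (True is truthy, so the second operand is returned)
print(nan or True)      # nan  (nan is truthy, so it is returned as-is)
print(False or nan)     # nan  (False is falsy, so the second operand is returned)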
How can I implement correct boolean functions efficiently, while also handling nan values?
This is how numpy currently works. NaN is a purely floating-point value, and boolean arrays can't hold NaNs. Therefore, having a logical comparison return NaN would break essentially everything. To get around that, a special np.na value (different from np.nan) was introduced, and has been temporarily removed. It does what you're wanting: github.com/numpy/numpy.org/blob/master/NA-overview.rst
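One possible workaround (just an illustrative sketch, not from the answer above) is to implement three-valued Kleene logic yourself, keeping the results as floats (1.0, 0.0, np.nan) since boolean arrays cannot hold NaN. The names nan_and and nan_or are made up for this example; the np.where calls are vectorized, so this stays efficient on large arrays:

import numpy as np

def nan_and(a, b):
    # Kleene AND: a definite False on either side wins, otherwise NaN propagates.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_false = ~np.isnan(a) & (a == 0)
    b_false = ~np.isnan(b) & (b == 0)
    return np.where(a_false | b_false, 0.0,
                    np.where(np.isnan(a) | np.isnan(b), np.nan, 1.0))

def nan_or(a, b):
    # Kleene OR: a definite True on either side wins, otherwise NaN propagates.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_true = ~np.isnan(a) & (a != 0)
    b_true = ~np.isnan(b) & (b != 0)
    return np.where(a_true | b_true, 1.0,
                    np.where(np.isnan(a) | np.isnan(b), np.nan, 0.0))

a = np.array([1.0, 0.0, np.nan, np.nan])
b = np.array([np.nan, np.nan, 1.0, 0.0])
print(nan_and(a, b))   # [nan  0. nan  0.]
print(nan_or(a, b))    # [ 1. nan  1. nan]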