Numpy.where evaluating as True when condition is False

Question

I'm currently experiencing some unexpected behaviour in numpy. I am trying to add a column to a DataFrame which does some math on two other columns. These columns also contain a few strings of 'N/A'.

import pandas as pd
import numpy as np

my_list = []
my_list.append({'Value A':1, 'Value B':2})
my_list.append({'Value A':6, 'Value B':4})
my_list.append({'Value A':7, 'Value B':5})
my_list.append({'Value A':'N/A', 'Value B':6})
my_list.append({'Value A':12, 'Value B':10})
my_list.append({'Value A':2, 'Value B':2})
my_list.append({'Value A':9, 'Value B':'N/A'})
my_list.append({'Value A':8, 'Value B':3})
my_list.append({'Value A':22, 'Value B':6})

my_df = pd.DataFrame(my_list)

I then try to build the new column with np.where(). Before doing any math, I check that neither value is 'N/A', because the values are converted to floats only if that condition is met:

my_df['New'] = np.where((my_df['Value A'].str != 'N/A') & 
                        (my_df['Value B'].str != 'N/A'),
                        my_df['Value A'].astype(float) - my_df['Value B'].astype(float),
                        'N/A')

However, when this is run, I get an error on the numpy.where:

ValueError: could not convert string to float: N/A

I was under the impression that the conversion should not have taken place at all, given that the condition should have failed when one of the values was 'N/A'.

Could anyone share any insight?

np.where is a regular Python function. Python evaluates all its arguments before passing them to the function. That means my_df['Value A'].astype(float) - my_df['Value B'].astype(float) is evaluated, and the result of that evaluation is passed as the second argument of np.where. You'll have to modify your approach if you don't want to evaluate that expression when one of the values is the string 'N/A'.
– Warren Weckesser, Commented May 8, 2019 at 2:38
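
To make the comment above concrete, here is a minimal sketch of eager argument evaluation using plain Python only. The names pick and expensive are hypothetical helpers invented for this illustration; pick stands in for np.where and is not part of NumPy.

```python
calls = []  # records which argument expressions were actually evaluated


def expensive(label):
    # Hypothetical stand-in for a costly (or failing) expression such as .astype(float)
    calls.append(label)
    return label


def pick(condition, if_true, if_false):
    # Hypothetical stand-in for np.where: it only receives finished values
    return if_true if condition else if_false


result = pick(False, expensive("true branch"), expensive("false branch"))

print(result)  # false branch
print(calls)   # ['true branch', 'false branch']
```

Both expensive(...) calls run before pick is entered, even though only one result is used. In the question, the .astype(float) expression is evaluated in the same way and raises before np.where ever sees the condition.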

gmds · Accepted Answer · 2019-05-08 02:45:56Z

2

All the arguments to Python functions, in general, are evaluated before the function is called. The behaviour you want would be present in a for loop, but that would be slow and ugly.
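
For comparison, a rough sketch of what such a loop could look like. The sample rows and the variable names are assumptions chosen for illustration; they are a small subset of the question's data.

```python
# A few (Value A, Value B) pairs, including the 'N/A' placeholders
rows = [(1, 2), ('N/A', 6), (9, 'N/A'), (8, 3)]

out = []
for a, b in rows:
    if a != 'N/A' and b != 'N/A':
        # float() is only reached when both values passed the check
        out.append(float(a) - float(b))
    else:
        out.append('N/A')

print(out)  # [-1.0, 'N/A', 'N/A', 5.0]
```

Here the conversion sits inside the if branch, so it is skipped for the 'N/A' rows, which is the behaviour the question expected from np.where.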

Instead, you should use pd.to_numeric:

converted = my_df[['Value A', 'Value B']].transform(pd.to_numeric, errors='coerce')
result = converted['Value A'] - converted['Value B']

print(result)

filled_result = result.fillna('N/A')

print(filled_result)

Output:

0    -1.0
1     2.0
2     2.0
3     NaN
4     2.0
5     0.0
6     NaN
7     5.0
8    16.0
dtype: float64
0     -1
1      2
2      2
3    N/A
4      2
5      0
6    N/A
7      5
8     16
dtype: object
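
If the goal is to store the result as the 'New' column, one possible sketch follows. It swaps in Series.where in place of np.where and builds the mask from the converted columns; the DataFrame is rebuilt from the question's data so the block runs on its own, and the names valid and diff are assumptions. The mask deliberately avoids the question's comparison on the .str accessor, which appears to yield a single boolean and not a per-row mask.

```python
import pandas as pd

# Same data as in the question, written column-wise
my_df = pd.DataFrame({
    'Value A': [1, 6, 7, 'N/A', 12, 2, 9, 8, 22],
    'Value B': [2, 4, 5, 6, 10, 2, 'N/A', 3, 6],
})

# Non-numeric entries such as 'N/A' become NaN instead of raising
converted = my_df[['Value A', 'Value B']].apply(pd.to_numeric, errors='coerce')

# True only for rows where both columns held a number
valid = converted.notna().all(axis=1)

diff = converted['Value A'] - converted['Value B']

# Cast to object first so floats and the 'N/A' string can share one column
my_df['New'] = diff.astype(object).where(valid, 'N/A')

print(my_df)
```

Keeping the numeric NaN version (diff) around is usually more convenient for further calculations; the 'N/A' string column is mainly useful for display.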

