I'm currently experiencing some unexpected behaviour in numpy. I am trying to add a column to a DataFrame which does some math on two other columns. These columns also contain a few strings of 'N/A'.
import pandas as pd
import numpy as np
my_list = []
my_list.append({'Value A':1, 'Value B':2})
my_list.append({'Value A':6, 'Value B':4})
my_list.append({'Value A':7, 'Value B':5})
my_list.append({'Value A':'N/A', 'Value B':6})
my_list.append({'Value A':12, 'Value B':10})
my_list.append({'Value A':2, 'Value B':2})
my_list.append({'Value A':9, 'Value B':'N/A'})
my_list.append({'Value A':8, 'Value B':3})
my_list.append({'Value A':22, 'Value B':6})
my_df = pd.DataFrame(my_list)
I then try to use np.where() on this. Before doing any math, I first check that neither value is 'N/A', because I only want to convert them to floats if that condition is met:
my_df['New'] = np.where((my_df['Value A'].str != 'N/A') &
                        (my_df['Value B'].str != 'N/A'),
                        my_df['Value A'].astype(float) - my_df['Value B'].astype(float),
                        'N/A')
However, when this is run, I get an error from numpy.where:
ValueError: could not convert string to float: N/A
I was under the impression that the conversion should not even have taken place, given that the condition should have failed when one of the values was 'N/A'.
Could anyone share any insight?
np.where is a regular Python function. Python evaluates all of a function's arguments before passing them to it. That means my_df['Value A'].astype(float) - my_df['Value B'].astype(float) is evaluated over the whole column, and the result of that evaluation is passed as the second argument of np.where. The condition plays no part in deciding whether the conversion runs, so astype(float) still hits the 'N/A' rows and raises. You'll have to modify your approach if you don't want to evaluate that expression when one of the values is the string 'N/A'.
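One possible way to modify the approach, as a minimal sketch rather than the only fix: convert the columns with pd.to_numeric(..., errors='coerce') first, so that 'N/A' becomes NaN instead of raising, do the subtraction, and then put 'N/A' back wherever the result is missing. The intermediate names a, b and diff are just illustrative, and the DataFrame is rebuilt in a compact form with the same values as in the question.

```python
import pandas as pd

# Same data as in the question, built from a dict of columns.
my_df = pd.DataFrame({
    'Value A': [1, 6, 7, 'N/A', 12, 2, 9, 8, 22],
    'Value B': [2, 4, 5, 6, 10, 2, 'N/A', 3, 6],
})

# Convert up front; anything non-numeric (such as 'N/A') becomes NaN
# instead of raising a ValueError.
a = pd.to_numeric(my_df['Value A'], errors='coerce')
b = pd.to_numeric(my_df['Value B'], errors='coerce')

# NaN propagates through the subtraction, so rows with a missing
# input end up as NaN in the difference.
diff = a - b

# Keep the numeric result where it exists, and restore 'N/A' elsewhere.
# Casting to object lets floats and the string share one column.
my_df['New'] = diff.astype(object).where(diff.notna(), 'N/A')

print(my_df)
```

If a mixed float/string column is not actually required, it may be simpler to stop at `my_df['New'] = diff` and keep NaN as the missing-value marker, since that leaves the column numeric.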