I'm currently experiencing some unexpected behaviour in numpy. I am trying to add a column to a DataFrame which does some math on two other columns. These columns also contain a few strings of 'N/A'.

import pandas as pd
import numpy as np

my_list = []
my_list.append({'Value A':1, 'Value B':2})
my_list.append({'Value A':6, 'Value B':4})
my_list.append({'Value A':7, 'Value B':5})
my_list.append({'Value A':'N/A', 'Value B':6})
my_list.append({'Value A':12, 'Value B':10})
my_list.append({'Value A':2, 'Value B':2})
my_list.append({'Value A':9, 'Value B':'N/A'})
my_list.append({'Value A':8, 'Value B':3})
my_list.append({'Value A':22, 'Value B':6})

my_df = pd.DataFrame(my_list)

I then try to use np.where() on this. Before doing any math, I check that neither value is 'N/A', because I convert both to floats if the condition is met:

my_df['New'] = np.where((my_df['Value A'].str != 'N/A') & 
                        (my_df['Value B'].str != 'N/A'),
                        my_df['Value A'].astype(float) - my_df['Value B'].astype(float),
                        'N/A')

However, when this is run, I get an error from numpy.where:

ValueError: could not convert string to float: N/A

I was under the impression that the conversion should not even have taken place, given that the condition should have failed when one of the values was 'N/A'.

Could anyone share any insight?

    np.where is a regular Python function. Python evaluates all its arguments before passing them to the function. That means my_df['Value A'].astype(float) - my_df['Value B'].astype(float) is evaluated, and the result of that evaluation is passed as the second argument of np.where. You'll have to modify your approach if you don't want to evaluate that expression when one of the values is the string 'N/A'. Commented May 8, 2019 at 2:38

1 Answer

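In general, all the arguments to a Python function are evaluated before the function is called, so the astype(float) conversion runs on every row regardless of the condition. You could get the behaviour you want with an explicit for loop that checks each row before converting, but that would be slow and ugly.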
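To make the evaluation order concrete, here is a minimal sketch. The names `expensive` and `pick` are hypothetical stand-ins (`pick` plays the role of np.where); neither comes from the question or from any library.

```python
calls = []

def expensive(label):
    # Hypothetical helper: records that it was evaluated, then returns its label.
    calls.append(label)
    return label

def pick(condition, if_true, if_false):
    # Stand-in for np.where: by the time this body runs,
    # both branch values have already been computed by the caller.
    return if_true if condition else if_false

chosen = pick(True, expensive("true branch"), expensive("false branch"))
print(chosen)  # only one value is chosen...
print(calls)   # ...but both arguments were evaluated
```

Both calls to `expensive` happen before `pick` starts, which is why the float conversion in the question raises even though the condition was meant to guard it.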

Instead, you should use pd.to_numeric:

converted = my_df[['Value A', 'Value B']].transform(pd.to_numeric, errors='coerce')
result = converted['Value A'] - converted['Value B']

print(result)

filled_result = result.fillna('N/A')

print(filled_result)

Output:

0    -1.0
1     2.0
2     2.0
3     NaN
4     2.0
5     0.0
6     NaN
7     5.0
8    16.0
dtype: float64
0     -1
1      2
2      2
3    N/A
4      2
5      0
6    N/A
7      5
8     16
dtype: object
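If the goal is to store the result as a 'New' column on the original DataFrame, as in the question, something along these lines should work. This is a sketch rather than the only way to do it: it rebuilds the question's data in a more compact form, and it uses Series.where on an object-dtype copy (instead of fillna) so that numbers and the 'N/A' string can share one column.

```python
import pandas as pd

# Same data as in the question, built column-wise for brevity
my_df = pd.DataFrame({
    'Value A': [1, 6, 7, 'N/A', 12, 2, 9, 8, 22],
    'Value B': [2, 4, 5, 6, 10, 2, 'N/A', 3, 6],
})

# Convert each column; anything non-numeric (here 'N/A') becomes NaN
a = pd.to_numeric(my_df['Value A'], errors='coerce')
b = pd.to_numeric(my_df['Value B'], errors='coerce')
diff = a - b  # NaN wherever either input was 'N/A'

# Cast to object first, then keep the difference where it is valid
# and substitute 'N/A' everywhere else
my_df['New'] = diff.astype(object).where(diff.notna(), 'N/A')

print(my_df)
```

Keeping the numeric `diff` around is often more useful than the 'N/A'-filled column, since NaN works with pandas' numeric methods while an object column does not.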