I am not sure what I am doing wrong here. I am simply trying to call a function containing an if-then-else filter and apply it to a DataFrame.

In [7]:
df.dtypes

Out[7]:
Filler     float64
Spot       float64
Strike     float64
cp          object
mid        float64
vol        float64
usedvol    float64
dtype: object

In [8]:
df.head()

Out[8]:
          Filler  Spot  Strike cp  mid   vol  
    0       0.0   100      50  c   0.0   25.0   
    1       0.0   100      50  p   0.0   25.0   
    2       1.0   100      55  c   1.0   24.5  
    3       1.0   100      55  p   1.0   24.5   
    4       2.5   100      60  c   2.5   24.0 

I have the below function:

def badvert(df):
    if df['vol']>24.5:
        df['vol2'] = df['vol']*2
    else:
        df['vol2'] = df['vol']/2
    return(df['vol2'])

Which I call here:

df['vol2']=badvert(df)

Which generates this error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-bbf7a11a17c9> in <module>()
----> 1 df['vol2']=badvert(df)

<ipython-input-13-6132a4be33ca> in badvert(df)
      1 def badvert(df):
----> 2     if df['vol']>24.5:
      3         df['vol2'] = df['vol']*2
      4     else:
      5         df['vol2'] = df['vol']/2

C:\Users\camcompco\AppData\Roaming\Python\Python34\site-packages\pandas\core\generic.py in __nonzero__(self)
    712         raise ValueError("The truth value of a {0} is ambiguous. "
    713                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 714                          .format(self.__class__.__name__))
    715 
    716     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My gut tells me that this is a simple "syntax" issue, but I am at a loss. Any help would be greatly appreciated.

2 Answers

You want to apply it to each row. This will do what you want using apply and a lambda:

df["vol2"] = df.apply(lambda row: row['vol'] * 2 if row['vol'] > 24.5 else row['vol'] / 2, axis=1)
print(df)

Which should output something like:

   Filler  Spot  Strike cp  mid   vol   vol2
0     0.0   100      50  c  0.0  25.0  50.00
1     0.0   100      50  p  0.0  25.0  50.00
2     1.0   100      55  c  1.0  24.5  12.25
3     1.0   100      55  p  1.0  24.5  12.25
4     2.5   100      60  c  2.5  24.0  12.00
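
For context on why the original call failed, here is a minimal sketch. It assumes a cut-down frame holding only the 'vol' values from the question; the names mask, error_message and first_is_high are just for illustration. Comparing a whole column gives one boolean per row, and `if` cannot reduce that to a single True/False, which is what the ValueError is complaining about.

```python
import pandas as pd

# Small frame mirroring the 'vol' column from the question (illustrative data).
df = pd.DataFrame({"vol": [25.0, 25.0, 24.5, 24.5, 24.0]})

# Comparing a whole column yields one boolean per row, not a single True/False.
mask = df["vol"] > 24.5
print(mask.tolist())

# `if` needs exactly one truth value, so pandas raises instead of guessing.
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# A single row's value compares to one boolean, which `if` accepts.
first_is_high = df.loc[0, "vol"] > 24.5
print(bool(first_is_high))
```

Row-wise apply sidesteps the problem because each call only ever sees one value of 'vol'.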

Or using your own function:

def badvert(row):
    # With axis=1, apply passes one row at a time, so row['vol'] is a single value.
    if row['vol'] > 24.5:
        return row['vol'] * 2
    return row['vol'] / 2

df["vol2"] = df.apply(badvert, axis=1)

axis=0 applies the function to each column; axis=1 applies it to each row.
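
To make the axis difference concrete, here is a small sketch on made-up data (the frame and the names per_column and per_row are assumptions for illustration, not from the question):

```python
import pandas as pd

# Illustrative frame with two numeric columns.
df = pd.DataFrame({"mid": [0.0, 1.0, 2.5], "vol": [25.0, 24.5, 24.0]})

# axis=0: the function receives each column as a Series, giving one result per column.
per_column = df.apply(lambda col: col.max(), axis=0)
print(per_column)

# axis=1: the function receives each row as a Series, giving one result per row.
per_row = df.apply(lambda row: row["mid"] + row["vol"], axis=1)
print(per_row)
```

The first result is indexed by column name, the second by the original row index, which is why the axis=1 form can be assigned straight back as a new column.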

1 Comment

Thank you very much! And +1 for the multiple solutions

df.apply has performance comparable to a Python for-loop. Sometimes using apply or a for-loop to compute row by row is unavoidable, but in this case a quicker alternative is to express the calculation as one done on whole columns.

Because of the way the underlying data is stored in a DataFrame, and since there are usually many more rows than columns, calculations done on whole columns are usually quicker than calculations done row by row:

import numpy as np
df['vol2'] = np.where(df['vol'] > 24.5, df['vol'] * 2, df['vol'] / 2)
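
As a self-contained check, the sketch below runs the column-wise version on the 'vol' values from the question and compares it with the row-by-row apply (the variable row_wise is just an illustrative name). Note that np.where computes both candidate columns in full and then picks element by element according to the mask.

```python
import numpy as np
import pandas as pd

# 'vol' values taken from the question's df.head() output.
df = pd.DataFrame({"vol": [25.0, 25.0, 24.5, 24.5, 24.0]})

# Build a boolean mask, then choose per element between the doubled and halved columns.
df["vol2"] = np.where(df["vol"] > 24.5, df["vol"] * 2, df["vol"] / 2)

# Row-by-row equivalent, kept only to confirm both approaches agree.
row_wise = df.apply(
    lambda row: row["vol"] * 2 if row["vol"] > 24.5 else row["vol"] / 2,
    axis=1,
)

print(df)
print(row_wise.tolist() == df["vol2"].tolist())
```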

1 Comment

Thanks for the added color. I just checked: in my application, this is 5.3 times faster than my "def" solution and 2.8 times faster than the df.apply approach. Thank you.
