Want to create a function with def, but ValueError returned

Question

What I want to do

I want to do RFM analysis on purchase data from an e-commerce site.

I processed the data into RFM format, and now I want to rank every ID based on the values in each column (Money, Recency and Frequency).

However, I got the error message below.

 ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-e7bf5ddc856d> in <module>
     13         return 5
     14 
---> 15 rfm['money rank'] = rfm['money'].apply(money)
     16 rfm.head()

c:\users\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7766             kwds=kwds,
   7767         )
-> 7768         return op.get_result()
   7769 
   7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:

c:\users\lib\site-packages\pandas\core\apply.py in get_result(self)
    183             return self.apply_raw()
    184 
--> 185         return self.apply_standard()
    186 
    187     def apply_empty_result(self):

c:\users\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    274 
    275     def apply_standard(self):
--> 276         results, res_index = self.apply_series_generator()
    277 
    278         # wrap results

c:\users\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    288             for i, v in enumerate(series_gen):
    289                 # ignore SettingWithCopy here in case the user mutates
--> 290                 results[i] = self.f(v)
    291                 if isinstance(results[i], ABCSeries):
    292                     # If we have a view on v, we need to make a copy because

<ipython-input-15-e7bf5ddc856d> in money(a)
      1 def money(a):
----> 2     if a < 1000:
      3         return 0
      4     if (1000 <= a) & (a < 2000):
      5         return 1

c:\users\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1440     @final
   1441     def __nonzero__(self):
-> 1442         raise ValueError(
   1443             f"The truth value of a {type(self).__name__} is ambiguous. "
   1444             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Data

```
    money    recency    frequency
sum    <lambda>    len
ID            
100    2674    169 days    1
101    19760    98 days    3
103    2674    167 days    1
109    7904    56 days    3
11    2674    211 days    1

<class 'pandas.core.frame.DataFrame'>
Index: 290 entries, 100 to 99
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype          
---  ------            --------------  -----          
 0   (money, sum)     290 non-null    int64          
 1   (recency, <lambda>)  290 non-null    timedelta64[ns]
 2   (freqency, len)   290 non-null    int64          
dtypes: int64(2), timedelta64[ns](1)
memory usage: 9.1+ KB
```

Code

```
def money(a):
    if a < 1000:
        return 0
    if (1000 <= a) & (a < 2000):
        return 1
    if (2000 <= a) & (a < 3000):
        return 2
    if (3000 <= a) & (a < 4000):
        return 3
    if (4000 <= a) & (a < 5000):
        return 4
    if a >= 5000:
        return 5

rfm['money rank'] = rfm['money'].apply(money)
```

I tried different ways of placing the parentheses, but none of them worked.

If you could help me out, I'd be so grateful. Thank you in advance!!!

jezrael · Accepted Answer · 2021-04-13 06:15:44Z

1

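The function works with scalars, so use and instead of &. The columns are also a MultiIndex, so rfm['money'] returns a DataFrame and apply passes a whole Series to the function; remove the last level of the MultiIndex with MultiIndex.droplevel.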

So use:

def money(a):
    if a < 1000:
        return 0
    if (1000 <= a) and (a < 2000):
        return 1
    if (2000 <= a) and (a < 3000):
        return 2
    if (3000 <= a) and (a < 4000):
        return 3
    if (4000 <= a) and (a < 5000):
        return 4
    if a >= 5000:
        return 5

rfm.columns = rfm.columns.droplevel(-1)
rfm['money rank'] = rfm['money'].apply(money)
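
To see why dropping the column level matters, here is a minimal sketch. It assumes a small made-up frame whose two-level column labels mimic the ones in the question; the sample values and the variable names raised_before_fix and selected_before are illustrative only.

```python
import pandas as pd

# Hypothetical sample data with two-level column labels such as ('money', 'sum').
rfm = pd.DataFrame(
    [[2674, 1], [19760, 3], [7904, 3]],
    index=pd.Index(["100", "101", "109"], name="ID"),
    columns=pd.MultiIndex.from_tuples([("money", "sum"), ("frequency", "len")]),
)

def money(a):
    # Scalar ranking function from the answer, using `and`.
    if a < 1000:
        return 0
    if (1000 <= a) and (a < 2000):
        return 1
    if (2000 <= a) and (a < 3000):
        return 2
    if (3000 <= a) and (a < 4000):
        return 3
    if (4000 <= a) and (a < 5000):
        return 4
    if a >= 5000:
        return 5

# Before the fix: the top-level key selects a DataFrame, so apply hands
# each column to money() as a Series and the `if` raises ValueError.
selected_before = rfm["money"]
try:
    selected_before.apply(money)
    raised_before_fix = False
except ValueError:
    raised_before_fix = True

# After the fix: the same key selects a Series, so money() gets one number at a time.
rfm.columns = rfm.columns.droplevel(-1)
rfm["money rank"] = rfm["money"].apply(money)
print(rfm["money rank"].tolist())  # [2, 5, 5]
```

The error comes from what rfm['money'] returns rather than from the choice between & and and, which matches the comment below reporting that switching to and alone left the error unchanged.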

Another solution is to use pd.cut:

rfm.columns = rfm.columns.droplevel(-1)

rfm['money rank'] = pd.cut(rfm['money'], 
                           bins=[-np.inf, 1000,2000,3000,4000,5000,np.inf], 
                           labels=[0,1,2,3,4,5],
                           right=False)
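
As a quick check of how right=False treats the bin edges, here is a small sketch on made-up amounts (the amounts series is hypothetical, chosen to sit on and around the edges):

```python
import numpy as np
import pandas as pd

# Hypothetical amounts on and around the bin edges.
amounts = pd.Series([0, 999, 1000, 2674, 4999, 5000, 19760])

ranks = pd.cut(
    amounts,
    bins=[-np.inf, 1000, 2000, 3000, 4000, 5000, np.inf],
    labels=[0, 1, 2, 3, 4, 5],
    right=False,  # intervals are [left, right), so 1000 lands in rank 1
)
print(ranks.tolist())  # [0, 0, 1, 2, 4, 5, 5]

# pd.cut returns a categorical column; convert it if plain integers are needed.
ranks_int = ranks.astype(int)
```

With the default right=True, the intervals would be closed on the right instead, and an amount of exactly 1000 would fall in rank 0.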

edited Apr 13, 2021 at 6:15

answered Apr 13, 2021 at 5:41


2 Comments

Pablito Over a year ago

Thank you for the answer. I tried both methods, but for the first solution I got the same error (ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().). For the second one, I got 'ValueError: Input array must be 1 dimensional'.

jezrael Over a year ago

@Pablito - Add rfm.columns = rfm.columns.droplevel(-1) to remove the MultiIndex; the answer was edited.

T.M15 · Accepted Answer · 2021-04-13 05:58:19Z

0

You can write it as:

def money(a):
    if a < 1000:
        return 0
    if 1000 <= a < 2000:
        return 1
    if 2000 <= a < 3000:
        return 2
    if 3000 <= a < 4000:
        return 3
    if 4000 <= a < 5000:
        return 4
    if a >= 5000:
        return 5

In fact, a shorter version would be:

def money(a):
    return min(5, a//1000)

PS: Assuming money is NOT negative, the above solution will work. However, if you might pass a negative value, you can write it as:

def money(a):
    return max(0, min(5, a//1000))

Also, since you are only passing money to .apply, you can use a lambda function instead:

rfm['money rank'] = rfm['money'].apply(lambda a: max(0, min(5, a//1000)))
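
The same floor-division idea can also be applied to the whole column at once, without apply. This is a sketch on made-up amounts (the amounts series is hypothetical), assuming the column is a plain integer Series:

```python
import pandas as pd

# Hypothetical amounts, including a negative value and the bin edges.
amounts = pd.Series([-50, 999, 1000, 2674, 4999, 5000, 19760])

# Vectorized equivalent of max(0, min(5, a // 1000)).
ranks = (amounts // 1000).clip(lower=0, upper=5)
print(ranks.tolist())  # [0, 0, 1, 2, 4, 5, 5]

# Cross-check against the scalar formula from the answer.
expected = [max(0, min(5, a // 1000)) for a in amounts.tolist()]
```

This shortcut relies on every bin being exactly 1000 wide; with uneven bins, an explicit list of edges is needed.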

Hope that helps!

edited Apr 13, 2021 at 5:58

answered Apr 13, 2021 at 5:45


2 Comments

Pablito Over a year ago

Thank you for the answer. I wrote exactly the same code at first, so I got the same error 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'. I tried the second one as well, but I got the same error.

T.M15 Over a year ago

Hmm, I'm not familiar with the function calls shown in the stack trace of your error, but looking at line 290, results[i] = self.f(v), I think you are unknowingly passing something other than a number (the money value) to the function. I suggest looking up the function in the docs first to understand its parameters; that may help. (A similar problem happened to me recently in Node.js, where I did not understand the parameters being passed.)
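
One way to check what the function actually receives is to have it report the type of its argument. The sketch below uses a made-up two-row frame and a hypothetical helper is_series; selecting the column with its full tuple key is an alternative that is not mentioned in the thread.

```python
import pandas as pd

# Hypothetical frame with two-level column labels like those in the question.
rfm = pd.DataFrame(
    [[1, 2674], [3, 19760]],
    columns=pd.MultiIndex.from_tuples([("frequency", "len"), ("money", "sum")]),
)

def is_series(v):
    # Report whether apply handed over a whole Series instead of one number.
    return isinstance(v, pd.Series)

# Top-level key: selects a DataFrame, so apply works column by column.
by_top_level = rfm["money"].apply(is_series)
print(by_top_level.tolist())  # [True]

# Full tuple key: selects a Series, so apply works value by value.
by_full_key = rfm[("money", "sum")].apply(is_series)
print(by_full_key.tolist())  # [False, False]
```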

Yefet · Accepted Answer · 2021-04-13 06:05:31Z

0

Another fast solution is to use NumPy's searchsorted:

import numpy as np

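bins = np.array([1000, 2000, 3000, 4000, 5000])
# side='right' puts a value equal to a bin edge in the higher rank (1000 -> 1),
# matching the if-chain in the question
rfm['money rank'] = bins.searchsorted(rfm['money'], side='right')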
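
The side argument of searchsorted decides where an amount equal to a bin edge lands. A small sketch on made-up amounts (the amounts array is hypothetical) compares both settings:

```python
import numpy as np

bins = np.array([1000, 2000, 3000, 4000, 5000])
# Hypothetical amounts, including values exactly on a bin edge.
amounts = np.array([999, 1000, 2674, 5000, 19760])

left = bins.searchsorted(amounts)                 # default side='left'
right = bins.searchsorted(amounts, side="right")
print(left.tolist())   # [0, 0, 2, 4, 5]
print(right.tolist())  # [0, 1, 2, 5, 5]
```

With the default side='left', amounts of exactly 1000 or 5000 get ranks 0 and 4, while the function in the question assigns them 1 and 5; side='right' reproduces the question's boundaries.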

edited Apr 13, 2021 at 6:05

answered Apr 13, 2021 at 5:55
