1

What I want to do

I want to do RFM analytics on purchase data from an e-commerce site.

I have processed the data into RFM format, and now I want to rank every ID by the values of each column (Money, Recency and Frequency).

However, I get the following error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-e7bf5ddc856d> in <module>
     13         return 5
     14 
---> 15 rfm['money rank'] = rfm['money'].apply(money)
     16 rfm.head()

c:\users\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7766             kwds=kwds,
   7767         )
-> 7768         return op.get_result()
   7769 
   7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:

c:\users\lib\site-packages\pandas\core\apply.py in get_result(self)
    183             return self.apply_raw()
    184 
--> 185         return self.apply_standard()
    186 
    187     def apply_empty_result(self):

c:\users\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    274 
    275     def apply_standard(self):
--> 276         results, res_index = self.apply_series_generator()
    277 
    278         # wrap results

c:\users\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    288             for i, v in enumerate(series_gen):
    289                 # ignore SettingWithCopy here in case the user mutates
--> 290                 results[i] = self.f(v)
    291                 if isinstance(results[i], ABCSeries):
    292                     # If we have a view on v, we need to make a copy because

<ipython-input-15-e7bf5ddc856d> in money(a)
      1 def money(a):
----> 2     if a < 1000:
      3         return 0
      4     if (1000 <= a) & (a < 2000):
      5         return 1

c:\users\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1440     @final
   1441     def __nonzero__(self):
-> 1442         raise ValueError(
   1443             f"The truth value of a {type(self).__name__} is ambiguous. "
   1444             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

Data

```
       money   recency  frequency
         sum  <lambda>        len
ID
100     2674  169 days          1
101    19760   98 days          3
103     2674  167 days          1
109     7904   56 days          3
11      2674  211 days          1

<class 'pandas.core.frame.DataFrame'>
Index: 290 entries, 100 to 99
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype          
---  ------            --------------  -----          
 0   (money, sum)     290 non-null    int64          
 1   (recency, <lambda>)  290 non-null    timedelta64[ns]
 2   (freqency, len)   290 non-null    int64          
dtypes: int64(2), timedelta64[ns](1)
memory usage: 9.1+ KB
```

Code

```
def money(a):
    if a < 1000:
        return 0
    if (1000 <= a) & (a < 2000):
        return 1
    if (2000 <= a) & (a < 3000):
        return 2
    if (3000 <= a) & (a < 4000):
        return 3
    if (4000 <= a) & (a < 5000):
        return 4
    if a >= 5000:
        return 5

rfm['money rank'] = rfm['money'].apply(money)
```

I tried different placements of parentheses, but none of them worked.

If you could help me out, I'd be very grateful. Thank you in advance!

3 Answers

1

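When working with scalars, use `and` instead of `&`, and remove the last level of the MultiIndex with `MultiIndex.droplevel`. The MultiIndex is what actually triggers the error: with two-level columns, `rfm['money']` is a one-column DataFrame rather than a Series, so `apply` passes the whole column to `money`, and `a < 1000` becomes a boolean Series.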

So use:

```
def money(a):
    if a < 1000:
        return 0
    if (1000 <= a) and (a < 2000):
        return 1
    if (2000 <= a) and (a < 3000):
        return 2
    if (3000 <= a) and (a < 4000):
        return 3
    if (4000 <= a) and (a < 5000):
        return 4
    if a >= 5000:
        return 5

rfm.columns = rfm.columns.droplevel(-1)
rfm['money rank'] = rfm['money'].apply(money)
```
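To see why the `droplevel` line matters, here is a small sketch that rebuilds a few rows of the question's data by hand. The two-level column labels are copied from the `df.info()` output; building them with tuple keys in a dict (instead of the original groupby/agg step, which isn't shown) is an assumption. Before flattening, applying `money` raises the same `ValueError`; afterwards it ranks each value.

```python
import pandas as pd


def money(a):
    # Bucket a single amount into ranks 0-5 (same edges as the question)
    if a < 1000:
        return 0
    if 1000 <= a < 2000:
        return 1
    if 2000 <= a < 3000:
        return 2
    if 3000 <= a < 4000:
        return 3
    if 4000 <= a < 5000:
        return 4
    return 5


# A few rows from the question; tuple keys give two-level (MultiIndex) columns
rfm = pd.DataFrame(
    {
        ("money", "sum"): [2674, 19760, 2674, 7904, 2674],
        ("recency", "<lambda>"): pd.to_timedelta([169, 98, 167, 56, 211], unit="D"),
        ("frequency", "len"): [1, 3, 1, 3, 1],
    },
    index=pd.Index([100, 101, 103, 109, 11], name="ID"),
)

# Before: rfm["money"] is a one-column DataFrame, so apply passes a whole Series
try:
    rfm["money"].apply(money)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# After dropping the 'sum'/'<lambda>'/'len' level, rfm["money"] is a Series
rfm.columns = rfm.columns.droplevel(-1)
rfm["money rank"] = rfm["money"].apply(money)
print(rfm["money rank"].tolist())  # [2, 5, 2, 5, 2]
```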

Another solution is to use `pd.cut`:

```
import numpy as np
import pandas as pd

rfm.columns = rfm.columns.droplevel(-1)

rfm['money rank'] = pd.cut(rfm['money'],
                           bins=[-np.inf, 1000, 2000, 3000, 4000, 5000, np.inf],
                           labels=[0, 1, 2, 3, 4, 5],
                           right=False)
```
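A quick check of the `pd.cut` call on a handful of amounts, including values that sit exactly on a bin edge (these amounts are made-up test inputs, not rows from the question's data). With `right=False` each bin is closed on the left, so 1000 lands in rank 1 and 5000 in rank 5, matching the if/else version.

```python
import numpy as np
import pandas as pd

# Made-up amounts, including values that sit exactly on a bin edge
amounts = pd.Series([999, 1000, 2674, 4999, 5000, 19760])

ranks = pd.cut(
    amounts,
    bins=[-np.inf, 1000, 2000, 3000, 4000, 5000, np.inf],
    labels=[0, 1, 2, 3, 4, 5],
    right=False,  # each bin is [lower, upper), so 1000 lands in rank 1
)

# pd.cut returns a categorical Series; astype(int) turns the labels into ints
print(ranks.astype(int).tolist())  # [0, 1, 2, 4, 5, 5]
```

Note that the result is categorical, so convert it with `astype(int)` if you need to do arithmetic on the ranks later (for example, summing R, F and M scores).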

2 Comments

Thank you for the answer. I tried both methods, but for the first solution I got the same error (ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().). For the second one, I got 'ValueError: Input array must be 1 dimensional'.
@Pablito - Add rfm.columns = rfm.columns.droplevel(-1) to remove the MultiIndex; I have edited the answer.
0

You can write it as:

```
def money(a):
    if a < 1000:
        return 0
    if 1000 <= a < 2000:
        return 1
    if 2000 <= a < 3000:
        return 2
    if 3000 <= a < 4000:
        return 3
    if 4000 <= a < 5000:
        return 4
    if a >= 5000:
        return 5
```

In fact, the logic can be shortened to:

```
def money(a):
    return min(5, a//1000)
```
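To check that the shorter version really agrees with the if-chain, you can compare the two over a range of amounts. This is a sketch assuming non-negative whole amounts, as the answer does; the names `money_branches` and `money_short` are just illustrative. The last line shows why a negative input needs the extra `max(0, ...)`.

```python
def money_branches(a):
    # Chained comparisons: 1000 <= a < 2000 means (1000 <= a) and (a < 2000)
    if a < 1000:
        return 0
    if 1000 <= a < 2000:
        return 1
    if 2000 <= a < 3000:
        return 2
    if 3000 <= a < 4000:
        return 3
    if 4000 <= a < 5000:
        return 4
    if a >= 5000:
        return 5


def money_short(a):
    # a // 1000 is the number of whole thousands; cap it at 5
    return min(5, a // 1000)


# Compare both versions on every whole amount from 0 to 10,000
mismatches = [a for a in range(10_001) if money_branches(a) != money_short(a)]
print(mismatches)  # []

# Negative amounts differ: -500 // 1000 floors to -1
print(money_branches(-500), money_short(-500))  # 0 -1
```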

PS: The above solution works as long as money is not negative. If you might pass negative values, you can write it as:

```
def money(a):
    return max(0, min(5, a//1000))
```

Also, since you are only passing money to .apply, you can use a lambda function:

```
rfm['money rank'] = rfm['money'].apply(lambda a: max(0, min(5, a//1000)))
```
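Once `rfm['money']` is a plain numeric Series (that is, after the MultiIndex level has been dropped), the same `max(0, min(5, a // 1000))` formula can also be written without `apply`, using `Series.clip`. This vectorized form is a swapped-in alternative, not part of the original answer; the sample amounts below are made up, with -50 included to show the lower bound.

```python
import pandas as pd

# Made-up amounts; -50 shows the lower bound of 0 at work
amounts = pd.Series([-50, 999, 1000, 2674, 7904, 19760])

# Vectorized equivalent of max(0, min(5, a // 1000)) applied element-wise
ranks = (amounts // 1000).clip(lower=0, upper=5)
via_apply = amounts.apply(lambda a: max(0, min(5, a // 1000)))

print(ranks.tolist())                        # [0, 0, 1, 2, 5, 5]
print(ranks.tolist() == via_apply.tolist())  # True
```

The vectorized version avoids calling a Python function once per row, which tends to matter more as the number of IDs grows.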

Hope that helps!

2 Comments

Thank you for the answer. I wrote exactly the same code at first and got the same error 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'. I tried the second one as well, but got the same error again.
Hmm... well, I'm not familiar with the function calls shown in your stack trace, but looking at the line --> 290 results[i] = self.f(v), I think you are unknowingly passing something other than a number (a money value) to the function. I'd suggest reading the docs for the function first to understand what it passes as parameters; that may help (a similar problem happened to me recently in Node.js, where I didn't understand the parameters being passed).
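That suspicion can be checked directly: pass a function that only records what it receives. In this hypothetical sketch (the `report` helper and the sample values are illustrative), `rfm` is rebuilt with two-level column labels like the question's `('money', 'sum')`. While the columns are a MultiIndex, `rfm["money"]` is a DataFrame and `apply` hands the function a whole column Series; after `droplevel`, it receives one value per call.

```python
import pandas as pd

# Two-level column labels, as produced by the question's aggregation step
rfm = pd.DataFrame(
    {("money", "sum"): [2674, 19760, 7904], ("frequency", "len"): [1, 3, 3]}
)

got_series = []


def report(v):
    # Record whether apply handed us a whole Series or a single value
    got_series.append(isinstance(v, pd.Series))
    return 0


rfm["money"].apply(report)  # rfm["money"] is a DataFrame here
first_pass = list(got_series)
print(all(first_pass))      # True: each call receives a column Series

got_series.clear()
rfm.columns = rfm.columns.droplevel(-1)
rfm["money"].apply(report)  # now a Series: one call per value
print(got_series)           # [False, False, False]
```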
0

Another solution, and a fast one, is to use NumPy's searchsorted:

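```
import numpy as np

bins = np.array([1000, 2000, 3000, 4000, 5000])  # upper edges of ranks 0-4
rfm.columns = rfm.columns.droplevel(-1)  # flatten MultiIndex so rfm['money'] is 1-D
# side='right' puts a value equal to an edge (e.g. 1000) in the higher rank
rfm['money rank'] = bins.searchsorted(rfm['money'], side='right')
```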
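`ndarray.searchsorted` defaults to `side='left'`, which puts an amount equal to an edge (exactly 1000, 2000, ...) one rank lower than the if/else version, where 1000 gets rank 1. With `side='right'` each bucket is closed on the left, like `pd.cut(..., right=False)`. The amounts below are made-up test inputs chosen to hit the edges.

```python
import numpy as np

bins = np.array([1000, 2000, 3000, 4000, 5000])
amounts = np.array([999, 1000, 2674, 4999, 5000, 19760])

left = bins.searchsorted(amounts)                 # default side='left'
right = bins.searchsorted(amounts, side="right")  # [lower, upper) buckets

print(left.tolist())   # [0, 0, 2, 4, 4, 5]
print(right.tolist())  # [0, 1, 2, 4, 5, 5]
```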
