
I am trying to create a new variable based on a condition. In the example below, I want to assign 2 if the recency is greater than the mean of recency and the frequency is greater than 3 times its standard deviation, and 0 otherwise. The code and the error it produces are below:

import pandas as pd
import numpy as np
  
# Initialise the data as a dict of lists.
data = {'cust':["a1",   "a2",   "a3",   "a4",   "a5",   "a6",   "a7",   "a8",   "a9",   "a10",  "a11",  "a12",  "a13",  "a14",  "a15",  "a16",  "a17",  "a18",  "a19",  "a20",  "a21",  "a22",  "a23",  "a24",  "a25",  "a26",  "a27",  "a28",  "a29",  "a30",  "a31",  "a32",  "a33",  "a34",  "a35",  "a36",  "a37",  "a38",  "a39",  "a40",  "a41",  "a42",  "a43",  "a44",  "a45",  "a46",  "a47",  "a48",  "a49",  "a50",  "a51"],
        'recency':[3,   7,  9,  9,  6,  8,  3,  9,  6,  5,  8,  6,  2,  8,  3,  3,  2,  7,  3,  1,  7,  6,  10, 6,  2,  8,  6,  10, 2,  7,  9,  1,  1,  3,  6,  4,  6,  4,  6,  6,  7,  3,  7,  9,  6,  4,  7,  3,  1,  9,  3],
        'frequency':[15,    9,  13, 9,  19, 1,  11, 20, 20, 15, 15, 18, 1,  9,  20, 14, 11, 11, 4,  15, 1,  8,  17, 19, 13, 20, 1,  11, 3,  8,  2,  4,  15, 5,  12, 15, 20, 6,  19, 2,  6,  12, 6,  6,  4,  7,  2,  3,  20, 13, 11],
       'monetary':[8854,    5614,   2687,   3553,   1801,   1076,   9724,   7778,   8382,   4391,   6766,   9905,   3181,   4170,   7544,   2997,   3025,   9358,   6015,   9919,   5132,   3598,   8779,   4420,   8931,   1492,   5491,   4186,   4720,   2568,   2530,   4618,   4109,   9384,   3000,   9766,   9524,   1027,   6315,   9806,   3442,   7256,   2432,   2429,   7696,   4527,   1802,   6606,   3018,   6295,   2985]}
# Create DataFrame
df = pd.DataFrame(data)

df['cluster']=np.where(df['recency']>df['recency'].mean() & df['frequency']>df['frequency'].mean()+
                       df['frequency'].std() ,2,0)

TypeError: Cannot perform 'rand_' with a dtyped [int64] array and scalar of type [bool]
df['cluster'] = np.where((df['recency'] > df['recency'].mean()) & (df['frequency'] > df['frequency'].mean() + df['frequency'].std()), 2, 0)? Commented Apr 6, 2021 at 12:36

1 Answer

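I think the parentheses around each condition are missing. They are needed because of operator precedence: & binds more tightly than >, so without them Python evaluates df['recency'].mean() & df['frequency'] first, which raises the TypeError: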

df['cluster'] = np.where((df['recency']>df['recency'].mean()) & 
                         (df['frequency']>df['frequency'].mean()+df['frequency'].std()),2,0)
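
To make the precedence issue concrete, here is a minimal sketch. It uses a small made-up DataFrame rather than the data from the question, and the names recency_cutoff, frequency_cutoff, high_recency and high_frequency are only illustrative. Computing each threshold and each boolean mask separately is one way to avoid the parentheses problem altogether:

```python
import numpy as np
import pandas as pd

# Plain-Python illustration of the precedence rule: & binds tighter than >,
# so the first expression is parsed as 3 > (1 & 2) > 0, i.e. 3 > 0 > 0.
without_parens = 3 > 1 & 2 > 0        # False
with_parens = (3 > 1) & (2 > 0)       # True

# Small hypothetical sample, not the data from the question.
df = pd.DataFrame({
    "recency":   [1, 5, 9, 8, 2],
    "frequency": [3, 4, 20, 2, 19],
})

# Name each threshold first so the conditions stay readable.
recency_cutoff = df["recency"].mean()
frequency_cutoff = df["frequency"].mean() + df["frequency"].std()

# Build each boolean mask on its own line; no nested parentheses needed.
high_recency = df["recency"] > recency_cutoff
high_frequency = df["frequency"] > frequency_cutoff

# Combine the two masks element-wise and map True -> 2, False -> 0.
df["cluster"] = np.where(high_recency & high_frequency, 2, 0)
print(df)
```

Only the row where both masks are True gets 2. The one-line version in the answer above does the same thing; splitting it up just makes each condition easier to check on its own.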