I have a CSV data set that I imported into Jupyter and stored in inp0. I'm trying to create price buckets for it using the .loc indexer in pandas, but I'm getting the error below.

My Code:

inp0.loc[inp0.price==0.00, 'Price_Bucket'] = 'Free App'
inp0.loc[[inp0.price>0.00 and inp0.price<3.00],'Price_Bucket'] = 'Apps that cost <3'
inp0.loc[[inp0.price>=3.00 and inp0.price<5.00],'Price_Bucket'] = 'Apps that cost <5'
inp0.loc[inp0.price>=5.00,'Price_Bucket'] = 'Apps that cost >=5'
inp0.price_bucket.value_counts()

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How do I resolve it?

  • Instead of the boolean and, use the bitwise & operator. Commented Jul 25, 2021 at 7:36
  • [inp0.price>0.00 and inp0.price<3.00] should be (inp0.price>0.00) & (inp0.price<3.00). That performs an element-wise AND of the two boolean arrays. Commented Jul 25, 2021 at 7:41
  • When I use "&" instead of "and" I get multiple errors. Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]. ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' Commented Jul 25, 2021 at 8:09
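
The 'rand_' / 'bitwise_and' error in the last comment is what you would typically see when & is used without parentheses: & binds more tightly than the comparison operators, so inp0.price>0.00 & inp0.price<3.00 is parsed as inp0.price > (0.00 & inp0.price) < 3.00. Below is a minimal sketch of the original .loc approach with each comparison parenthesized and the extra list brackets dropped. The sample prices are made up for illustration; the real inp0 comes from the asker's CSV.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV.
inp0 = pd.DataFrame({"price": [0.0, 0.99, 2.99, 3.0, 4.99, 5.0, 12.5]})

# Each comparison is wrapped in parentheses and combined with &,
# which performs an element-wise AND on the two boolean Series.
inp0.loc[inp0.price == 0.00, "Price_Bucket"] = "Free App"
inp0.loc[(inp0.price > 0.00) & (inp0.price < 3.00), "Price_Bucket"] = "Apps that cost <3"
inp0.loc[(inp0.price >= 3.00) & (inp0.price < 5.00), "Price_Bucket"] = "Apps that cost <5"
inp0.loc[inp0.price >= 5.00, "Price_Bucket"] = "Apps that cost >=5"

# Column names are case-sensitive, so this uses 'Price_Bucket',
# not price_bucket as in the last line of the question's code.
bucket_counts = inp0["Price_Bucket"].value_counts()
print(bucket_counts)
```

Note that the mask goes directly into .loc as a boolean Series; wrapping it in an additional list, as in the question, is not needed.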

2 Answers

Try np.where, which works like a vectorized if/else over columns:

import numpy as np
inp0['Price_Bucket'] = np.where(inp0['price']==0.00, 'Free App', np.where(inp0['price']<3.00, 'Apps that cost <3', np.where(inp0['price']<5.00, 'Apps that cost <5', 'Apps that cost >=5')))
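
If the nesting gets hard to read, np.select is a flatter alternative (a different function from the one used above, not part of the original answer). It takes a list of conditions and a matching list of values, and the first condition that is True wins, which mirrors the order of the nested np.where calls. The sketch below uses made-up sample prices and compares the two results:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV.
inp0 = pd.DataFrame({"price": [0.0, 0.99, 2.99, 3.0, 4.99, 5.0, 12.5]})

# Conditions are checked in order; the first True one determines the label.
conditions = [
    inp0["price"] == 0.00,
    inp0["price"] < 3.00,
    inp0["price"] < 5.00,
]
labels = ["Free App", "Apps that cost <3", "Apps that cost <5"]

# Rows matching none of the conditions get the default label.
inp0["Price_Bucket"] = np.select(conditions, labels, default="Apps that cost >=5")

# The nested np.where from the answer, for comparison.
nested = np.where(inp0["price"] == 0.00, "Free App",
         np.where(inp0["price"] < 3.00, "Apps that cost <3",
         np.where(inp0["price"] < 5.00, "Apps that cost <5", "Apps that cost >=5")))

print((inp0["Price_Bucket"] == nested).all())
```

In both versions a negative price would fall into 'Apps that cost <3', since only the upper bounds are checked after the price == 0 test.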

Rather than writing multiple if/else or np.where conditions, you can use the pandas cut function like this:

import pandas as pd
import numpy as np
import math

# The tiny first bin [0, 0.000001) isolates price == 0 so that it gets the 'Free App' label.
bins_defined = [0, 0.000001, 3, 5, math.inf]
labels_defined = ['Free App', 'Apps that cost <3', 'Apps that cost <5', 'Apps that cost >=5']

inp0['Price_Bucket'] = pd.cut(inp0['price'], bins=bins_defined, labels=labels_defined, right=False)

# `right` indicates whether each bin includes its rightmost edge.
# If right == True (the default), the bins [1, 2, 3, 4] indicate (1,2], (2,3], (3,4].
# With right == False, as used here, they indicate [1,2), [2,3), [3,4).

For a better understanding, see the pandas.cut documentation.
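
To see which interval each price lands in, here is a small runnable sketch of the same pd.cut call on made-up sample prices (the data is an assumption, not the asker's CSV). Calling pd.cut without labels returns the intervals themselves, which makes the left-closed bins visible:

```python
import math
import pandas as pd

# Hypothetical sample data standing in for the asker's CSV.
inp0 = pd.DataFrame({"price": [0.0, 0.99, 2.99, 3.0, 4.99, 5.0, 12.5]})

bins_defined = [0, 0.000001, 3, 5, math.inf]
labels_defined = ["Free App", "Apps that cost <3", "Apps that cost <5", "Apps that cost >=5"]

# right=False makes every bin closed on the left and open on the right.
inp0["Price_Bucket"] = pd.cut(
    inp0["price"], bins=bins_defined, labels=labels_defined, right=False
)

# Without labels, pd.cut returns the interval each price falls into.
inp0["Interval"] = pd.cut(inp0["price"], bins=bins_defined, right=False)
print(inp0)

# Values outside all bins (for example a negative price) become NaN.
outside = pd.cut(pd.Series([-1.0]), bins=bins_defined, labels=labels_defined, right=False)
```

Unlike the np.where version, the result is a categorical column, and prices outside the bins are left as NaN rather than being assigned to a bucket.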
