2

I have a pandas DataFrame, one of whose columns is:

 a = [1,0,1,0,1,3,4,6,4,6]

Now I want to create another column such that any value greater than 0 and less than 5 is assigned 1 and every other value is assigned 0, i.e.:

a = [1,0,1,0,1,3,4,6,4,6]
b = [1,0,1,0,1,1,1,0,1,0]

I have tried this:

dtaframe['b'] = dtaframe['a'].loc[0 < dtaframe['a'] < 5] = 1
dtaframe['b'] = dtaframe['a'].loc[dtaframe['a'] >4 or dtaframe['a']==0] = 0

but the code throws an error. What should I do?

3 Answers

3

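You can use between to get Boolean values, then astype to convert from Boolean values to 0/1. In pandas 1.3 and later, inclusive takes a string such as 'neither'; older versions used inclusive=False for the same strict comparison:

dtaframe['b'] = dtaframe['a'].between(0, 5, inclusive='neither').astype(int)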

The resulting output:

   a  b
0  1  1
1  0  0
2  1  1
3  0  0
4  1  1
5  3  1
6  4  1
7  6  0
8  4  1
9  6  0

Edit

For multiple ranges, you could use pandas.cut:

dtaframe['b'] = pd.cut(dtaframe['a'], bins=[0,1,6,9], labels=False, include_lowest=True)

You'll need to be careful about how you define bins. Using labels=False will return integer indicators for each bin, which happens to correspond with the labels you provided. You could also manually specify the labels for each bin, e.g. labels=[0,1,2], labels=[0,17,19], labels=['a','b','c'], etc. You may need to use astype if you manually specify the labels, as they'll be returned as categories.
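
As a rough sketch of the manual-labels case described above: the sample values below are made up so that all three bins are hit, and the labels 0, 17, 19 are taken from the examples in the paragraph. The result of pd.cut is categorical here, so astype is used to get plain integers.

```python
import pandas as pd

# Hypothetical sample data chosen to cover all three bins
dtaframe = pd.DataFrame({'a': [0, 1, 2, 6, 7, 9]})

# Bins are [0, 1], (1, 6], (6, 9]; include_lowest=True makes the
# first bin include its left edge, so 0 is not left unassigned.
binned = pd.cut(dtaframe['a'], bins=[0, 1, 6, 9],
                labels=[0, 17, 19], include_lowest=True)

# With explicit labels the result has a categorical dtype,
# so convert it to ordinary integers before storing it.
dtaframe['b'] = binned.astype(int)

print(dtaframe)
```

Values that fall outside every bin would come back as NaN, in which case the astype(int) step would fail and the missing values would need handling first.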

Alternatively, you could combine loc and between to manually specify each range:

dtaframe.loc[dtaframe['a'].between(0,1), 'b'] = 0
dtaframe.loc[dtaframe['a'].between(2,6), 'b'] = 1
dtaframe.loc[dtaframe['a'].between(7,9), 'b'] = 2
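
A self-contained sketch of the same loc and between approach, using made-up sample values that cover all three ranges. One thing to watch for: because the first assignment only fills some rows, the new column typically starts out with missing values and ends up as a float column, so the last line converts it back to integers. That conversion assumes every row fell into one of the ranges.

```python
import pandas as pd

# Hypothetical sample data chosen to cover all three ranges
dtaframe = pd.DataFrame({'a': [0, 1, 2, 6, 7, 9]})

# between() is inclusive on both ends by default
dtaframe.loc[dtaframe['a'].between(0, 1), 'b'] = 0
dtaframe.loc[dtaframe['a'].between(2, 6), 'b'] = 1
dtaframe.loc[dtaframe['a'].between(7, 9), 'b'] = 2

# Convert to integers once every row has been assigned a class
dtaframe['b'] = dtaframe['b'].astype(int)

print(dtaframe)
```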

2 Comments

@root Brilliant answer :) What if I have to make 3 classes, e.g. 0-1 assigned 0, 2-6 assigned 1, and finally 7-9 assigned 3? Thanks for answering.
@AnuragPandey: I've made an edit to address multiple ranges. You may want to make it a separate question though, in order to get more input. My method may not be optimal.
1

Try this with np.where:

dtaframe['b'] = np.where((dtaframe['a'] > 4) | (dtaframe['a'] == 0), 0, 1)
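
For reference, a minimal runnable sketch of the np.where idea on the data from the question. It assumes the goal is the 0/1 column the question asks for, and it writes the condition as the range itself (greater than 0 and less than 5) rather than its complement.

```python
import numpy as np
import pandas as pd

dtaframe = pd.DataFrame({'a': [1, 0, 1, 0, 1, 3, 4, 6, 4, 6]})

# Element-wise condition: each comparison is parenthesised and
# combined with &, because & binds tighter than < and >.
in_range = (dtaframe['a'] > 0) & (dtaframe['a'] < 5)

# np.where(condition, value_if_true, value_if_false)
dtaframe['b'] = np.where(in_range, 1, 0)

print(dtaframe)
```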


1

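When using comparison operators and Boolean logic to filter DataFrames, you can't use the Pythonic chained idiom a < myseries < b, because Python evaluates it as (a < myseries) and (myseries < b), which asks for the truth value of a whole Series. Instead, you need to write (a < myseries) & (myseries < b):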

cond1 = (0 < dtaframe['a'])
cond2 = (dtaframe['a'] < 5)
dtaframe['b'] = (cond1 & cond2) * 1
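
To make the difference concrete, here is a small sketch on the values from the question. The variable name chained_failed is only for illustration. The chained form raises the "truth value of a Series is ambiguous" ValueError, while the element-wise form with & produces the mask.

```python
import pandas as pd

s = pd.Series([1, 0, 1, 0, 1, 3, 4, 6, 4, 6])

# The chained comparison is evaluated as (0 < s) and (s < 5);
# `and` needs a single True/False, which a Series cannot provide.
try:
    mask = 0 < s < 5
    chained_failed = False
except ValueError as err:
    chained_failed = True
    print("chained comparison failed:", err)

# Element-wise version: parenthesise each comparison and use &
mask = (0 < s) & (s < 5)
b = mask.astype(int)

print(b.tolist())
```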

5 Comments

I learned about it yesterday: there is also myseries.between(a, b).
@ayhan I was just about to include that too :-)
@piRSquared it seemed to be throwing some error :( 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()'.
My apologies. I wasn't paying attention.
i guess the problem is in using dtaframe['a'].loc.......... what do you say ?
