
I kept getting errors because a numpy ndarray of booleans was not accepted as a mask by a pandas structure, when it occurred to me that I might have the 'wrong' booleans. Edit: the mask was not a raw numpy array but a pandas.Index.

While I was able to find a solution, the only one that worked was quite ugly:

import numpy as np

mymask = mymask.astype(np.bool_)  # ver.1: does not work, elements remain <class 'bool'>
mymask = mymask == True           # ver.2: does work, elements become <class 'numpy.bool_'>
mypdstructure[mymask]

What's the proper way to typecast the values?

  • What kind of errors? numpy and pandas play very well together. Are you sure this is not just an issue of labels? Commented Nov 11, 2017 at 16:26
  • "IndexError: arrays used as indices must be of integer (or boolean) type". I'm fairly sure that things are otherwise correct because I get the results I expect using ver.2 Commented Nov 11, 2017 at 16:27
  • Do you have NaNs in mymask? Or something else that may cause upcasting the dtype of the array? Commented Nov 11, 2017 at 16:31
  • @JohnSmith What version of numpy? I cannot reproduce your result on numpy 1.13.1 / Python 3.6.0. Both ways give np.bool_ in the array. Commented Nov 11, 2017 at 16:31
  • @ayhan no. I checked the output and it's only True or False Commented Nov 11, 2017 at 16:34

1 Answer


Ok, I found the problem. My original post was not fully correct: my mask was a pandas.Index.

It seems that pandas.Index.astype behaves unexpectedly (at least to me), as I get different results for the following:

mask = pindex.map(myfun).astype(np.bool_) # doesn't cast
mask = pindex.map(myfun).astype(np.bool_,copy=False) # doesn't cast
mask = pindex.map(myfun).values.astype(np.bool_) # does cast

Maybe it is actually a pandas bug? This result is surprising to me because I was under the impression that pandas is usually just calling the functions of the numpy arrays that it is based on. This is clearly not the case here.
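
For reference, here is a minimal sketch of the variant that did cast for me, going through .values before calling astype. The names labels, is_short, and series are made up for illustration; they stand in for pindex, myfun, and mypdstructure above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for pindex, myfun and mypdstructure.
labels = pd.Index(["a", "bb", "ccc", "dd"])


def is_short(label):
    """Return True for labels with at most two characters."""
    return len(label) <= 2


series = pd.Series([10, 20, 30, 40], index=labels)

# Drop down to the underlying numpy array first, then cast.
# The result is a plain ndarray with dtype bool, not a pandas.Index.
mask = labels.map(is_short).values.astype(np.bool_)

# A boolean ndarray of matching length is accepted as a mask.
selected = series[mask]
print(mask.dtype)
print(selected)
```

Casting the array rather than the Index sidesteps whatever Index.astype does with the bool request, so the mask no longer depends on that behaviour.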


2 Comments

Yeah, index.astype silently ignores the bool option as I can see. There are different index versions like Int64Index and Float64Index but there is no BoolIndex; that's probably the reason here but I would expect it to raise an error. Anyway, here are the steps to reproduce if anyone is interested in opening an issue: a = pd.Index([True, False]); a = a.astype('bool')
By the way, if you are working on someone else's code the issue might stem from the fact that the behaviour of index.map changed recently. It used to return a numpy array.
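
A small sketch for checking what you actually have, built on the reproduction steps in the first comment. The dtypes printed for the Index depend on the installed pandas version, so no expected output is given for them. Using np.asarray with dtype=bool is an assumption on my part about a version-independent way to get a plain boolean array; it is not something suggested in the comments.

```python
import numpy as np
import pandas as pd

# Reproduction from the comment above.
a = pd.Index([True, False])
cast = a.astype("bool")

# Inspect the dtype before and after the cast; results vary by pandas version.
print("Index dtype:       ", a.dtype)
print("after astype:      ", cast.dtype)
print("element type:      ", type(cast[0]))

# Assumption: converting to a numpy array with an explicit dtype
# yields a boolean ndarray regardless of how the Index stores its values.
arr = np.asarray(a, dtype=bool)
print("np.asarray dtype:  ", arr.dtype)
```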
