numpy masked array fill value still being accessed

Question

I am trying to process an image as a masked array to handle NoData areas. I decided to do a little testing first on one-dimensional arrays, and am seeing something odd. Here is my test code:

    import numpy as np

    a = np.array([0,1,4,3,4,-9999,33,34,-9999])
    am = np.ma.MaskedArray(a)
    am.mask = (am==-9999)

    z = np.arange(35)

    z[am]

I would expect that indexing the z array with the masked array would succeed, but I am seeing the following error:

    Runtime error 
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    IndexError: index -9999 is out of bounds for size 35

Can anyone comment on how this should be coded correctly? I can run the following command with success:

    z[a[a>0]]

which is effectively the same thing.

Thanks!

ely · Accepted Answer · 2015-04-01 02:47:58Z

It's generally a bad idea to use masked arrays for purposes of indexing, precisely because the behavior that should happen at a masked value is undefined.

Think about it this way: when I look at your array a and your array z, I can say "OK, a[0] = 0 so z[a[0]] makes sense." And so on until I come across a[5] = -9999, when I can say, "OK, that can't make sense as an index for z" and an exception can be raised.

This is in fact what happens when you naively use am as an index set: the indexing falls back to am.data, which contains all of the original values. If instead it tried something like [z[i] for i in am], you would run smack into the problem of encountering numpy.ma.core.MaskedConstant, which is not a sensible value for indexing, neither for fetching a value nor for ignoring the request to fetch one.

In [39]: l = [x for x in am]

In [40]: l
Out[40]: [0, 1, 4, 3, 4, masked, 33, 34, masked]

In [41]: type(l[-1])
Out[41]: numpy.ma.core.MaskedConstant

(In fact, if you try to index on one of these guys, you get IndexError: arrays used as indices must be of integer (or boolean) type).

But now what happens if I come across the masked value in am.filled()? The entry at the 5th index of am.filled() won't be an instance of numpy.ma.core.MaskedConstant -- it will be whatever fill value has been selected by you. If that fill value makes sense as an index, well then you will actually fetch a value by indexing at that index. Take 0 as an example. It seems like an innocuous, neutral fill value, but actually it represents a valid index, so you get two extra accesses to the 0th entry of z:

In [42]: am.fill_value = 0

In [43]: z[am.filled()]
Out[43]: array([ 0,  1,  4,  3,  4,  0, 33, 34,  0])

and this isn't exactly what the mask is supposed to do either!

A half-baked approach is to iterate over am and exclude anything with type of np.ma.core.MaskedConstant:

In [45]: z[np.array([x for x in am if type(x) is not np.ma.core.MaskedConstant])]
Out[45]: array([ 0,  1,  4,  3,  4, 33, 34])

But really a much clearer expression of all of this is to just use plain logical indexing in the first place:

In [47]: z[a[a != -9999]]
Out[47]: array([ 0,  1,  4,  3,  4, 33, 34])
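If you already hold the masked array, the same selection can be sketched with the compressed() method, which drops the masked entries and returns a plain 1D array. This is only an illustration of an alternative spelling; the use of np.ma.masked_equal to build the mask is an assumption here and not part of the original code.

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_equal(a, -9999)   # mask every NoData entry (assumed helper)
z = np.arange(35)

# compressed() keeps only the unmasked values, as an ordinary ndarray,
# so it is safe to use as an integer index
idx = am.compressed()
result = z[idx]                     # same values as z[a[a != -9999]]
print(result)
```

As with logical indexing, the result is shorter than the input, so the positions of the NoData entries are lost.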

Note that logical indexing like this also works for 2D arrays, as long as you accept that the result is flattened: the selected elements generally no longer fit a regular 2D shape, so they are returned as a 1D array, like this:

In [58]: a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]])

In [59]: a2
Out[59]: 
array([[   10, -9999,    13],
       [-9999,     1,     8],
       [    1,     8,     1]])

In [60]: z2 = np.random.rand(3,3)

In [61]: z2[np.where(a2 != -9999)]
Out[61]: 
array([ 0.4739082 ,  0.13629442,  0.46547732,  0.87674102,  0.08753297,
        0.57109764,  0.39722408])
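As a small sketch of the flattening, the boolean array can also be used as the index directly, without np.where. The seeded generator is an assumption added only to make the run repeatable.

```python
import numpy as np

a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]])
rng = np.random.default_rng(0)      # seeded only for repeatability (assumption)
z2 = rng.random((3, 3))

valid = a2 != -9999                 # 2D boolean array, True where data is valid
flat = z2[valid]                    # boolean indexing flattens the selection
same = z2[np.where(valid)]          # equivalent selection via index tuples

print(flat.shape)                   # 7 valid cells out of 9
print(np.array_equal(flat, same))
```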

If instead you want something similar to the effect of a mask, you can just set values equal to NaN (for float arrays):

In [66]: a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]], dtype=float)

In [67]: a2
Out[67]: 
array([[  1.00000000e+01,  -9.99900000e+03,   1.30000000e+01],
       [ -9.99900000e+03,   1.00000000e+00,   8.00000000e+00],
       [  1.00000000e+00,   8.00000000e+00,   1.00000000e+00]])

In [68]: a2[np.where(a2 == -9999)] = np.nan

In [69]: a2
Out[69]: 
array([[ 10.,  nan,  13.],
       [ nan,   1.,   8.],
       [  1.,   8.,   1.]])

This form of masking with NaN is suitable for a lot of vectorized array computations in NumPy, although it can be a pain to worry about converting integer-based image data to floating point first, and converting back safely at the end.
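A minimal sketch of that round trip, assuming the NoData value is -9999 and that doubling the pixels stands in for the real computation (both the NODATA name and the operation are illustrative, not from the original):

```python
import numpy as np

NODATA = -9999                               # assumed NoData marker
img = np.array([[10, NODATA, 13], [NODATA, 1, 8], [1, 8, 1]])

# Convert to float so NaN can stand in for the NoData cells
work = img.astype(float)
work[img == NODATA] = np.nan

mean_valid = np.nanmean(work)                # NaN-aware reduction skips NoData
scaled = work * 2                            # NaN propagates through arithmetic

# Put the NoData marker back before casting to the integer dtype,
# so no NaN is ever cast to an integer
out = np.where(np.isnan(scaled), NODATA, scaled).astype(img.dtype)
print(mean_valid)
print(out)
```

Restoring the marker before the cast matters because casting NaN to an integer type does not give a meaningful value.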

Thanks Mr. F! That was a very thorough explanation, and I appreciate it, as I'm sure others will.
I wanted to add some more context. I'm used to handling NoData areas fairly easily in image processing applications that require mathematical manipulation, transforms, etc., but I'm writing a histogram matching program, which requires indexing, hence the attempted use of masked arrays. I think the looping approach is most appropriate for my specific use case. Thanks again!

avinash pandey · Accepted Answer · 2015-04-01 02:16:53Z

Try this code

import numpy as np

a = np.array([0,1,4,3,4,-9999,33,34,-9999])
am = np.ma.MaskedArray(a)
am.mask = (am==-9999)
np.ma.set_fill_value(am, 0)

z = np.arange(35)

print(z[am.filled()])

Indexing with am directly uses the underlying data, so the masked positions still hold their original values (the masked array is just a reference to the original array). Calling am.filled() after setting fill_value returns an array in which the masked elements are replaced by the fill_value.
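Note that filling with 0 makes the lookup succeed but returns z[0] at the masked positions. One possible way to keep the original shape and still mark those positions as invalid is to re-apply the mask to the result. This is a sketch under assumptions: the z values are multiplied by 10 only to make the lookup visible, and np.ma.masked_equal is used as a shortcut to build the mask.

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_equal(a, -9999)    # assumed shortcut for building the mask
z = np.arange(35) * 10               # illustrative lookup table

# Fill with 0 (a valid index) only so the lookup cannot fail;
# the values fetched at masked positions are placeholders
looked_up = z[am.filled(0)]

# Re-apply the original mask so the placeholders are never treated as data
result = np.ma.array(looked_up, mask=np.ma.getmaskarray(am))
print(result)
print(result.filled(-1))             # masked positions shown as -1
```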

