
I am trying to process an image as a masked array to handle NoData areas. I decided to do a little testing first on one-dimensional arrays, and I am seeing something odd. Here is my test code:

    import numpy as np

    a = np.array([0,1,4,3,4,-9999,33,34,-9999])
    am = np.ma.MaskedArray(a)
    am.mask = (am==-9999)

    z = np.arange(35)

    z[am]

I would expect indexing the z array with the masked array to succeed, but I am seeing the following error:

    Runtime error 
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    IndexError: index -9999 is out of bounds for size 35

Can anyone comment on how this should be coded correctly? I can run the following command successfully:

    z[a[a>0]]

which is effectively the same thing.

Thanks!

2 Answers


It's generally a bad idea to use masked arrays for indexing, precisely because the behavior that should happen at a masked value is undefined.

Think about it this way: when I look at your array a and your array z, I can say "Ok, a[0] = 0 so z[a[0]] makes sense." And so on until I come across a[5] = -9999 when I can say, "OK, that can't make sense as an index for z" and an exception can be raised.

This is in fact what happens when you naively use am as an index: NumPy falls back to am.data, which contains all of the original values. If it instead tried something like [z[i] for i in am], you would run smack into numpy.ma.core.MaskedConstant, which is not a sensible value for indexing, neither for fetching a value nor for ignoring the request to fetch one.

    In [39]: l = [x for x in am]

    In [40]: l
    Out[40]: [0, 1, 4, 3, 4, masked, 33, 34, masked]

    In [41]: type(l[-1])
    Out[41]: numpy.ma.core.MaskedConstant

(In fact, if you try to index on one of these guys, you get IndexError: arrays used as indices must be of integer (or boolean) type).
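To see the fallback to am.data concretely, here is a minimal sketch that reuses the arrays from the question. The mask is passed through the constructor's mask argument instead of being assigned to am.mask, which is only a convenience for this example.

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_array(a, mask=(a == -9999))
z = np.arange(35)

# Masking hides entries but leaves the underlying buffer untouched.
print(am.data)

# Fancy indexing reads that buffer, so the sentinel is used as an index.
try:
    z[am]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```

The sentinel -9999 is still present in am.data, which is why the lookup fails even though the entry is masked.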

But now what happens if I come across the masked value in am.filled()? The entry at the 5th index of am.filled() won't be an instance of numpy.ma.core.MaskedConstant -- it will be whatever fill value has been selected by you. If that fill value makes sense as an index, well then you will actually fetch a value by indexing at that index. Take 0 as an example. It seems like an innocuous, neutral fill value, but actually it represents a valid index, so you get two extra accesses to the 0th entry of z:

    In [42]: am.fill_value = 0

    In [43]: z[am.filled()]
    Out[43]: array([ 0,  1,  4,  3,  4,  0, 33, 34,  0])

and this isn't exactly what the mask is supposed to do either!

A half-baked approach is to iterate over am and exclude anything with type of np.ma.core.MaskedConstant:

    In [45]: z[np.array([x for x in am if type(x) is not np.ma.core.MaskedConstant])]
    Out[45]: array([ 0,  1,  4,  3,  4, 33, 34])
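If you do want to start from the masked array, one option that avoids the type check is MaskedArray.compressed(), which returns the unmasked entries as a 1-D array. A short sketch using the arrays from the question (valid_idx and result are names chosen for this example):

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_array(a, mask=(a == -9999))
z = np.arange(35)

# compressed() drops the masked entries and keeps the rest in order.
valid_idx = am.compressed()

# The remaining values are all valid indices into z.
result = z[valid_idx]
print(result)
```

Like the list comprehension, this silently shortens the result, so positions in the output no longer line up with positions in am.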

But really a much clearer expression of all of this is to just use plain logical indexing in the first place:

    In [47]: z[a[a != -9999]]
    Out[47]: array([ 0,  1,  4,  3,  4, 33, 34])

Note that logical indexing like this works fine for 2D arrays, as long as you accept that when the selected elements no longer fit a regular 2D shape, the result is returned as a 1D array, like this:

    In [58]: a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]])

    In [59]: a2
    Out[59]:
    array([[   10, -9999,    13],
           [-9999,     1,     8],
           [    1,     8,     1]])

    In [60]: z2 = np.random.rand(3,3)

    In [61]: z2[np.where(a2 != -9999)]
    Out[61]:
    array([ 0.4739082 ,  0.13629442,  0.46547732,  0.87674102,  0.08753297,
            0.57109764,  0.39722408])
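If the 2D shape has to survive, for instance when image values are used as indices into a lookup table, one possible pattern is to assign through the boolean mask into a pre-filled output array. This is only a sketch: NODATA, img, and lut are hypothetical names, and the lookup table is an arbitrary stand-in.

```python
import numpy as np

NODATA = -9999  # sentinel used in the examples above

img = np.array([[10, NODATA, 13],
                [NODATA, 1, 8],
                [1, 8, 1]])
lut = np.arange(35) * 2  # hypothetical lookup table indexed by pixel value

valid = img != NODATA  # boolean mask with the same shape as img

# Start from an all-NoData output, then fill only the valid pixels.
out = np.full(img.shape, NODATA, dtype=lut.dtype)
out[valid] = lut[img[valid]]
print(out)
```

Because both sides of the assignment use the same boolean mask, the looked-up values land back in their original positions and the NoData cells stay untouched.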

If instead you want something similar to the effect of a mask, you can just set values equal to NaN (for float arrays):

    In [66]: a2 = np.array([[10, -9999, 13], [-9999, 1, 8], [1, 8, 1]], dtype=float)

    In [67]: a2
    Out[67]:
    array([[  1.00000000e+01,  -9.99900000e+03,   1.30000000e+01],
           [ -9.99900000e+03,   1.00000000e+00,   8.00000000e+00],
           [  1.00000000e+00,   8.00000000e+00,   1.00000000e+00]])

    In [68]: a2[np.where(a2 == -9999)] = np.nan

    In [69]: a2
    Out[69]:
    array([[ 10.,  nan,  13.],
           [ nan,   1.,   8.],
           [  1.,   8.,   1.]])

This form of masking with NaN is suitable for a lot of vectorized array computations in NumPy, although it can be a pain to worry about converting integer-based image data to floating point first, and converting back safely at the end.
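As a rough illustration of that round trip, the sketch below computes a NaN-aware statistic and then converts back to integers with the sentinel restored. The names mean_valid and restored are chosen for this example; np.nanmean and np.isnan are standard NumPy functions.

```python
import numpy as np

NODATA = -9999

a2 = np.array([[10, NODATA, 13],
               [NODATA, 1, 8],
               [1, 8, 1]], dtype=float)
a2[a2 == NODATA] = np.nan

# NaN-aware reductions skip the NoData cells.
mean_valid = np.nanmean(a2)
print(mean_valid)

# Put the sentinel back before casting, since NaN has no integer equivalent.
restored = np.where(np.isnan(a2), NODATA, a2).astype(int)
print(restored)
```

Replacing NaN before the cast matters: casting NaN directly to an integer type does not give a meaningful value.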

2 Comments

Thanks Mr. F! That was a very thorough explanation, and I, as I'm sure others will, appreciate it.
I wanted to add some more. I'm used to handling NoData areas fairly easily in image processing applications that require mathematical manipulation, transforms, etc., but I'm writing a histogram matching program which requires indexing, hence the attempted use of masked arrays. I think the looping approach is most appropriate for my specific use case. Thanks again!

Try this code:

    import numpy as np

    a = np.array([0,1,4,3,4,-9999,33,34,-9999])
    am = np.ma.MaskedArray(a)
    am.mask = (am==-9999)
    np.ma.set_fill_value(am, 0)

    z = np.arange(35)

    print(z[am.filled()])

Accessing am gives the masked array, whose masked entries still refer to the original values (it is just a reference to the original array). Calling am.filled() after setting fill_value returns an array in which the masked elements are replaced by fill_value.
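A small sketch of that distinction, assuming the fill value is passed through the constructor's fill_value argument instead of np.ma.set_fill_value (the variable filled is a name chosen for this example):

```python
import numpy as np

a = np.array([0, 1, 4, 3, 4, -9999, 33, 34, -9999])
am = np.ma.masked_array(a, mask=(a == -9999), fill_value=0)
z = np.arange(35)

# filled() returns a plain array with masked slots replaced by fill_value.
filled = am.filled()
print(filled)

# The masked array's own data is unchanged and still holds the sentinel.
print(am.data)

# Indexing with the filled array works, but masked slots now fetch z[0].
print(z[filled])
```

Whether fetching z[0] for the masked positions is acceptable depends on what the result is used for.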
