5

I need to slice an array from a given index until a certain condition is met.

>>> a = numpy.zeros((10), dtype='|S1')
>>> a[2] = 'A'
>>> a[4] = 'X'
>>> a[8] = 'B'
>>> a
array(['', '', 'A', '', 'X', '', '', '', 'B', ''], dtype='|S1')

For instance, for the above array I want the subset around a given index, extending in both directions until the first non-empty value. For example, for index values 2, 4, and 8 the results would be:

['', '', 'A', '']      # 2
['', 'X', '', '', '']  # 4
['', '', '', 'B', '']  # 8

Any suggestions on the simplest way to do this using the numpy API? I'm learning Python and numpy and would appreciate any help. Thanks!

5
  • Can you clarify your question? What do you mean "until first non-None values in both directions"? Commented Mar 7, 2011 at 5:55
  • The fact that you are using object arrays (not very common and not very memory-efficient) presents a particular problem when trying to determine the index of non-None array items. Could you be persuaded to use a fixed-byte dtype? If you are committed to the object dtype, then is it true that anything "non-None" will evaluate to True when typecast as a bool? Either of those would help simplify things a lot. Commented Mar 7, 2011 at 6:04
  • @Paul I am using an object array to store single character strings. Essentially, all I need is a char array. Is there an alternative dtype I could use? Commented Mar 7, 2011 at 6:15
  • 2
    @armandino: Use dtype='|S1' (or simply dtype=str) for single-character strings. Commented Mar 7, 2011 at 6:25
  • 1
    @armandino: Also, if you didn't already notice, you'll probably want numpy.zeros(...) instead of numpy.empty(...) when using dtype='S1' Commented Mar 7, 2011 at 6:28

5 Answers

7

If you set up your problem like this:

import numpy
a = numpy.zeros((10), dtype=str)
a[2] = 'A'
a[4] = 'X'
a[8] = 'B'

You can easily get the indices of non-empty strings like so:

i = numpy.where(a!='')[0]  # array([2, 4, 8])

Alternatively, numpy.argwhere(..) also works well.
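As a hedged aside on the alternatives: numpy.argwhere returns one row per match, so for a 1-D input the result has shape (n, 1) and needs flattening before it can be used like i above. numpy.flatnonzero (not mentioned in the original answer) gives the flat indices in one call. A minimal sketch, reusing the setup above:

```python
import numpy

a = numpy.zeros(10, dtype=str)
a[2] = 'A'
a[4] = 'X'
a[8] = 'B'

i = numpy.where(a != '')[0]              # 1-D array of indices
i_arg = numpy.argwhere(a != '').ravel()  # argwhere gives shape (3, 1); flatten it
i_flat = numpy.flatnonzero(a != '')      # flat indices directly
```

All three arrays hold the same indices, so any of them can be used for the slicing that follows.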

Then you can slice away using this array:

out2 = a[:i[1]]        # 2   ['' '' 'A' '']
out4 = a[i[0]+1:i[2]]  # 4   ['' 'X' '' '' '']

etc.
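The manual slices above can be generalized to any index. The following is only a sketch: slice_around is a hypothetical helper name, and it assumes that a[n] itself is non-empty (as in the question), so that n appears in the index array.

```python
import numpy

def slice_around(a, n):
    """Return a[n] plus the runs of empty strings on either side of it."""
    i = numpy.where(a != '')[0]       # indices of the non-empty elements
    pos = numpy.searchsorted(i, n)    # position of n within i (assumes a[n] != '')
    # start just after the previous non-empty element, or at the array start
    start = i[pos - 1] + 1 if pos > 0 else 0
    # stop at the next non-empty element, or at the array end
    stop = i[pos + 1] if pos + 1 < len(i) else len(a)
    return a[start:stop]

a = numpy.zeros(10, dtype=str)
a[2] = 'A'
a[4] = 'X'
a[8] = 'B'

out2 = slice_around(a, 2)
out4 = slice_around(a, 4)
out8 = slice_around(a, 8)
```

The boundary checks cover the first and last non-empty elements, where there is no neighbour on one side.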

1 Comment

Thanks Paul. That looks like what I'm after.
6

This is a job for masked arrays; numpy.ma has lots of functions for working with subsets.

a = np.zeros((10), dtype=str)
a[2] = 'A'
a[4] = 'X'
a[8] = 'B'

Let's mask out the non-empty elements:

am=np.ma.masked_where(a!='', a)

np.ma.notmasked_contiguous goes through the array (very efficiently) and finds all the slices of contiguous elements where the array is not masked:

slices = np.ma.notmasked_contiguous(am)
[slice(0, 1, None), slice(3, 3, None), slice(5, 7, None), slice(9, 9, None)]

So the array is contiguously empty between elements 5 and 7, for example. Now you only have to join the slices you are interested in. First, get the starting index of each slice:

slices_start = np.array([s.start for s in slices])

then you get the location of the index you are looking for:

slices_start.searchsorted(4) #4
Out: 2

So you want slices 1 and 2:

a[slices[1].start:slices[2].stop+1]
Out: array(['', 'X', '', '', ''], dtype='|S1')

or let's try 8:

i = slices_start.searchsorted(8)
a[slices[i-1].start:slices[i].stop+1]
Out: array(['', '', '', 'B', ''], 
  dtype='|S1')

You should probably play with this a bit in IPython to understand it better.
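One caveat, stated as an assumption about versions: the slices printed in this answer have inclusive stops (for example slice(3, 3, None)), which is why the code adds 1 to .stop. In recent NumPy releases, as far as I know, notmasked_contiguous returns ordinary slices with exclusive stops, so the +1 is no longer needed there. The sketch below assumes the exclusive-stop behaviour; empty_runs_around is a hypothetical helper name, and it also assumes a[n] is non-empty.

```python
import numpy as np

def empty_runs_around(a, n):
    """Return a[n] together with the empty runs that touch it on either side."""
    am = np.ma.masked_where(a != '', a)     # mask the non-empty elements
    runs = np.ma.notmasked_contiguous(am)   # slices over the runs of empty elements
    start, stop = n, n + 1                  # begin with just a[n]
    for s in runs:
        if s.stop == n:                     # this run ends right before n
            start = s.start
        if s.start == n + 1:                # this run begins right after n
            stop = s.stop
    return a[start:stop]

a = np.zeros(10, dtype=str)
a[2] = 'A'
a[4] = 'X'
a[8] = 'B'

res2 = empty_runs_around(a, 2)
res4 = empty_runs_around(a, 4)
res8 = empty_runs_around(a, 8)
```

Matching runs by their boundaries rather than by position in the list also handles the case where two non-empty elements sit next to each other with no empty run between them.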

1 Comment

Very interesting Andrea. Thanks for the explanations. Much appreciated!
2

Note that this could be done cleanly in pure Python using itertools and functools.

import functools, itertools
arr = ['', '', 'A', '', 'X', '', '', '', 'B', '']

f = functools.partial(itertools.takewhile, lambda x: not x)
def g(a, i):
    return itertools.chain(f(reversed(a[:i])), [a[i]], f(a[i+1:]))

We define f as the iterator that yields elements until one evaluates as true, and g as the combination of applying f to the reversed part of the list before the index and to the part of the list after the index.

This returns iterators, which can be converted to lists that contain our results.

>>> list(g(arr, 2))
['', '', 'A', '']
>>> list(g(arr, 4))
['', 'X', '', '', '']
>>> list(g(arr, 8))
['', '', '', 'B', '']

Comments

0

Two loops are the first thing that comes to mind. Something like this would work:

def getNoneSlice(a, i):
    '''Given an array and an index...'''

    # get the first non-None index before i
    start = 0
    for j in range(i - 1, -1, -1):  # range instead of xrange, so it also runs on Python 3
        if a[j] is not None: # or whatever condition
            start = j + 1
            break

    # get the first non-None index after i
    end = len(a) - 1
    for j in range(i + 1, len(a)):
        if a[j] is not None: # or whatever condition
            end = j - 1
            break

    # return the slice
    return a[start:end + 1]

3 Comments

Thanks Mike. The solution works perfectly (+1). I was hoping however there would be a numpy method for something like this.
I downvoted because this is very inefficient for large sparse arrays. Use the numpy methods of the other answers.
Yes, Steabert, agreed... At least I learned something new :-P
-2
def getSlice(a, n):
    try:
        startindex = a[:n].nonzero()[0][-1]
    except IndexError:
        startindex = 0
    try:
        endindex = a[(n+1):].nonzero()[0][0] + n+1
    except IndexError:
        endindex = len(a)
    return a[startindex: endindex]

2 Comments

I'm afraid it didn't work. I get ['' 'A'] ['' 'X'] ['' 'B']
The question didn't have empty strings in it when I answered it, it had 'None'. The nonzero method works with 'None'.
