Python - slice array until certain condition is met

Question

I need to slice an array from a given index until a certain condition is met.

>>> a = numpy.zeros((10), dtype='|S1')
>>> a[2] = 'A'
>>> a[4] = 'X'
>>> a[8] = 'B'
>>> a
array(['', '', 'A', '', 'X', '', '', '', 'B', ''], dtype='|S1')

For instance, for the above array I want a subset from a given index until the first non-empty values in both directions. For example, for index values 2, 4, 8 the results would be:

['', '', 'A', '']      # 2
['', 'X', '', '', '']  # 4
['', '', '', 'B', '']  # 8

Any suggestions on the simplest way to do this using the numpy API? Learning python and numpy, would appreciate any help. Thanks!

Can you clarify your question? What do you mean by "until first non-None values in both directions"?
– Mahmoud Abdelkader, Commented Mar 7, 2011 at 5:55
The fact that you are using object arrays (not very common and not very memory-efficient) presents a particular problem when trying to determine the index of non-None array items. Could you be persuaded to use a fixed-byte dtype? If you are committed to the object dtype, then is it true that anything "non-None" will evaluate to True when typecast as a bool? Either of those would help simplify things a lot.
– Paul, Commented Mar 7, 2011 at 6:04
@Paul I am using an object array to store single-character strings. Essentially, all I need is a char array. Is there an alternative dtype I could use?
– armandino, Commented Mar 7, 2011 at 6:15
@armandino: Use dtype='|S1' (or simply dtype=str) for single-character strings.
– Paul, Commented Mar 7, 2011 at 6:25
@armandino: Also, if you didn't already notice, you'll probably want numpy.zeros(...) instead of numpy.empty(...) when using dtype='S1'.
– Paul, Commented Mar 7, 2011 at 6:28

Paul · Accepted Answer · 2011-03-07 06:38:11Z

7

If you set up your problem like this:

import numpy
a = numpy.zeros((10), dtype=str)
a[2] = 'A'
a[4] = 'X'
a[8] = 'B'

You can easily get the indices of non-empty strings like so:

i = numpy.where(a!='')[0]  # array([2, 4, 8])

Alternatively, numpy.argwhere(..) also works well.

Then you can slice away using this array:

out2 = a[:i[1]]        # 2   ['' '' 'A' '']
out4 = a[i[0]+1:i[2]]  # 4   ['' 'X' '' '' '']

etc.
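To avoid picking the neighbours out of i by hand, the same idea can be wrapped in a small helper. This is only a sketch: the name slice_around is made up for illustration, and it assumes idx points at a non-empty element and that "empty" means the empty string.

```python
import numpy as np

def slice_around(a, idx):
    """Return the run of a that contains idx plus the empty cells on both sides.

    Hypothetical helper, not part of NumPy.
    """
    filled = np.flatnonzero(a != '')      # positions of non-empty strings
    left = filled[filled < idx]           # non-empty positions before idx
    right = filled[filled > idx]          # non-empty positions after idx
    lo = left[-1] + 1 if left.size else 0         # start just after the previous hit
    hi = right[0] if right.size else len(a)       # stop just before the next hit
    return a[lo:hi]

a = np.zeros(10, dtype=str)
a[2], a[4], a[8] = 'A', 'X', 'B'

for idx in (2, 4, 8):
    print(idx, slice_around(a, idx))
```

The result is a view into a, so copy it if you intend to modify it independently.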


1 Comment

armandino Over a year ago

Thanks Paul. That looks like what I'm after.

Andrea Zonca · Accepted Answer · 2011-03-07 07:11:40Z

6

This is a job for masked arrays; numpy.ma has lots of functions for working with subsets.

import numpy as np
a = np.zeros((10), dtype=str)
a[2] = 'A'
a[4] = 'X'
a[8] = 'B'

Let's mask out the non-empty elements:

am=np.ma.masked_where(a!='', a)

np.ma.notmasked_contiguous goes through the array (very efficiently) and finds all the slices of contiguous elements where the array is not masked:

slices = np.ma.notmasked_contiguous(am)
[slice(0, 1, None), slice(3, 3, None), slice(5, 7, None), slice(9, 9, None)]

So, for example, the array is contiguously empty from element 5 through element 7. Now you only have to join the slices you are interested in. First, get the starting index of each slice:

slices_start = np.array([s.start for s in slices])

then you get the location of the index you are looking for:

slices_start.searchsorted(4) #4
Out: 2

So you want slices 1 and 2:

a[slices[1].start:slices[2].stop+1]
Out: array(['', 'X', '', '', ''], dtype='|S1')

or let's try 8:

i = slices_start.searchsorted(8)
a[slices[i-1].start:slices[i].stop+1]
Out: array(['', '', '', 'B', ''], 
  dtype='|S1')

You should probably play with this a bit in IPython to understand it better.
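One caveat if you try this today: as far as I know, recent NumPy releases return half-open slices (stop exclusive) from notmasked_contiguous, unlike the inclusive stops printed above, in which case the +1 would overshoot by one element. Here is a sketch of the same masked-array idea using np.ma.clump_unmasked, whose slices are documented as half-open. The helper name span_around is made up, and it assumes idx points at a non-empty element.

```python
import numpy as np

def span_around(a, idx):
    """Join the empty runs that touch idx on either side (hypothetical helper)."""
    am = np.ma.masked_where(a != '', a)       # mask the non-empty elements
    lo, hi = idx, idx + 1                     # start with just the element itself
    for run in np.ma.clump_unmasked(am):      # half-open slices of unmasked runs
        if run.stop == idx:                   # empty run ending right before idx
            lo = run.start
        elif run.start == idx + 1:            # empty run starting right after idx
            hi = run.stop
    return a[lo:hi]

a = np.zeros(10, dtype=str)
a[2], a[4], a[8] = 'A', 'X', 'B'

for idx in (2, 4, 8):
    print(idx, span_around(a, idx))
```

The loop is linear in the number of runs; for many lookups on the same array you could keep the searchsorted approach on the run starts instead.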


1 Comment

armandino Over a year ago

Very interesting Andrea. Thanks for the explanations. Much appreciated!

dagoof · Accepted Answer · 2011-03-07 09:46:39Z

Note that this could be cleanly done in pure python using itertools and functools.

import functools, itertools
arr = ['', '', 'A', '', 'X', '', '', '', 'B', '']

f = functools.partial(itertools.takewhile, lambda x: not x)
def g(a, i):
    return itertools.chain(f(reversed(a[:i])), [a[i]], f(a[i+1:]))

f takes items from an iterable for as long as they are falsy and stops at the first element that evaluates as true. g applies f to the reversed part of the list before the index and to the part of the list after the index, and chains the two results around a[i].

g returns an iterator, which can be converted to a list to get the result.

>>> list(g(arr, 2))
['', '', 'A', '']
>>> list(g(arr, 4))
['', 'X', '', '', '']
>>> list(g(arr, 8))
['', '', '', 'B', '']
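Note that g yields the left-hand items in reversed order, which is invisible here because they are all empty strings. If the order matters, or the stopping condition is something other than truthiness, one option is to count the items on each side and slice the original list. A sketch, where window and its stop parameter are names made up for this example:

```python
from itertools import takewhile

def window(seq, i, stop=bool):
    """Return seq[lo:hi] around i, bounded by items for which stop(item) is true.

    Hypothetical helper; stop defaults to plain truthiness.
    """
    def keep(x):
        return not stop(x)
    # Count how many consecutive "keep" items sit directly left and right of i.
    left = sum(1 for _ in takewhile(keep, reversed(seq[:i])))
    right = sum(1 for _ in takewhile(keep, seq[i + 1:]))
    return seq[i - left:i + 1 + right]

arr = ['', '', 'A', '', 'X', '', '', '', 'B', '']
print(window(arr, 4))

# Same helper with None as the "empty" marker.
objs = [None, 'a', None, 'b', None, None]
print(window(objs, 3, stop=lambda x: x is not None))
```

Because the result is an ordinary slice, the items come back in their original order.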

Mike M. Lin · Accepted Answer · 2011-03-07 06:03:03Z

0

Two loops are the first thing that comes to mind. Something like this would work:

def getNoneSlice(a, i):
    '''Given an array and an index, return the slice around i bounded by non-None items.'''

    # get the first non-None index before i
    start = 0
    for j in range(i - 1, -1, -1):  # xrange on Python 2
        if a[j] is not None: # or whatever condition
            start = j + 1
            break

    # get the first non-None index after i
    end = len(a) - 1
    for j in range(i + 1, len(a)):  # xrange on Python 2
        if a[j] is not None: # or whatever condition
            end = j - 1
            break

    # return the slice
    return a[start:end + 1]


3 Comments

armandino Over a year ago

Thanks Mike. The solution works perfectly (+1). I was hoping however there would be a numpy method for something like this.

steabert Over a year ago

I downvoted because this is very inefficient for large sparse arrays. Use the numpy methods of the other answers.

Mike M. Lin Over a year ago

Yes, Steabert, agreed... At least I learned something new :-P

pwdyson · Accepted Answer · 2011-03-07 06:21:47Z

-2

def getSlice(a, n):
    try:
        startindex = a[:n].nonzero()[0][-1]
    except IndexError:
        startindex = 0
    try:
        endindex = a[(n+1):].nonzero()[0][0] + n+1
    except IndexError:
        endindex = len(a)
    return a[startindex: endindex]
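As written, startindex is the position of the previous non-empty item itself, so the returned slice appears to start one element too early. Below is a sketch of a variant for the original None-filled object array. The name get_slice is made up, and it builds the mask with an explicit "is not None" test instead of relying on how nonzero() treats a particular dtype.

```python
import numpy as np

def get_slice(a, n):
    """Slice around n, bounded by the nearest non-None items (hypothetical helper)."""
    filled = np.array([x is not None for x in a])   # explicit boolean mask
    before = np.flatnonzero(filled[:n])             # hits to the left of n
    after = np.flatnonzero(filled[n + 1:])          # hits to the right, offset by n + 1
    start = before[-1] + 1 if before.size else 0    # one past the previous hit
    end = after[0] + n + 1 if after.size else len(a)
    return a[start:end]

a = np.full(10, None, dtype=object)
a[2], a[4], a[8] = 'A', 'X', 'B'

for n in (2, 4, 8):
    print(n, get_slice(a, n))
```

For a string array, replacing the mask with a != '' should give the same behaviour.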


2 Comments

armandino Over a year ago

I'm afraid it didn't work. I get ['' 'A'] ['' 'X'] ['' 'B']

pwdyson Over a year ago

The question didn't have empty strings in it when I answered it; it had None. The nonzero method works with None.
