How to exclude a few ranges from a numpy array at once?

Question

Say one has the following numpy array:

X = numpy.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

Now, how can one exclude the ranges X[0:2], X[6:8] and X[12:14] from the array X at once, so that the result is X = [2, 2, 2, 4, 4, 4]?

One way or another you need to loop over the slices. One solution lets np.r_ do that, another repeatedly uses delete, yet another combines slices.
– hpaulj, Commented Sep 22, 2015 at 23:23

unutbu · Accepted Answer · 2015-09-22 19:01:44Z

4

You could use np.r_ to combine the ranges into a 1D array:

In [18]: np.r_[0:2,6:8,12:14]
Out[18]: array([ 0,  1,  6,  7, 12, 13])

Then use np.in1d to create a boolean array which is True at those index locations:

In [19]: np.in1d(np.arange(len(X)), (np.r_[0:2,6:8,12:14]))
Out[19]: 
array([ True,  True, False, False, False, False,  True,  True, False,
       False, False, False,  True,  True, False], dtype=bool)

And then use ~ to invert the boolean array:

In [11]: X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

In [12]: X[~np.in1d(np.arange(len(X)), (np.r_[0:2,6:8,12:14]))]
Out[12]: array([1, 2, 2, 2, 3, 4, 4, 4, 5])

Note that X[12:14] captures only the first two 5's. There is one 5 left over, so the result is array([1, 2, 2, 2, 3, 4, 4, 4, 5]), not array([1, 2, 2, 2, 3, 4, 4, 4]).

Slice ranges in Python are half-open intervals. The left index is included, but the right index is not. So X[12:14] selects X[12] and X[13], but not X[14]. See this post for Guido van Rossum's explanation for why Python uses half-open intervals.
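A quick check of the half-open behaviour, using the X from the question (the variable names below are just for illustration):

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

# A slice start:stop includes start and excludes stop,
# so it always holds stop - start elements.
first_two_fives = X[12:14]   # indices 12 and 13 only
all_fives = X[12:15]         # indices 12, 13 and 14

print(first_two_fives)  # [5 5]
print(all_fives)        # [5 5 5]
```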

To get the result [2, 2, 2, 4, 4, 4] you would need to add one to the right-hand (ending) index for each slice:

In [17]: X[~np.in1d(np.arange(len(X)), (np.r_[0:3,6:9,12:15]))]
Out[17]: array([2, 2, 2, 4, 4, 4])
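If you need this more than once, the steps above could be wrapped up roughly like this. It is only a sketch: the function name drop_index_ranges and the (start, stop) pair format are my own choices, not anything NumPy provides, and it uses np.isin, which recent NumPy documentation recommends over np.in1d for new code.

```python
import numpy as np

def drop_index_ranges(arr, ranges):
    """Return a copy of 1-D `arr` without the half-open index ranges.

    `ranges` is a list of (start, stop) pairs (an illustrative convention).
    """
    if not ranges:
        return arr.copy()
    # Expand every (start, stop) pair into explicit indices, as np.r_ would.
    drop = np.concatenate([np.arange(start, stop) for start, stop in ranges])
    # True at every position that should be removed.
    mask = np.isin(np.arange(len(arr)), drop)
    return arr[~mask]

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
result = drop_index_ranges(X, [(0, 3), (6, 9), (12, 15)])
print(result)  # [2 2 2 4 4 4]
```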

edited Sep 22, 2015 at 19:01

answered Sep 22, 2015 at 18:19

unutbu

4 Comments

Alex Riley Over a year ago

Unless I'm missing something, I think you could just write np.r_[0:2, 6:8, 12:14] instead of np.r_[np.s_[0:2,6:8,12:14]].

unutbu Over a year ago

@ajcr: Oh! Thank you for the improvement.

hpaulj Over a year ago

Or you can let delete construct the mask for you: np.delete(X, np.r_[0:3,6:9,12:15])

unutbu Over a year ago

@hpaulj: I think that is a much faster method, particularly for large X. Would you like to write it up as a solution?

soltex · Accepted Answer · 2015-09-22 18:26:15Z

1

You can use something like this:

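numbers = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
# Indices to drop; list() is needed on Python 3, where range objects cannot be joined with +
exclude = set(list(range(0, 2)) + list(range(6, 8)) + list(range(12, 14)))
# Filter on the index i, not on the value n
[n for i, n in enumerate(numbers) if i not in exclude]

or:

[n for i, n in enumerate(numbers) if i not in range(0, 2) and i not in range(6, 8) and i not in range(12, 14)]

result:

[1, 2, 2, 2, 3, 4, 4, 4, 5]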

edited Sep 22, 2015 at 18:26

answered Sep 22, 2015 at 18:19

soltex

hpaulj · Accepted Answer · 2015-09-23 22:04:18Z

In a comment to @unutbu's answer I suggested np.delete. Here are a few timings.

A larger test array:

In [445]: A=np.arange(1000)

@unutbu's answer:

In [446]: timeit A[~np.in1d(np.arange(len(A)), (np.r_[10:50:3,100:200,300:350]))].shape
1000 loops, best of 3: 454 µs per loop

Same index list, but using np.delete - about 3x speedup

In [447]: timeit np.delete(A,np.r_[10:50:3,100:200,300:350]).shape
10000 loops, best of 3: 166 µs per loop
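For reference, here is the same np.delete call applied to the small array from the question (with the end indices bumped by one, as in @unutbu's answer). np.delete returns a new array and leaves X alone:

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

# np.r_ expands the slices into one flat index array ...
drop = np.r_[0:3, 6:9, 12:15]
# ... and np.delete returns a copy of X without those positions.
result = np.delete(X, drop)

print(drop)    # [ 0  1  2  6  7  8 12 13 14]
print(result)  # [2 2 2 4 4 4]
```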

But doing a straightforward boolean masking is even faster. Earlier I deduced that np.delete does basically this, but it must have some added overhead (including the ability to handle multiple dimensions):

In [448]: %%timeit
ind=np.ones_like(A,bool)
ind[np.r_[10:50:3,100:200,300:350]]=False
A[ind].shape
   .....: 
10000 loops, best of 3: 71.5 µs per loop

np.delete has a different strategy when the input is a slice, which may be faster than boolean indexing. But it only handles one slice at a time, hence the nested delete that @Kasramvd shows. I intend to add that timing.

Concatenating multiple slices is another option.

np.r_ also involves a loop, but it is only over the slices. Basically it iterates over the slices, expanding each as a range, and concatenates them. In my fastest case it is responsible for 2/3 of the run time:

In [451]: timeit np.r_[10:50:3,100:200,300:350]
10000 loops, best of 3: 41 µs per loop
In [453]: %%timeit x=np.r_[10:50:3,100:200,300:350]
ind=np.ones_like(A,bool)
ind[x]=False
A[ind].shape
   .....: 
10000 loops, best of 3: 24.2 µs per loop
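Since np.r_ accounts for so much of that time, one variation is to skip the index array and assign each slice into the mask directly. I have not timed this here, so treat it as a sketch; mask_out_slices is a made-up helper name.

```python
import numpy as np

def mask_out_slices(arr, slices):
    """Return 1-D `arr` without the elements selected by each slice object."""
    keep = np.ones(len(arr), dtype=bool)
    for s in slices:
        # Slice assignment on the mask; no intermediate index array is built.
        keep[s] = False
    return arr[keep]

A = np.arange(1000)
slices = [slice(10, 50, 3), slice(100, 200), slice(300, 350)]
result = mask_out_slices(A, slices)

# Same answer as the np.delete / np.r_ version used in the timings.
expected = np.delete(A, np.r_[10:50:3, 100:200, 300:350])
print(np.array_equal(result, expected))  # True
```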

The nested delete has pretty good performance:

In [457]: timeit np.delete( np.delete( np.delete(A,slice(300,350)),
   slice(100,200)),slice(10,50,3)).shape
10000 loops, best of 3: 108 µs per loop

np.delete, when given a slice to delete, copies slices to the result array (the blocks before and after the delete block). I can approximate that by concatenating several slices. I'm cheating here by using delete for the 1st block, rather than take the time to write a pure copy. Still it is faster than the best boolean mask expression.

In [460]: timeit np.concatenate([np.delete(A[:100],slice(10,50,3)),
   A[200:300],A[350:]]).shape
10000 loops, best of 3: 65.7 µs per loop

I can remove the delete with this slicing, though the order of the 10:50 range is messed up. I suspect that this is, theoretically, the fastest:

In [480]: timeit np.concatenate([A[:10], A[11:50:3], A[12:50:3],
    A[50:100], A[200:300], A[350:]]).shape
100000 loops, best of 3: 16.1 µs per loop

An important caution - these alternatives are being tested with non-overlapping slices. Some may work with overlaps, others might not.
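To make the caution concrete, here is a small case with two overlapping ranges (2:5 and 4:7, chosen only for illustration). The boolean mask removes the union of the two ranges, while the nested delete shifts positions after the first deletion and drops an element that should have been kept:

```python
import numpy as np

X = np.arange(10)

# Boolean mask: overlapping assignments are harmless.
keep = np.ones(len(X), dtype=bool)
keep[2:5] = False
keep[4:7] = False
masked = X[keep]

# Nested delete, last range first: the second delete acts on shifted positions.
nested = np.delete(np.delete(X, slice(4, 7)), slice(2, 5))

print(masked)  # [0 1 7 8 9]
print(nested)  # [0 1 8 9]
```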

Kasravnd · Accepted Answer · 2015-09-22 18:34:02Z

0

You can call np.delete three times and, as @nneonneo said in a comment, do it in reverse order, so there is no need to calculate range offsets:

>>> np.delete(np.delete(np.delete(X,np.s_[12:14]),np.s_[6:8]),np.s_[0:2])
array([1, 2, 2, 2, 3, 4, 4, 4, 5])
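For more than a handful of ranges, the nesting could be written with functools.reduce instead. This is just a sketch that assumes non-overlapping slices with explicit start values; the names slices and ordered are illustrative.

```python
from functools import reduce

import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
slices = [slice(0, 2), slice(6, 8), slice(12, 14)]

# Delete from the end first so earlier positions are not shifted.
ordered = sorted(slices, key=lambda s: s.start, reverse=True)
# reduce calls np.delete(current_array, next_slice) for each slice in turn.
result = reduce(np.delete, ordered, X)

print(result)  # [1 2 2 2 3 4 4 4 5]
```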

edited Sep 22, 2015 at 18:34

answered Sep 22, 2015 at 18:28

Kasravnd

2 Comments

nneonneo Over a year ago

Or you could delete ranges starting from the end, which removes the need to calculate range offsets.

hpaulj Over a year ago

delete when given slices (from s_) just copies the slices that it wants to keep to a new array. Effectively your expression concatenates X[2:6], X[8:12], X[14:].

Chad S. · Accepted Answer · 2015-09-22 21:40:58Z

0

Just compose X from the intervals you want to keep:

X = np.array(list(X[3:6]) + list(X[9:12]))

edited Sep 22, 2015 at 21:40

answered Sep 22, 2015 at 18:16

Chad S.

4 Comments

hpaulj Over a year ago

Is your + array addition, or list concatenation?

hpaulj Over a year ago

X in this question is a numpy array, for which + is addition.

Chad S. Over a year ago

Ok. I adjusted the solution

hpaulj Over a year ago

Use np.concatenate([X[3:6], X[9:12]]) to do it entirely with arrays.
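A short illustration of the difference discussed in these comments, using the X from the question: + on array slices adds element by element, while np.concatenate joins them end to end.

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

# On numpy arrays, + is element-wise addition, not concatenation.
added = X[3:6] + X[9:12]

# np.concatenate joins the kept pieces without going through Python lists.
joined = np.concatenate([X[3:6], X[9:12]])

print(added)   # [6 6 6]
print(joined)  # [2 2 2 4 4 4]
```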

phntm · Accepted Answer · 2020-09-10 22:02:17Z

0

Not sure this is helpful, but if each range holds a single value that appears nowhere else in the array, you can select ranges by their position among the unique values.

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

A = np.unique(X)

Out[79]: array([1, 2, 3, 4, 5])

Here we want to keep the second and fourth ranges, so:

X = X[(X==A[1])|(X==A[3])]  

Out[82]: array([2, 2, 2, 4, 4, 4])
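With more values to keep, chaining == and | gets long. A possible variation (my own, not part of the answer above) is np.isin on the values. As with the original, it selects by value, so it only matches the index ranges when no value recurs outside its range.

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
A = np.unique(X)

# Pick the second and fourth unique values ...
keep_values = A[[1, 3]]
# ... and keep every element of X equal to one of them.
result = X[np.isin(X, keep_values)]

print(keep_values)  # [2 4]
print(result)       # [2 2 2 4 4 4]
```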

answered Sep 10, 2020 at 22:02

phntm
