Say one has the following NumPy array:

X = numpy.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

How can one exclude the ranges X[0:2], X[6:8] and X[12:14] from X in a single step, so that the result is X = [2, 2, 2, 4, 4, 4]?

  • How/where are those ranges stored? Commented Sep 22, 2015 at 18:17
  • I presume you mean X[0:3], X[6:9], X[12:15]? Commented Sep 22, 2015 at 18:40
  • One way or other you need to loop over the slices. One solution lets np.r_ do that, another repeatedly uses delete, yet another combines slices. Commented Sep 22, 2015 at 23:23

6 Answers

4

You could use np.r_ to combine the ranges into a 1D array:

In [18]: np.r_[0:2,6:8,12:14]
Out[18]: array([ 0,  1,  6,  7, 12, 13])

Then use np.in1d to create a boolean array which is True at those index locations:

In [19]: np.in1d(np.arange(len(X)), (np.r_[0:2,6:8,12:14]))
Out[19]: 
array([ True,  True, False, False, False, False,  True,  True, False,
       False, False, False,  True,  True, False], dtype=bool)

Then use ~ to invert the boolean array:

In [11]: X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

In [12]: X[~np.in1d(np.arange(len(X)), (np.r_[0:2,6:8,12:14]))]
Out[12]: array([1, 2, 2, 2, 3, 4, 4, 4, 5])

Note that each of these slices stops one element short: X[0:2] removes only two of the 1's, X[6:8] only two of the 3's, and X[12:14] only two of the 5's. One of each is left over, so the result is array([1, 2, 2, 2, 3, 4, 4, 4, 5]), not array([2, 2, 2, 4, 4, 4]).

Slice ranges in Python are half-open intervals. The left index is included, but the right index is not. So X[12:14] selects X[12] and X[13], but not X[14]. See this post for Guido van Rossum's explanation for why Python uses half-open intervals.
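
A quick way to see the half-open behaviour for yourself is the small sketch below, which uses the X from the question; the variable names are only for illustration.

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

# A slice start:stop covers indices start, ..., stop - 1.
two_fives = X[12:14]    # indices 12 and 13 only
three_fives = X[12:15]  # indices 12, 13 and 14

print(two_fives, three_fives)

# The length of a (step-1, in-bounds) slice is simply stop - start.
print(len(two_fives) == 14 - 12)
```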

To get the result [2, 2, 2, 4, 4, 4] you would need to add one to the right-hand (ending) index for each slice:

In [17]: X[~np.in1d(np.arange(len(X)), (np.r_[0:3,6:9,12:15]))]
Out[17]: array([2, 2, 2, 4, 4, 4])
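
For newer NumPy releases, np.isin is the documented replacement for np.in1d, so the same idea could be sketched as below. The helper name exclude_positions is hypothetical and not part of NumPy; it just wraps the three steps above.

```python
import numpy as np

def exclude_positions(arr, index_array):
    """Return arr without the positions listed in index_array.

    Hypothetical helper for illustration only.
    """
    positions = np.arange(len(arr))
    # True where the position is NOT one of the excluded indices.
    keep = ~np.isin(positions, index_array)
    return arr[keep]

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
result = exclude_positions(X, np.r_[0:3, 6:9, 12:15])
print(result)
```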

4 Comments

Unless I'm missing something, I think you could just write np.r_[0:2, 6:8, 12:14] instead of np.r_[np.s_[0:2,6:8,12:14]].
@ajcr: Oh! Thank you for the improvement.
Or you can let delete construct the mask for you: np.delete(X, np.r_[0:3,6:9,12:15])
@hpaulj: I think that is a much faster method, particularly for large X. Would you like to write it up as a solution?
1

You can use something like this:

numbers = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
# Collect the indices (not the values) to drop
exclude = set(range(0, 2)) | set(range(6, 8)) | set(range(12, 14))
[n for i, n in enumerate(numbers) if i not in exclude]

or:

[n for i, n in enumerate(numbers) if i not in range(0, 2) and i not in range(6, 8) and i not in range(12, 14)]

result:

[1, 2, 2, 2, 3, 4, 4, 4, 5]


1

In a comment on @unutbu's answer I suggested np.delete. Here are a few timings.

A larger test array:

In [445]: A=np.arange(1000)

@unutbu's answer:

In [446]: timeit A[~np.in1d(np.arange(len(A)), (np.r_[10:50:3,100:200,300:350]))].shape
1000 loops, best of 3: 454 µs per loop

Same index list, but using np.delete, for roughly a 3x speedup:

In [447]: timeit np.delete(A,np.r_[10:50:3,100:200,300:350]).shape
10000 loops, best of 3: 166 µs per loop

But straightforward boolean masking is even faster. Earlier I deduced that np.delete does basically this, but it must have some added overhead (including the ability to handle multiple dimensions):

In [448]: %%timeit
ind=np.ones_like(A,bool)
ind[np.r_[10:50:3,100:200,300:350]]=False
A[ind].shape
   .....: 
10000 loops, best of 3: 71.5 µs per loop
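
As a sanity check that the mask and np.delete really remove the same elements, here is a small sketch outside of timeit. It assumes a current NumPy and uses only calls shown in this answer.

```python
import numpy as np

A = np.arange(1000)
idx = np.r_[10:50:3, 100:200, 300:350]

# Boolean mask: start with everything kept, then switch off the unwanted positions.
keep = np.ones(A.shape, dtype=bool)
keep[idx] = False
masked = A[keep]

# np.delete with an index array should give the same result.
deleted = np.delete(A, idx)

print(masked.shape, np.array_equal(masked, deleted))
```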

np.delete uses a different strategy when the input is a slice, which may be faster than boolean indexing. But it only handles one slice at a time, hence the nested delete that @Kasramvd shows. Its timing is included below.

Concatenating multiple slices is another option.

np.r_ also involves a loop, but it is only over the slices. Basically it iterates over the slices, expanding each as a range, and concatenates them. In my fastest case it is responsible for 2/3 of the run time:

In [451]: timeit np.r_[10:50:3,100:200,300:350]
10000 loops, best of 3: 41 µs per loop
In [453]: %%timeit x=np.r_[10:50:3,100:200,300:350]
ind=np.ones_like(A,bool)
ind[x]=False
A[ind].shape
   .....: 
10000 loops, best of 3: 24.2 µs per loop
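
To make the description of np.r_ concrete, this sketch expands each slice by hand and joins the pieces. It is only an approximation of what np.r_ does for plain integer slices, not its actual implementation.

```python
import numpy as np

slices = [slice(10, 50, 3), slice(100, 200), slice(300, 350)]

# Expand each slice into an explicit range, then concatenate the ranges.
manual = np.concatenate(
    [np.arange(s.start, s.stop, s.step or 1) for s in slices]
)

via_r = np.r_[10:50:3, 100:200, 300:350]
print(np.array_equal(manual, via_r), manual.shape)
```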

The nested delete has pretty good performance:

In [457]: timeit np.delete( np.delete( np.delete(A,slice(300,350)),
   slice(100,200)),slice(10,50,3)).shape
10000 loops, best of 3: 108 µs per loop

np.delete, when given a slice to delete, copies slices to the result array (the blocks before and after the deleted block). I can approximate that by concatenating several slices. I'm cheating here by using delete for the first block, rather than taking the time to write a pure copy. Still, it is faster than the best boolean mask expression.

In [460]: timeit np.concatenate([np.delete(A[:100],slice(10,50,3)),
   A[200:300],A[350:]]).shape
10000 loops, best of 3: 65.7 µs per loop

I can remove the delete with this slicing, though the elements kept from the 10:50 range come out in a different order. I suspect that this is, theoretically, the fastest:

In [480]: timeit np.concatenate([A[:10], A[11:50:3], A[12:50:3],
    A[50:100], A[200:300], A[350:]]).shape
100000 loops, best of 3: 16.1 µs per loop
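
The reordering mentioned above can be checked directly. This sketch compares the pure-slicing version against np.delete: the two hold the same elements, but not in the same order.

```python
import numpy as np

A = np.arange(1000)

reference = np.delete(A, np.r_[10:50:3, 100:200, 300:350])
pieces = np.concatenate([A[:10], A[11:50:3], A[12:50:3],
                         A[50:100], A[200:300], A[350:]])

# Same elements once sorted (A is already ascending), but a different order.
same_elements = np.array_equal(np.sort(pieces), reference)
same_order = np.array_equal(pieces, reference)
print(same_elements, same_order)
```

If the original order matters, this variant would need an extra sort or a different way of copying the 10:50 block.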

An important caution - these alternatives are being tested with non-overlapping slices. Some may work with overlaps, others might not.
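
Here is a small sketch of that caution, using two overlapping ranges chosen purely for illustration (100:200 and 150:250). The boolean mask copes with the overlap, while the nested delete removes the wrong elements.

```python
import numpy as np

A = np.arange(1000)

# Together the two overlapping ranges cover positions 100..249.
expected = np.concatenate([A[:100], A[250:]])

# Boolean mask: setting a position to False twice is harmless.
keep = np.ones(A.shape, dtype=bool)
keep[np.r_[100:200, 150:250]] = False
masked = A[keep]

# Nested delete: the second slice is applied to an already shortened array,
# so it also removes positions that were never meant to go.
nested = np.delete(np.delete(A, slice(150, 250)), slice(100, 200))

print(np.array_equal(masked, expected))
print(np.array_equal(nested, expected), nested.shape)
```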


0

You can call np.delete three times. As @nneonneo said in a comment, doing it in reverse order (last range first) means you don't need to recalculate range offsets:

>>> np.delete(np.delete(np.delete(X,np.s_[12:14]),np.s_[6:8]),np.s_[0:2])
array([1, 2, 2, 2, 3, 4, 4, 4, 5])
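
For more than a handful of ranges, the nesting could be replaced by a loop. The following is a minimal sketch; delete_slices is a hypothetical helper name, and it assumes the slices do not overlap and have explicit start values.

```python
import numpy as np

def delete_slices(arr, slices):
    """Delete several non-overlapping slices from arr (hypothetical helper)."""
    # Work from the highest start index down, so that the positions of
    # the slices still to be deleted never shift.
    for s in sorted(slices, key=lambda s: s.start, reverse=True):
        arr = np.delete(arr, s)
    return arr

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
result = delete_slices(X, [np.s_[0:2], np.s_[6:8], np.s_[12:14]])
print(result)
```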

2 Comments

Or you could delete ranges starting from the end, which removes the need to calculate range offsets.
delete when given slices (from s_) just copies the slices that it wants to keep to a new array. Effectively your expression concatenates X[2:6], X[8:12], X[14:].
0

Just compose X from the intervals you want to keep:

X = np.array(list(X[3:6]) + list(X[9:12]))
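
If the excluded ranges are known rather than the kept ones, the kept intervals can be derived from them. This is a sketch only: keep_complement is a hypothetical helper, and it assumes the (start, stop) pairs are sorted and non-overlapping.

```python
import numpy as np

def keep_complement(arr, excluded):
    """Concatenate the pieces of arr lying between the excluded ranges.

    Hypothetical helper; assumes `excluded` is a sorted list of
    non-overlapping (start, stop) pairs.
    """
    pieces = []
    cursor = 0
    for start, stop in excluded:
        pieces.append(arr[cursor:start])  # the block before this excluded range
        cursor = stop
    pieces.append(arr[cursor:])           # whatever is left after the last range
    return np.concatenate(pieces)

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
result = keep_complement(X, [(0, 3), (6, 9), (12, 15)])
print(result)
```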

4 Comments

Is your + array addition, or list concatenation?
X in this question is a numpy array, for which + is addition.
Ok. I adjusted the solution
Use np.concatenate([X[3:6], X[9:12]]) to do it entirely with arrays.
0

Not sure this is helpful, but if each range holds a single value that appears nowhere else in the array, you can select ranges by their position among the unique values.

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

A = np.unique(X)

Out[79]: array([1, 2, 3, 4, 5])

Here we want to keep the second and fourth ranges, so:

X = X[(X==A[1])|(X==A[3])]  

Out[82]: array([2, 2, 2, 4, 4, 4])
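
With more than two values to keep, chaining == comparisons gets unwieldy. One possible alternative, sketched here on the assumption that np.isin is available (NumPy 1.13 or later), is to test membership in the wanted values in one call.

```python
import numpy as np

X = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])
A = np.unique(X)

# Pick the second and fourth unique values, then keep matching elements.
wanted = A[[1, 3]]
result = X[np.isin(X, wanted)]
print(result)
```

As with the original answer, this selects by value, so it only matches the index-based answers when each range has its own distinct value.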
