
This question is based on this older question:

Given an array:

In [122]: arr = np.array([[1, 3, 7], [4, 9, 8]]); arr
Out[122]: 
array([[1, 3, 7],
       [4, 9, 8]])

And given its indices:

In [127]: np.indices(arr.shape)
Out[127]: 
array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])

How would I be able to stack them neatly one against the other to form a new 2D array? This is what I'd like:

array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

This solution by Divakar is what I currently use for 2D arrays:

def indices_merged_arr(arr):
    m,n = arr.shape
    I,J = np.ogrid[:m,:n]
    out = np.empty((m,n,3), dtype=arr.dtype)
    out[...,0] = I
    out[...,1] = J
    out[...,2] = arr
    out.shape = (-1,3)
    return out
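
Applied to the arr above, this reproduces the desired output exactly (a quick check):

out = indices_merged_arr(arr)
assert out.tolist() == [[0, 0, 1], [0, 1, 3], [0, 2, 7],
                        [1, 0, 4], [1, 1, 9], [1, 2, 8]]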

Now, if I want to pass a 3D array, I need to modify this function:

def indices_merged_arr(arr):
    m,n,k = arr.shape   # here
    I,J,K = np.ogrid[:m,:n,:k]   # here
    out = np.empty((m,n,k,4), dtype=arr.dtype)   # here
    out[...,0] = I
    out[...,1] = J
    out[...,2] = K     # here
    out[...,3] = arr
    out.shape = (-1,4)   # here
    return out

But this function now works for 3D arrays only - I can't pass a 2D array to it.

Is there some sort of way I can generalise this to work for any dimension? Here's my attempt:

def indices_merged_arr_general(arr):
    tup = arr.shape   
    idx = np.ogrid[????]   # not sure what to do here....
    out = np.empty(tup + (len(tup) + 1, ), dtype=arr.dtype) 
    for i, j in enumerate(idx):
        out[...,i] = j
    out[..., len(tup)] = arr
    out.shape = (-1, len(tup) + 1)
    return out

I'm having trouble with this line:

idx = np.ogrid[????]   

How can I get this working?

4 Comments

  • Looks like np.ndenumerate Commented Sep 9, 2017 at 21:20
  • @hpaulj That looks like another great alternative. I'm figuring out a way to unpack those indices now. Commented Sep 9, 2017 at 21:23
  • @coldspeed too bad the ndenumerate example was deleted, it was the easiest to construct an indexed array with int for i,j (x,y) and k (z) for float. Commented Sep 10, 2017 at 0:28
  • @NaN I deleted my answer because hpaulj posted his own. Commented Sep 10, 2017 at 0:30

4 Answers


Here's the extension to handle generic ndarrays -

def indices_merged_arr_generic(arr, arr_pos="last"):
    n = arr.ndim
    grid = np.ogrid[tuple(map(slice, arr.shape))]
    out = np.empty(arr.shape + (n+1,), dtype=np.result_type(arr.dtype, int))

    if arr_pos=="first":
        offset = 1
    elif arr_pos=="last":
        offset = 0
    else:
        raise Exception("Invalid arr_pos")        

    for i in range(n):
        out[...,i+offset] = grid[i]
    out[...,-1+offset] = arr
    out.shape = (-1,n+1)

    return out
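
The only non-obvious step is the np.ogrid line: tuple(map(slice, arr.shape)) builds one slice per dimension, so for a 2D array it is equivalent to np.ogrid[:m, :n]. A minimal sketch of what it expands to:

slices = tuple(map(slice, arr.shape))  # for a (2, 3) array: (slice(None, 2, None), slice(None, 3, None))
grid = np.ogrid[slices]
# grid[0].shape == (2, 1)   -> row indices, ready to broadcast
# grid[1].shape == (1, 3)   -> column indices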

Sample runs

2D case:

In [252]: arr
Out[252]: 
array([[37, 32, 73],
       [95, 80, 97]])

In [253]: indices_merged_arr_generic(arr)
Out[253]: 
array([[ 0,  0, 37],
       [ 0,  1, 32],
       [ 0,  2, 73],
       [ 1,  0, 95],
       [ 1,  1, 80],
       [ 1,  2, 97]])

In [254]: indices_merged_arr_generic(arr, arr_pos='first')
Out[254]: 
array([[37,  0,  0],
       [32,  0,  1],
       [73,  0,  2],
       [95,  1,  0],
       [80,  1,  1],
       [97,  1,  2]])

3D case:

In [226]: arr
Out[226]: 
array([[[35, 45, 33],
        [48, 38, 20],
        [69, 31, 90]],

       [[73, 65, 73],
        [27, 51, 45],
        [89, 50, 74]]])

In [227]: indices_merged_arr_generic(arr)
Out[227]: 
array([[ 0,  0,  0, 35],
       [ 0,  0,  1, 45],
       [ 0,  0,  2, 33],
       [ 0,  1,  0, 48],
       [ 0,  1,  1, 38],
       [ 0,  1,  2, 20],
       [ 0,  2,  0, 69],
       [ 0,  2,  1, 31],
       [ 0,  2,  2, 90],
       [ 1,  0,  0, 73],
       [ 1,  0,  1, 65],
       [ 1,  0,  2, 73],
       [ 1,  1,  0, 27],
       [ 1,  1,  1, 51],
       [ 1,  1,  2, 45],
       [ 1,  2,  0, 89],
       [ 1,  2,  1, 50],
       [ 1,  2,  2, 74]])

3 Comments

Divakar, for my knowledge, how does the generalised slicing work?
Yes, it makes sense. The confusing bit was the slicing in the ogrid, but after playing around with the slices, it seems so obvious.
@cᴏʟᴅsᴘᴇᴇᴅ Yeah, I learnt that trick recently, through this post.

For large arrays, AFAIK, senderle's cartesian_product is the fastest way¹ to generate cartesian products using NumPy:


In [372]: A = np.random.random((100,100,100))

In [373]: %timeit indices_merged_arr_generic_using_cp(A)
100 loops, best of 3: 16.8 ms per loop

In [374]: %timeit indices_merged_arr_generic(A)
10 loops, best of 3: 28.9 ms per loop

Here is the setup I used to benchmark. Below, indices_merged_arr_generic_using_cp is a modification of senderle's cartesian_product to include the flattened array alongside the cartesian product:

import numpy as np
import functools

def indices_merged_arr_generic_using_cp(arr):
    """
    Based on cartesian_product
    http://stackoverflow.com/a/11146645/190597 (senderle)
    """
    shape = arr.shape
    arrays = [np.arange(s, dtype='int') for s in shape]
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), len(broadcasted)+1
    out = np.empty(rows * cols, dtype=arr.dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    out[start:] = arr.flatten()
    return out.reshape(cols, rows).T

def indices_merged_arr_generic(arr):
    """
    https://stackoverflow.com/a/46135084/190597 (Divakar)
    """
    n = arr.ndim
    grid = np.ogrid[tuple(map(slice, arr.shape))]
    out = np.empty(arr.shape + (n+1,), dtype=arr.dtype)
    for i in range(n):
        out[...,i] = grid[i]
    out[...,-1] = arr
    out.shape = (-1,n+1)
    return out
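
As a quick sanity check of this setup, both functions enumerate the indices in C order, so their outputs should match row for row (a sketch; A_small is just an arbitrary small test array):

A_small = np.random.random((4, 5, 6))
assert np.array_equal(indices_merged_arr_generic_using_cp(A_small),
                      indices_merged_arr_generic(A_small))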

¹ Note that above I actually used senderle's cartesian_product_transpose. For me, this is the fastest version. For others, including senderle, cartesian_product is faster.

2 Comments

I am just looking at this and wondering why I got faster timings on Divakar's function when I was benchmarking your code... weird.
My earliest posts used pd.concat to merge the cartesian product into the OP's flattened DataFrame. That wasn't the best idea for speed. You might have benchmarked that version.

ndenumerate iterates over the elements, as opposed to the dimensions in the other solutions, so I don't expect it to win the speed tests. But here's a way of using it:

In [588]:  arr = np.array([[1, 3, 7], [4, 9, 8]])
In [589]: arr
Out[589]: 
array([[1, 3, 7],
       [4, 9, 8]])
In [590]: list(np.ndenumerate(arr))
Out[590]: [((0, 0), 1), ((0, 1), 3), ((0, 2), 7), ((1, 0), 4), ((1, 1), 9), ((1, 2), 8)]

In Python 3, * unpacking can be used inside a tuple, so the nested index tuples can be flattened:

In [591]: [(*ij,v) for ij,v in np.ndenumerate(arr)]
Out[591]: [(0, 0, 1), (0, 1, 3), (0, 2, 7), (1, 0, 4), (1, 1, 9), (1, 2, 8)]
In [592]: np.array(_)
Out[592]: 
array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])

And it generalizes nicely to more dimensions:

In [593]: arr3 = np.arange(24).reshape(2,3,4)
In [594]: np.array([(*ij,v) for ij,v in np.ndenumerate(arr3)])
Out[594]: 
array([[ 0,  0,  0,  0],
       [ 0,  0,  1,  1],
       [ 0,  0,  2,  2],
       [ 0,  0,  3,  3],
       [ 0,  1,  0,  4],
       [ 0,  1,  1,  5],
       ....
       [ 1,  2,  3, 23]])

With these small samples, it's actually faster than @Divakar's function. :)

In [598]: timeit indices_merged_arr_generic(arr)
52.8 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [599]: timeit indices_merged_arr_generic(arr3)
66.9 µs ± 434 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [600]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(arr)])
21.2 µs ± 40.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [601]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(arr3)])
59.4 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

But for a large 3D array it is much slower:

In [602]: A = np.random.random((100,100,100))
In [603]: timeit indices_merged_arr_generic(A)
50.3 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [604]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(A)])
2.39 s ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And with @unutbu's function - slower for the small arrays, faster for the big one:

In [609]: timeit indices_merged_arr_generic_using_cp(arr)
104 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [610]: timeit indices_merged_arr_generic_using_cp(arr3)
141 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [611]: timeit indices_merged_arr_generic_using_cp(A)
31.1 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

1 Comment

Very useful for 2D and 3D arrays where you need integer indices and float values. You can construct a structured array or recarray much more easily.
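
To illustrate that comment, here is a minimal sketch of building a structured array straight from np.ndenumerate (the field names i, j and val are arbitrary):

import numpy as np

vals = np.array([[1.5, 3.25, 7.0], [4.0, 9.5, 8.75]])
dt = np.dtype([('i', np.intp), ('j', np.intp), ('val', vals.dtype)])
rec = np.array([(*ij, v) for ij, v in np.ndenumerate(vals)], dtype=dt)
# rec['i'] and rec['j'] hold the integer indices; rec['val'] holds the float values, in C order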

We can use the following one-liner:

from numpy import hstack, array, meshgrid

hstack((
        array(meshgrid(*map(range, t.shape), indexing='ij')).reshape(t.ndim, -1).T,
        t.flatten().reshape(-1, 1)
       ))

Here we first use map(range, t.shape) to construct an iterable of ranges, one per dimension. np.meshgrid(..., indexing='ij') turns these into index grids with the same shape as t; stacking them into an array and applying .reshape(t.ndim, -1).T gives the first part of the table: an n×m matrix, with n the number of elements of t and m the number of dimensions, whose rows appear in the same C order as t.flatten(). We then add the flattened version of t as the right-most column.
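
A quick check, with the 2×3 arr from the question standing in for t:

t = array([[1, 3, 7], [4, 9, 8]])
out = hstack((
        array(meshgrid(*map(range, t.shape), indexing='ij')).reshape(t.ndim, -1).T,
        t.flatten().reshape(-1, 1)
       ))
# out is the requested 6x3 array of (row index, column index, value)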

