2

I have two numpy arrays and would like to merge them by the following rule, preferably without a for loop.

  • Take the first n rows from the first array.
  • Add the first m rows from the second array.
  • Add rows between n and 2n from the first array.
  • Add rows between m and 2m from the second array.

.....

  • Add the last m rows from the second array.
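
For reference, the rule above can be sketched with an explicit loop. This is only a baseline to compare loop-free solutions against; the function name `merge_with_loop` and the small demo arrays are made up for illustration.

```python
import numpy as np

def merge_with_loop(x, y, n, m):
    """Alternate n-row blocks of x with m-row blocks of y (hypothetical helper)."""
    parts = []
    i = j = 0
    while i < len(x) or j < len(y):
        # Slicing past the end yields an empty block, which vstack tolerates
        parts.append(x[i:i + n])
        parts.append(y[j:j + m])
        i += n
        j += m
    return np.vstack(parts)

# Demo arrays chosen so rows from x and y are easy to tell apart
x_demo = np.arange(60).reshape(10, 6)
y_demo = 100 + np.arange(72).reshape(12, 6)
merged_demo = merge_with_loop(x_demo, y_demo, 2, 3)
```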

For instance, say I have the following two arrays, with n=2 and m=3:

x = np.random.randint(10, size=(10, 6))
y = np.random.randint(20, size=(12, 6))

[[5 0 2 2 6 1]
 [4 8 9 2 7 2]
 [5 5 0 5 3 0]
 [2 1 4 7 9 4]
 [8 1 1 9 2 8]
 [4 1 1 0 1 1]
 [2 9 3 5 7 9]
 [3 6 6 6 0 4]
 [4 4 7 3 7 9]
 [7 3 7 1 5 2]] 

[[ 3 15  3  8 12 12]
 [19 12 13  0 19 16]
 [11  2 18 16  9 19]
 [19 15 15 11 13  2]
 [19 14  1  6 13 17]
 [19 14 19 14 13  3]
 [ 0  1 13  0 19 10]
 [19 13 19  5 16 13]
 [12  4 15 11 12 17]
 [ 4 19 17  2 11 12]
 [ 9 12 10  9 15  3]
 [13  7  2  5 13 10]]

The desired output is

[[ 5  0  2  2  6  1]
 [ 4  8  9  2  7  2]
 [ 3 15  3  8 12 12]
 [19 12 13  0 19 16]
 [11  2 18 16  9 19]
 [ 5  5  0  5  3  0]
 [ 2  1  4  7  9  4]
 [19 15 15 11 13  2]
 [19 14  1  6 13 17]
 [19 14 19 14 13  3]
 [ 8  1  1  9  2  8]
 [ 4  1  1  0  1  1]
 [ 0  1 13  0 19 10]
 [19 13 19  5 16 13]
 [12  4 15 11 12 17]
 [ 2  9  3  5  7  9]
 [ 3  6  6  6  0  4]
 [ 4 19 17  2 11 12]
 [ 9 12 10  9 15  3]
 [13  7  2  5 13 10]
 [ 4  4  7  3  7  9]
 [ 7  3  7  1  5  2]]

3 Answers

2

You can create an output array and place the inputs into it by index. The output always has as many rows as x and y combined:

output = np.empty((x.shape[0] + y.shape[0], x.shape[1]), dtype=x.dtype)

You can generate the output indices like:

idx = (np.arange(0, output.shape[0] - n + 1, m + n)[:, None] + np.arange(n)).ravel()
idy = (np.arange(n, output.shape[0] - m + 1, m + n)[:, None] + np.arange(m)).ravel()

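This creates a column vector of block start indices and broadcasts the offsets 0..n-1 (or 0..m-1) against it, marking every output row where a row of x (or y) goes. Provided the row counts fit the alternating pattern, as they do in the example, you can then assign the inputs directly: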

output[idx, :] = x
output[idy, :] = y
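
Putting the pieces together, a minimal self-contained sketch might look like the following. The function name `interleave_by_index`, the size check, and the demo arrays are additions for illustration and not part of the snippets above; the check reflects the assumption that the row counts of both inputs fit the alternating n/m pattern.

```python
import numpy as np

def interleave_by_index(x, y, n, m):
    """Place n-row blocks of x and m-row blocks of y into one array by index."""
    total = x.shape[0] + y.shape[0]
    output = np.empty((total, x.shape[1]), dtype=x.dtype)
    # Start index of every block, plus the offsets inside each block
    idx = (np.arange(0, total - n + 1, m + n)[:, None] + np.arange(n)).ravel()
    idy = (np.arange(n, total - m + 1, m + n)[:, None] + np.arange(m)).ravel()
    # Added assumption check: the index counts must match the input row counts
    if idx.size != x.shape[0] or idy.size != y.shape[0]:
        raise ValueError("row counts do not fit the alternating n/m pattern")
    output[idx, :] = x
    output[idy, :] = y
    return output

# Demo arrays chosen so rows from x and y are easy to tell apart
x_demo = np.arange(60).reshape(10, 6)
y_demo = 100 + np.arange(72).reshape(12, 6)
merged_demo = interleave_by_index(x_demo, y_demo, 2, 3)
```

With n=2 and m=3, rows 0-1 of the result come from x, rows 2-4 from y, and so on, ending with the last two rows of x.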

5 Comments

First, sorry for contacting you this way. I admire your numpy skills and am trying to learn it. Any good tutorials or courses you know of?
@wwnde. The official documentation and lots and lots of practice. The best way to learn a tool is to have a particular problem you want to solve with it. Numpy is just a tool, as is python. No point in learning a tool if you don't have a purpose for it.
Good one, I work with data and often have multiple problems but prefer to go the pandas or pyspark way, maybe something to put into consideration. Thanks man
@wwnde. Nothing wrong with pandas. It's another layer of abstraction built on top of numpy. Lots of things you can do with one easily but not the other. Really depends on your needs.
Helpful, advice heeded, will work round this
1

You can create a function that splits an array into sequential slices (chunks). Then, chunk both arrays and use the itertools.zip_longest function to interleave them. Finally, wrap the output in np.vstack to get the new array.

import numpy as np
from itertools import zip_longest
from math import ceil

def chunk(arr, n):
    """Split an array `arr` into n-sized chunks along its first axis"""
    for i in range(ceil(len(arr)/n)):
        ix = slice(i * n, (i+1) * n)
        yield arr[ix]

def chunk_stack(a, b, n, m):
    """Splits the arrays `a` and `b` into `n` and `m` sized chunks. 
    Returns an array of the interleaved chunks.
    """
    chunker_a = chunk(a, n)
    chunker_b = chunk(b, m)
    arr = []
    for cha, chb in zip_longest(chunker_a, chunker_b):
        if cha is not None:
            arr.append(cha)
        if chb is not None:
            arr.append(chb)
    return np.vstack(arr)
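
The `is not None` checks matter because `zip_longest` pads the shorter iterable with `None` once it runs out. A tiny illustration, using placeholder strings instead of array chunks (the names below are made up for the demo):

```python
from itertools import zip_longest

chunks_a = ["a0", "a1", "a2"]  # stands in for three chunks of the first array
chunks_b = ["b0"]              # stands in for one chunk of the second array

# The shorter input is padded with None
pairs = list(zip_longest(chunks_a, chunks_b))
# → [('a0', 'b0'), ('a1', None), ('a2', None)]

# Dropping the padding gives the interleaved order
interleaved = [c for pair in pairs for c in pair if c is not None]
# → ['a0', 'b0', 'a1', 'a2']
```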

Test it on your example arrays:

x = np.array(
[[5, 0, 2, 2, 6, 1],
 [4, 8, 9, 2, 7, 2],
 [5, 5, 0, 5, 3, 0],
 [2, 1, 4, 7, 9, 4],
 [8, 1, 1, 9, 2, 8],
 [4, 1, 1, 0, 1, 1],
 [2, 9, 3, 5, 7, 9],
 [3, 6, 6, 6, 0, 4],
 [4, 4, 7, 3, 7, 9],
 [7, 3, 7, 1, 5, 2]])

y = np.array(
[[3, 15, 3, 8, 12, 12],
 [19, 12, 13, 0, 19, 16],
 [11, 2, 18, 16, 9, 19],
 [19, 15, 15, 11, 13, 2],
 [19, 14, 1, 6, 13, 17],
 [19, 14, 19, 14, 13, 3],
 [0, 1, 13, 0, 19, 10],
 [19, 13, 19, 5, 16, 13],
 [12, 4, 15, 11, 12, 17],
 [4, 19, 17, 2, 11, 12],
 [9, 12, 10, 9, 15, 3],
 [13, 7, 2, 5, 13, 10]])

chunk_stack(x, y, 2, 3)
# returns:
array([[ 5,  0,  2,  2,  6,  1],
       [ 4,  8,  9,  2,  7,  2],
       [ 3, 15,  3,  8, 12, 12],
       [19, 12, 13,  0, 19, 16],
       [11,  2, 18, 16,  9, 19],
       [ 5,  5,  0,  5,  3,  0],
       [ 2,  1,  4,  7,  9,  4],
       [19, 15, 15, 11, 13,  2],
       [19, 14,  1,  6, 13, 17],
       [19, 14, 19, 14, 13,  3],
       [ 8,  1,  1,  9,  2,  8],
       [ 4,  1,  1,  0,  1,  1],
       [ 0,  1, 13,  0, 19, 10],
       [19, 13, 19,  5, 16, 13],
       [12,  4, 15, 11, 12, 17],
       [ 2,  9,  3,  5,  7,  9],
       [ 3,  6,  6,  6,  0,  4],
       [ 4, 19, 17,  2, 11, 12],
       [ 9, 12, 10,  9, 15,  3],
       [13,  7,  2,  5, 13, 10],
       [ 4,  4,  7,  3,  7,  9],
       [ 7,  3,  7,  1,  5,  2]])

1

We reshape x and y so that their rows are grouped into blocks of n and m rows respectively.

Then we stack the blocks horizontally so that the n-row and m-row blocks alternate.

Whatever rows of x and y remain are appended at the end.

import numpy as np

x = np.random.randint(10, size=(10, 6))
y = np.random.randint(20, size=(12, 6))
n, m = 2, 3
output = np.empty((x.shape[0] + y.shape[0], x.shape[1]), dtype=x.dtype)

x_dim_1 = x.shape[0] // n  # 5
y_dim_1 = y.shape[0] // m  # 4

common_dim = min(x_dim_1, y_dim_1) # 4

x_1 = x[:common_dim * n].reshape(common_dim, n, -1) # (4, 2, 6)
y_1 = y[:common_dim * m].reshape(common_dim, m, -1) # (4, 3, 6)

# Stack x_1 and y_1 along axis 1 to get shape (4, 5, 6), then flatten the
# first two axes to (4 * 5, 6) so the n-row and m-row blocks alternate
assign_til = common_dim * (n + m)
output[:assign_til] = np.hstack([x_1, y_1]).reshape(assign_til, x.shape[1])

# Remaining x's and y's
r_x = x[common_dim * n:]
r_y = y[common_dim * m:]

# Next entry in output will be of r_x, since alternate
# Choose n entries or whatever remaining and append those
rem = min(r_x.shape[0], n)
output[assign_til:assign_til + rem] = r_x[:rem]
assign_til += rem

# Next append all remaining y's
output[assign_til:] = r_y
assign_til += r_y.shape[0]

# If x has more blocks than y (x_dim_1 > y_dim_1), r_x has at least n rows;
# append whatever is left after the first `rem`
output[assign_til:] = r_x[rem:]
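
To see why the reshape and hstack step alternates the blocks, here is a small sketch on single-column arrays. The names `x_small`, `y_small`, and `blocks` are made up for this demo, and the sizes are chosen so that nothing is left over.

```python
import numpy as np

n, m = 2, 3
blocks = 2
x_small = np.arange(4).reshape(4, 1)        # rows 0..3
y_small = 10 + np.arange(6).reshape(6, 1)   # rows 10..15

x_blocks = x_small.reshape(blocks, n, -1)   # shape (2, 2, 1)
y_blocks = y_small.reshape(blocks, m, -1)   # shape (2, 3, 1)

# For 3-D inputs hstack joins along axis 1, so each block of x is
# immediately followed by the matching block of y
merged_small = np.hstack([x_blocks, y_blocks]).reshape(blocks * (n + m), -1)
print(merged_small.ravel())  # [ 0  1 10 11 12  2  3 13 14 15]
```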
