Merging two numpy arrays with sequential rows

Question

I have two numpy arrays and would like to merge them according to the following rule, preferably without using a for loop.

Take the first n rows from the first array.
Add the first m rows from the second array.
Add rows between n and 2n from the first array.
Add rows between m and 2m from the second array.

.....

Add the last m rows from the second array.

For instance, let's say I have the following two arrays, with n=2 and m=3:

x = np.random.randint(10, size=(10, 6))
y = np.random.randint(20, size=(12, 6))

[[5 0 2 2 6 1]
 [4 8 9 2 7 2]
 [5 5 0 5 3 0]
 [2 1 4 7 9 4]
 [8 1 1 9 2 8]
 [4 1 1 0 1 1]
 [2 9 3 5 7 9]
 [3 6 6 6 0 4]
 [4 4 7 3 7 9]
 [7 3 7 1 5 2]] 

[[ 3 15  3  8 12 12]
 [19 12 13  0 19 16]
 [11  2 18 16  9 19]
 [19 15 15 11 13  2]
 [19 14  1  6 13 17]
 [19 14 19 14 13  3]
 [ 0  1 13  0 19 10]
 [19 13 19  5 16 13]
 [12  4 15 11 12 17]
 [ 4 19 17  2 11 12]
 [ 9 12 10  9 15  3]
 [13  7  2  5 13 10]]

The desired output is

[[ 5  0  2  2  6  1]
 [ 4  8  9  2  7  2]
 [ 3 15  3  8 12 12]
 [19 12 13  0 19 16]
 [11  2 18 16  9 19]
 [ 5  5  0  5  3  0]
 [ 2  1  4  7  9  4]
 [19 15 15 11 13  2]
 [19 14  1  6 13 17]
 [19 14 19 14 13  3]
 [ 8  1  1  9  2  8]
 [ 4  1  1  0  1  1]
 [ 0  1 13  0 19 10]
 [19 13 19  5 16 13]
 [12  4 15 11 12 17]
 [ 2  9  3  5  7  9]
 [ 3  6  6  6  0  4]
 [ 4 19 17  2 11 12]
 [ 9 12 10  9 15  3]
 [13  7  2  5 13 10]
 [ 4  4  7  3  7  9]
 [ 7  3  7  1  5  2]]

Mad Physicist · Accepted Answer · 2021-09-10 22:09:00Z


You can create an output array and place the inputs into it by index. The output always has as many rows as x and y combined:

output = np.empty((x.shape[0] + y.shape[0], x.shape[1]), dtype=x.dtype)

You can generate the output indices like:

idx = (np.arange(0, output.shape[0] - n + 1, m + n)[:, None] + np.arange(n)).ravel()
idy = (np.arange(n, output.shape[0] - m + 1, m + n)[:, None] + np.arange(m)).ravel()

This creates a column vector of chunk start indices and broadcasts it against np.arange(n) or np.arange(m), which yields every output row where the corresponding input goes. You can then assign the inputs directly:

output[idx, :] = x
output[idy, :] = y
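
As a minimal sketch, the steps above can be wrapped into a single function. The name `interleave_by_index` and the size check are illustrative additions, not part of the answer. The check reflects an assumption: the index formulas appear to line up only when x and y split into whole chunks of n and m rows, with x contributing either as many chunks as y or one more. In any other case the sketch raises an error rather than writing rows to the wrong place.

```python
import numpy as np

def interleave_by_index(x, y, n, m):
    """Interleave n-row chunks of x with m-row chunks of y (hypothetical helper)."""
    total = x.shape[0] + y.shape[0]
    output = np.empty((total, x.shape[1]), dtype=x.dtype)

    # Column vector of chunk starts plus a row of in-chunk offsets,
    # broadcast to a 2-D grid and flattened to a list of row indices.
    idx = (np.arange(0, total - n + 1, m + n)[:, None] + np.arange(n)).ravel()
    idy = (np.arange(n, total - m + 1, m + n)[:, None] + np.arange(m)).ravel()

    # Assumption: the inputs must fill the generated slots exactly.
    if idx.size != x.shape[0] or idy.size != y.shape[0]:
        raise ValueError("x and y do not split into alternating n/m chunks")

    output[idx] = x
    output[idy] = y
    return output

# Small single-column example so the row order is easy to read.
x_small = np.arange(4).reshape(4, 1)          # rows 0, 1, 2, 3
y_small = 10 + np.arange(6).reshape(6, 1)     # rows 10 .. 15
merged = interleave_by_index(x_small, y_small, 2, 3)
# merged.ravel() is [0, 1, 10, 11, 12, 2, 3, 13, 14, 15]
```

With the shapes from the question (10 and 12 rows, n=2, m=3) the size check passes, since x has five chunks and y has four.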

edited Sep 10, 2021 at 22:09

answered Sep 10, 2021 at 22:02


5 Comments

wwnde Over a year ago

first sorry for contacting you this way. I admire your numpy skills. I am trying to learn it. Any good tutorial, courses you know of?

Mad Physicist Over a year ago

@wwnde. The official documentation and lots and lots of practice. The best way to learn a tool is to have a particular problem you want to solve with it. Numpy is just a tool, as is python. No point in learning a tool if you don't have a purpose for it.

wwnde Over a year ago

Good one, I work with data and often have multiple problems but prefer to go the pandas or pyspark way, maybe something to put into consideration. Thanks man

Mad Physicist Over a year ago

@wwnde. Nothing wrong with pandas. It's another layer of abstraction built on top of numpy. Lots of things you can do with one easily but not the other. Really depends on your needs.

wwnde Over a year ago

Helpful, advice heeded, will work round this

James · Accepted Answer · 2021-09-10 21:42:01Z

You can create a function that splits an array into sequential slices (chunks). Then, chunk both arrays and use the itertools.zip_longest function to interleave them. Finally wrap the output in np.vstack to get the new array.

import numpy as np
from itertools import zip_longest
from math import ceil

def chunk(arr, n):
    """Split an array `arr` into n-sized chunks along its first axis"""
    for i in range(ceil(len(arr)/n)):
        ix = slice(i * n, (i+1) * n)
        yield arr[ix]

def chunk_stack(a, b, n, m):
    """Splits the arrays `a` and `b` into `n` and `m` sized chunks. 
    Returns an array of the interleaved chunks.
    """
    chunker_a = chunk(a, n)
    chunker_b = chunk(b, m)
    arr = []
    for cha, chb in zip_longest(chunker_a, chunker_b):
        if cha is not None:
            arr.append(cha)
        if chb is not None:
            arr.append(chb)
    return np.vstack(arr)
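
The `is not None` checks in `chunk_stack` are there because `zip_longest` pads the shorter iterator with `None` once it runs out. A small illustration of that behavior, using strings as stand-ins for array chunks (the names `x_chunks` and `y_chunks` are placeholders for this sketch only):

```python
from itertools import zip_longest

x_chunks = ["x0", "x1", "x2"]   # stand-ins for three chunks of the first array
y_chunks = ["y0"]               # the second array ran out after one chunk

# zip_longest keeps going until the longer input is exhausted,
# filling the gaps with None by default.
pairs = list(zip_longest(x_chunks, y_chunks))
# pairs == [("x0", "y0"), ("x1", None), ("x2", None)]

# Dropping the None fillers leaves the interleaved order.
interleaved = [c for pair in pairs for c in pair if c is not None]
# interleaved == ["x0", "y0", "x1", "x2"]
```

This is why leftover chunks from the longer array end up appended at the end instead of being dropped, as they would be with plain `zip`.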

Test it on your example arrays:

x = np.array(
[[5, 0, 2, 2, 6, 1],
 [4, 8, 9, 2, 7, 2],
 [5, 5, 0, 5, 3, 0],
 [2, 1, 4, 7, 9, 4],
 [8, 1, 1, 9, 2, 8],
 [4, 1, 1, 0, 1, 1],
 [2, 9, 3, 5, 7, 9],
 [3, 6, 6, 6, 0, 4],
 [4, 4, 7, 3, 7, 9],
 [7, 3, 7, 1, 5, 2]])

y = np.array(
[[3, 15, 3, 8, 12, 12],
 [19, 12, 13, 0, 19, 16],
 [11, 2, 18, 16, 9, 19],
 [19, 15, 15, 11, 13, 2],
 [19, 14, 1, 6, 13, 17],
 [19, 14, 19, 14, 13, 3],
 [0, 1, 13, 0, 19, 10],
 [19, 13, 19, 5, 16, 13],
 [12, 4, 15, 11, 12, 17],
 [4, 19, 17, 2, 11, 12],
 [9, 12, 10, 9, 15, 3],
 [13, 7, 2, 5, 13, 10]])

chunk_stack(x, y, 2, 3)
# returns:
array([[ 5,  0,  2,  2,  6,  1],
       [ 4,  8,  9,  2,  7,  2],
       [ 3, 15,  3,  8, 12, 12],
       [19, 12, 13,  0, 19, 16],
       [11,  2, 18, 16,  9, 19],
       [ 5,  5,  0,  5,  3,  0],
       [ 2,  1,  4,  7,  9,  4],
       [19, 15, 15, 11, 13,  2],
       [19, 14,  1,  6, 13, 17],
       [19, 14, 19, 14, 13,  3],
       [ 8,  1,  1,  9,  2,  8],
       [ 4,  1,  1,  0,  1,  1],
       [ 0,  1, 13,  0, 19, 10],
       [19, 13, 19,  5, 16, 13],
       [12,  4, 15, 11, 12, 17],
       [ 2,  9,  3,  5,  7,  9],
       [ 3,  6,  6,  6,  0,  4],
       [ 4, 19, 17,  2, 11, 12],
       [ 9, 12, 10,  9, 15,  3],
       [13,  7,  2,  5, 13, 10],
       [ 4,  4,  7,  3,  7,  9],
       [ 7,  3,  7,  1,  5,  2]])

eroot163pi · Accepted Answer · 2021-09-11 10:08:52Z

Reshape x and y so that their rows are grouped into chunks of n and m respectively.

Then stack the chunks horizontally so that the n-row and m-row groups alternate.

Finally, append whatever rows of x and y remain.

import numpy as np

x = np.random.randint(10, size=(10, 6))
y = np.random.randint(20, size=(12, 6))
n, m = 2, 3
output = np.empty((x.shape[0] + y.shape[0], x.shape[1]), dtype=x.dtype)

x_dim_1 = x.shape[0] // n  # 5
y_dim_1 = y.shape[0] // m  # 4

common_dim = min(x_dim_1, y_dim_1) # 4

x_1 = x[:common_dim * n].reshape(common_dim, n, -1) # (4, 2, 6)
y_1 = y[:common_dim * m].reshape(common_dim, m, -1) # (4, 3, 6)

# We stack horizontally x_1, y_1 to (4, 5, 6) then convert 4, 5 -> 4*5
# make n's and m's alternate
assign_til = common_dim * (n + m)
output[:assign_til] = np.hstack([x_1, y_1]).reshape(assign_til, x.shape[1])

# Remaining x's and y's
r_x = x[common_dim * n:]
r_y = y[common_dim * m:]

# Next entry in output will be of r_x, since alternate
# Choose n entries or whatever remaining and append those
rem = min(r_x.shape[0], n)
output[assign_til:assign_til + rem] = r_x[:rem]
assign_til += rem

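# Next append all remaining y's; bound the slice so any rows reserved
# for leftover x's at the end are not part of this assignment
output[assign_til:assign_til + r_y.shape[0]] = r_y
assign_til += r_y.shape[0]

# If x_dim_1 > y_dim_1, r_x has at least n rows; append whatever is left after the first n
output[assign_til:] = r_x[rem:]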
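
The reshape-and-stack step is easier to follow on tiny inputs. The sketch below uses single-column arrays whose row counts are exact multiples of n and m, so there are no leftover rows to handle; the names `x_blocks`, `y_blocks`, and `k` are made up for this illustration.

```python
import numpy as np

n, m = 2, 3
x = np.arange(4).reshape(4, 1)          # rows 0, 1, 2, 3
y = 10 + np.arange(6).reshape(6, 1)     # rows 10 .. 15
k = 2                                   # number of chunks in each array

x_blocks = x.reshape(k, n, -1)          # shape (2, 2, 1)
y_blocks = y.reshape(k, m, -1)          # shape (2, 3, 1)

# For 3-D inputs np.hstack concatenates along axis 1, so each block
# becomes n rows of x followed by m rows of y: shape (2, 5, 1).
stacked = np.hstack([x_blocks, y_blocks])

# Collapsing the first two axes lays the blocks out one after another.
merged = stacked.reshape(k * (n + m), -1)
# merged.ravel() is [0, 1, 10, 11, 12, 2, 3, 13, 14, 15]
```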
