Here is an example of how it can be done in a vectorized manner:
import numpy as np

drift = np.array([1, 1, 0])
a = np.array([[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 2],
              [1, 1, 1, 2], [1, 1, 1, 3], [1, 1, 1, 3], [1, 1, 1, 3],
              [1, 1, 1, 3], [1, 1, 1, 3], [1, 1, 1, 3]])


def multirange(counts: np.ndarray) -> np.ndarray:
    """
    Calculates concatenated ranges. Code was taken at:
    https://stackoverflow.com/questions/20027936/how-to-efficiently-concatenate-many-arange-calls-in-numpy
    """
    counts = counts[counts != 0]
    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1
    incr.cumsum(out=incr)
    return incr


def drifts(ids: np.ndarray,
           drift: np.ndarray) -> np.ndarray:
    diffs = np.diff(ids)
    max_drifts_per_id = np.concatenate((np.where(diffs)[0], [len(ids) - 1])) + 1
    max_drifts_per_id[1:] = max_drifts_per_id[1:] - max_drifts_per_id[:-1]
    multipliers = multirange(max_drifts_per_id)
    drifts = np.tile(drift, (len(ids), 1))
    return drifts * multipliers[:, np.newaxis]


a[:, :-1] += drifts(a[:, -1], drift)
print(a)
Output:
[[1 1 1 0]
 [2 2 1 0]
 [3 3 1 0]
 [1 1 1 2]
 [2 2 1 2]
 [1 1 1 3]
 [2 2 1 3]
 [3 3 1 3]
 [4 4 1 3]
 [5 5 1 3]
 [6 6 1 3]]
Explanation:
The drifts function takes an array of ids (here a[:, -1], i.e. array([0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3])) and the drift vector (np.array([1, 1, 0])) and builds the following array, which can then be added to the first three columns of the original array:
array([[0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [0, 0, 0],
       [1, 1, 0],
       [0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 0],
       [5, 5, 0]])
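The intent may be easier to see in a plain loop. The sketch below is a hypothetical reference implementation (drifts_loop is not part of the answer's code) that builds the same array row by row; it is slow, but handy for checking the vectorized version on small inputs.

```python
import numpy as np

def drifts_loop(ids, drift):
    """Hypothetical loop-based reference: row k of a group gets k * drift."""
    out = np.zeros((len(ids), len(drift)), dtype=int)
    position = 0  # position of the current row inside its group of equal ids
    for i in range(len(ids)):
        # Restart the counter whenever the id differs from the previous row
        position = position + 1 if i > 0 and ids[i] == ids[i - 1] else 0
        out[i] = position * drift
    return out

drift = np.array([1, 1, 0])
ids = np.array([0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3])
reference = drifts_loop(ids, drift)
print(reference)
```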
Line by line:
diffs = np.diff(ids)
This gives an array whose non-zero elements sit at the index of the last row of each group of equal ids (except the final group):
array([0, 0, 2, 0, 1, 0, 0, 0, 0, 0])
See np.diff for details.
max_drifts_per_id = np.concatenate((np.where(diffs)[0], [len(ids) - 1])) + 1
np.where(diffs)[0] gives the indices of those non-zero elements. We append the index of the last element and add 1 to every index, which turns them into the exclusive end position of each group of ids. See np.where for details. After concatenation, max_drifts_per_id is:
array([ 3, 5, 11])
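The boundary detection only relies on neighbouring ids being different, not on the ids being sorted. A minimal sketch with made-up ids (the values below are an assumption for illustration, not data from the question):

```python
import numpy as np

# Hypothetical ids: groups are contiguous but not in ascending order
ids = np.array([5, 5, 1, 1, 1, 7])

diffs = np.diff(ids)                 # [ 0, -4,  0,  0,  6]
last_of_group = np.where(diffs)[0]   # [1, 4]: last index of every group but the final one
ends = np.concatenate((last_of_group, [len(ids) - 1])) + 1
print(ends)                          # [2 5 6]
```

Negative differences count as boundaries too, because np.where treats any non-zero value as true.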
max_drifts_per_id[1:] = max_drifts_per_id[1:] - max_drifts_per_id[:-1]
Subtracting each end position from the next one turns the end positions into group lengths, which are also the stop values of the ranges we need:
array([3, 2, 6])
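As a cross-check, the same lengths can be obtained in other ways. The sketch below uses np.diff with its prepend argument, and np.unique with return_counts=True; the latter is shown only as a comparison and assumes the ids are sorted, as they are in this example.

```python
import numpy as np

ids = np.array([0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3])
ends = np.array([3, 5, 11])  # exclusive end position of each group

# Same subtraction as in the answer, written with np.diff
lengths = np.diff(ends, prepend=0)

# Only valid because these ids are sorted: np.unique sorts and merges
# equal values, so it would miscount ids that reappear later.
_, counts = np.unique(ids, return_counts=True)

print(lengths, counts)
```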
multipliers = multirange(max_drifts_per_id)
We use multirange as an efficient alternative to concatenating several np.arange calls. See How to efficiently concatenate many arange calls in numpy? for details. The resulting multipliers are:
array([0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 5])
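To see what multirange replaces, it can be compared against the straightforward version that calls np.arange once per group. This is a small check under the assumption that counts is non-empty; the function body is copied from the answer.

```python
import numpy as np

def multirange(counts):
    # Same function as in the answer above
    counts = counts[counts != 0]
    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1
    incr.cumsum(out=incr)
    return incr

counts = np.array([3, 2, 6])
# Straightforward but slower version: one np.arange call per group
naive = np.concatenate([np.arange(c) for c in counts])

print(np.array_equal(multirange(counts), naive))  # True
```

The trick is that incr holds 1 everywhere except at group starts, where it holds the negative jump that brings the running sum back to 0.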
drifts = np.tile(drift, (len(ids), 1))
With np.tile we repeat drift so that it has one row per id:
array([[1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0]])
return drifts * multipliers[:, np.newaxis]
Multiplying each row by its multiplier (turned into a column with np.newaxis) gives:
array([[0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [0, 0, 0],
       [1, 1, 0],
       [0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 0],
       [5, 5, 0]])
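As a side note, NumPy broadcasting can produce the same product without building the tiled array first. A minimal sketch, offered as an optional simplification rather than a change to the answer's code:

```python
import numpy as np

drift = np.array([1, 1, 0])
multipliers = np.array([0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 5])

# Version from the answer: tile first, then multiply
tiled = np.tile(drift, (len(multipliers), 1)) * multipliers[:, np.newaxis]

# Broadcasting an (11, 1) column against a (3,) row yields the same (11, 3) array
broadcast = multipliers[:, np.newaxis] * drift

print(np.array_equal(tiled, broadcast))  # True
```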
And finally this returned value can be added to the original array:
a[:, :-1] += drifts(a[:, -1], drift)