Accumulate rows of NumPy array based on the last column

Question

I have an array of coordinate rows: the first three entries of each row are the x, y, z coordinates, and the fourth entry is the id of the track. I want to add a drift to the tracks, starting after the first time point of each track. Is there a simple way to add the drift to the whole array at once, per track id, given that tracks can have different lengths? (As you can see, the track with id 0 has only 3 coordinate entries, while the track with id 3 has 6.)

import numpy as np
drift=np.array([1,1,0])
a = np.array([[1,1,1,0],[1,1,1,0],[1,1,1,0],
              [1,1,1,2],[1,1,1,2],[1,1,1,3],
              [1,1,1,3],[1,1,1,3],[1,1,1,3],
              [1,1,1,3],[1,1,1,3]])

Desired output:

output = np.array([[1,1,1,0],[2,2,1,0],[3,3,1,0],
                   [1,1,1,2],[2,2,1,2],[1,1,1,3],
                   [2,2,1,3],[3,3,1,3],[4,4,1,3],
                   [5,5,1,3],[6,6,1,3]])

What is a track with ids and a track without? Can you please add more description? What have you tried so far? What is your approach? I just want to understand the problem.
– Sumit Jha, Commented Apr 3, 2018 at 13:53

Georgy · Accepted Answer · 2018-04-03 16:31:44Z

Here is an example of how it can be done in a vectorized manner:

import numpy as np


drift = np.array([1, 1, 0])
a = np.array([[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 2], 
              [1, 1, 1, 2], [1, 1, 1, 3], [1, 1, 1, 3], [1, 1, 1, 3], 
              [1, 1, 1, 3], [1, 1, 1, 3], [1, 1, 1, 3]])


def multirange(counts: np.ndarray) -> np.ndarray:
    """
    Calculates concatenated ranges. Code was taken at:
    https://stackoverflow.com/questions/20027936/how-to-efficiently-concatenate-many-arange-calls-in-numpy
    """
    counts = counts[counts != 0]
    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1
    incr.cumsum(out=incr)
    return incr


def drifts(ids: np.ndarray,
           drift: np.ndarray) -> np.ndarray:
    diffs = np.diff(ids)
    max_drifts_per_id = np.concatenate((np.where(diffs)[0], [len(ids) - 1])) + 1
    max_drifts_per_id[1:] = max_drifts_per_id[1:] - max_drifts_per_id[:-1]
    multipliers = multirange(max_drifts_per_id)
    drifts = np.tile(drift, (len(ids), 1))
    return drifts * multipliers[:, np.newaxis]


a[:, :-1] += drifts(a[:, -1], drift)
print(a)

Output:

[[1 1 1 0]
 [2 2 1 0]
 [3 3 1 0]
 [1 1 1 2]
 [2 2 1 2]
 [1 1 1 3]
 [2 2 1 3]
 [3 3 1 3]
 [4 4 1 3]
 [5 5 1 3]
 [6 6 1 3]]

Explanation:

The idea of the drifts function is to take an array of ids (which in our case we can obtain as a[:, -1]: array([0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3])) and a drift (np.array([1, 1, 0])), and to produce the following array of per-row offsets, which can then be added to the coordinate columns of the original array:

array([[0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [0, 0, 0],
       [1, 1, 0],
       [0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 0],
       [5, 5, 0]])

Line by line:

diffs = np.diff(ids)

Here we get the differences between consecutive ids. Each non-zero element sits at the index of the last row of a track, because the id changes on the next row:

array([0, 0, 2, 0, 1, 0, 0, 0, 0, 0])

See np.diff for details.
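To make this step concrete, here is a small standalone sketch using the ids from the example. The variable name last_of_track is only illustrative and does not appear in the answer's code; np.flatnonzero is used here as a shorthand for np.where(...)[0].

```python
import numpy as np

ids = np.array([0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3])

# ids[i + 1] - ids[i]; the result is one element shorter than ids
diffs = np.diff(ids)

# Indices where the next row belongs to a different track
# (illustrative name, not part of the answer's code)
last_of_track = np.flatnonzero(diffs)

print(diffs)          # [0 0 2 0 1 0 0 0 0 0]
print(last_of_track)  # [2 4]
```

Note that this only detects a change of id between neighbouring rows, so it assumes the rows of each track are stored next to each other.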

max_drifts_per_id = np.concatenate((np.where(diffs)[0], [len(ids) - 1])) + 1

np.where(diffs)[0] gives the indices of the non-zero elements of the previous array. We append the index of the last element of ids and increment all the resulting indices by 1, so that each value marks the position just past the end of a track. See np.where for details. After concatenation max_drifts_per_id will be:

array([ 3,  5, 11])

max_drifts_per_id[1:] = max_drifts_per_id[1:] - max_drifts_per_id[:-1]

Here we subtract each end position from the next one to get the number of rows in each track. These are the end values of the ranges we want to generate:

array([3, 2, 6])
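As a side note, the same lengths can probably be obtained without the in-place subtraction. The sketch below is not what the answer does: it assumes NumPy 1.16 or later for the prepend argument of np.diff, and the cross-check with np.unique only agrees when each id forms a single contiguous block of rows, as in the example.

```python
import numpy as np

ids = np.array([0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3])

# End positions of the tracks, as computed in the previous step
ends = np.array([3, 5, 11])

# Differences between consecutive ends, with an implicit start at 0
lengths = np.diff(ends, prepend=0)

# Cross-check: row counts per id (valid here because ids are contiguous)
_, counts = np.unique(ids, return_counts=True)

print(lengths)  # [3 2 6]
print(counts)   # [3 2 6]
```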

multipliers = multirange(max_drifts_per_id)

We use multirange as an efficient alternative to concatenating calls of np.arange. See How to efficiently concatenate many arange calls in numpy? for details. Resulting multipliers will be:

array([0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 5])
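If multirange looks too cryptic, one possible alternative is to subtract each track's start position from a global row index. This is a sketch under the assumption that the counts are already known; the helper name within_track_index is made up for illustration and is not taken from the linked question.

```python
import numpy as np


def within_track_index(counts: np.ndarray) -> np.ndarray:
    """Hypothetical helper: concatenated ranges 0..c-1 for every c in counts."""
    # Start position of every track in the flattened result
    starts = np.cumsum(counts) - counts
    # Global row index minus the start of the track the row belongs to
    return np.arange(counts.sum()) - np.repeat(starts, counts)


counts = np.array([3, 2, 6])
print(within_track_index(counts))  # [0 1 2 0 1 0 1 2 3 4 5]
```

It does a bit more work than the cumulative-sum trick, but zero-length counts need no special handling, since np.repeat simply emits nothing for them.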

drifts = np.tile(drift, (len(ids), 1))

With np.tile we repeat the drift so that it has as many rows as ids:

array([[1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0]])

return drifts * multipliers[:, np.newaxis]

We multiply it by multipliers and get:

array([[0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [0, 0, 0],
       [1, 1, 0],
       [0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 0],
       [5, 5, 0]])
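The np.tile step is arguably optional, because NumPy broadcasting expands the drift automatically when a column of multipliers is multiplied by it. A quick check, using the multipliers from above as given input:

```python
import numpy as np

drift = np.array([1, 1, 0])
multipliers = np.array([0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 5])

# Version from the answer: explicit copies of drift, one per row
tiled = np.tile(drift, (len(multipliers), 1)) * multipliers[:, np.newaxis]

# Broadcasting version: shape (11, 1) times shape (3,) gives shape (11, 3)
broadcast = multipliers[:, np.newaxis] * drift

print(np.array_equal(tiled, broadcast))  # True
print(broadcast.shape)                   # (11, 3)
```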

And finally this returned value can be added to the original array:

a[:, :-1] += drifts(a[:, -1], drift)
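One caveat with the in-place addition: it works here because both the array and the drift are integers. With a fractional drift, NumPy should refuse to write floats into an integer array, so the array would need to be converted first. A small sketch with made-up float offsets (not from the question):

```python
import numpy as np

a = np.array([[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 2]])  # integer dtype
# Illustrative float offsets, as a fractional drift would produce
float_offsets = np.array([[0.0, 0.0, 0.0],
                          [0.5, 0.5, 0.0],
                          [0.0, 0.0, 0.0]])

try:
    a[:, :-1] += float_offsets   # float result cannot be cast back to int
    raised = False
except TypeError:
    raised = True

# Converting to float first makes the in-place addition valid
a_float = a.astype(float)
a_float[:, :-1] += float_offsets

print(raised)
print(a_float[1])
```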

czr · Accepted Answer · 2018-04-03 13:57:55Z


There is no builtin way of doing this as far as I know, but you can solve it with this simple loop:

import numpy as np
drift=np.array([1,1,0])
a = np.array([[1,1,1,0],[1,1,1,0],[1,1,1,0],
              [1,1,1,2],[1,1,1,2],[1,1,1,3],
              [1,1,1,3],[1,1,1,3],[1,1,1,3],
              [1,1,1,3],[1,1,1,3]])

_id = 0
n = 0
for i in range(a.shape[0]):
    if a[i, 3] == _id:
        a[i, 0:3] = a[i, 0:3] + n * drift
        n += 1
    else:
        _id = a[i, 3]
        n = 1

print(a)
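
For reuse, the same loop could be wrapped in a function. The sketch below is a variant rather than the code above: it compares each row with the previous one instead of keeping a stored id, works on a copy, and uses a[:, -1] so the number of coordinate columns is not hard-coded. The name add_drift_loop is hypothetical.

```python
import numpy as np


def add_drift_loop(a: np.ndarray, drift: np.ndarray) -> np.ndarray:
    """Hypothetical wrapper: add n * drift to the n-th row of every track."""
    out = a.copy()
    n = 0
    for i in range(out.shape[0]):
        # Restart the counter whenever the id differs from the previous row
        if i == 0 or out[i, -1] != out[i - 1, -1]:
            n = 0
        out[i, :-1] += n * drift
        n += 1
    return out


drift = np.array([1, 1, 0])
a = np.array([[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0],
              [1, 1, 1, 2], [1, 1, 1, 2], [1, 1, 1, 3],
              [1, 1, 1, 3], [1, 1, 1, 3], [1, 1, 1, 3],
              [1, 1, 1, 3], [1, 1, 1, 3]])
# Desired output from the question
expected = np.array([[1, 1, 1, 0], [2, 2, 1, 0], [3, 3, 1, 0],
                     [1, 1, 1, 2], [2, 2, 1, 2], [1, 1, 1, 3],
                     [2, 2, 1, 3], [3, 3, 1, 3], [4, 4, 1, 3],
                     [5, 5, 1, 3], [6, 6, 1, 3]])

print(np.array_equal(add_drift_loop(a, drift), expected))  # True
```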
