Merging records in a Numpy structured array

Question

I have a Numpy structured array that is sorted by the first column:

x = array([(2, 3), (2, 8), (4, 1)], dtype=[('recod', '<u8'), ('count', '<u4')])

I need to merge records (sum the values of the second column) where

x[n][0] == x[n + 1][0]

In this case, the desired output would be:

x = array([(2, 11), (4, 1)], dtype=[('recod', '<u8'), ('count', '<u4')])

What's the best way to achieve this?

Please edit this question to reflect the structured array you gave in a comment: array([(2, 3), (2, 8), (4, 1)], dtype=[('recod', '<u8'), ('count', '<u4')]). Your existing question looks more like a 2d array.
– hpaulj, Commented Aug 14, 2015 at 16:10

Divakar · Accepted Answer · 2015-08-14 11:27:44Z

3

You can use np.unique to get an ID array for each element in the first column and then use np.bincount to perform accumulation on the second column elements based on the IDs -

In [140]: A
Out[140]: 
array([[25,  1],
       [37,  3],
       [37,  2],
       [47,  1],
       [59,  2]])

In [141]: unqA,idx = np.unique(A[:,0],return_inverse=True)

In [142]: np.column_stack((unqA,np.bincount(idx,A[:,1])))
Out[142]: 
array([[ 25.,   1.],
       [ 37.,   5.],
       [ 47.,   1.],
       [ 59.,   2.]])

You can avoid np.unique by combining np.diff and np.cumsum. This might help because np.unique sorts internally, which is unnecessary here since the input data is already sorted. The implementation would look something like this:

In [201]: A
Out[201]: 
array([[25,  1],
       [37,  3],
       [37,  2],
       [47,  1],
       [59,  2]])

In [202]: unq1 = np.append(True,np.diff(A[:,0])!=0)

In [203]: np.column_stack((A[:,0][unq1],np.bincount(unq1.cumsum()-1,A[:,1])))
Out[203]: 
array([[ 25.,   1.],
       [ 37.,   5.],
       [ 47.,   1.],
       [ 59.,   2.]])
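The same run-boundary idea can be sketched directly on the structured array from the question, without first converting it to a 2D array. This is a minimal sketch, assuming the array is non-empty and already sorted by 'recod'; the names first, group_ids, sums and merged are illustrative and not part of the original answer. It compares neighbouring keys instead of calling np.diff, which sidesteps any concern about subtracting unsigned integers.

```python
import numpy as np

x = np.array([(2, 3), (2, 8), (4, 1)],
             dtype=[('recod', '<u8'), ('count', '<u4')])

keys = x['recod']

# True at the first record of each run of equal keys.
# Assumes x is non-empty and sorted by 'recod'.
first = np.append(True, keys[1:] != keys[:-1])

# Group ID for every record (0, 0, 1 for the sample data).
group_ids = first.cumsum() - 1

# With weights, np.bincount sums 'count' per group and returns float64.
sums = np.bincount(group_ids, weights=x['count'])

# Build the result field by field, keeping the original dtype.
merged = np.empty(first.sum(), dtype=x.dtype)
merged['recod'] = keys[first]
merged['count'] = sums  # cast from float64 back to '<u4' on assignment

print(merged)
```

Because np.bincount accumulates in float64, very large counts could lose precision before being cast back to the integer field.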

edited Aug 14, 2015 at 11:27

answered Aug 14, 2015 at 11:00

Divakar


8 Comments

krlk89 Over a year ago

I am getting the following error:

Traceback (most recent call last):
  File "/home/krlk89/abc.py", line 8, in <module>
    unq1 = np.append(True,np.diff(x[:,0])!=0)
IndexError: too many indices

Divakar Over a year ago

@krlk89 What's the shape of x? x.shape output?

krlk89 Over a year ago

>>> x
array([(2, 3), (2, 8), (4, 1)], dtype=[('recod', '<u8'), ('count', '<u4')])
>>> x.shape
(3,)

Divakar Over a year ago

@krlk89 Do something like A = np.column_stack((x['recod'],x['count'])) and then use the solutions, assuming A as the input to the solutions? Or in the solution codes, use x['recod'] in place of A[:,0] and x['count'] to replace A[:,1]

hpaulj Over a year ago

I added an answer based off this one, adapted to structured arrays.


hpaulj · Accepted Answer · 2015-08-16 19:29:48Z

2

Divakar's answer, cast in structured array form:

In [500]: x=np.array([(25, 1), (37, 3), (37, 2), (47, 1), (59, 2)], dtype=[('recod', '<u8'), ('count', '<u4')])

Find the unique 'recod' values and sum the 'count' values for each:

In [501]: unqA, idx=np.unique(x['recod'], return_inverse=True)    
In [502]: cnt = np.bincount(idx, x['count'])

Make a new structured array and fill the fields:

In [503]: x1 = np.empty(unqA.shape, dtype=x.dtype)
In [504]: x1['recod'] = unqA
In [505]: x1['count'] = cnt

In [506]: x1
Out[506]: 
array([(25, 1), (37, 5), (47, 1), (59, 2)], 
      dtype=[('recod', '<u8'), ('count', '<u4')])

There is a recarray function that builds an array from a list of arrays:

In [507]: np.rec.fromarrays([unqA,cnt],dtype=x.dtype)
Out[507]: 
rec.array([(25, 1), (37, 5), (47, 1), (59, 2)], 
      dtype=[('recod', '<u8'), ('count', '<u4')])

Internally it does the same thing: it builds an empty array of the right size and dtype, and then loops over the dtype fields. A recarray is just a structured array wrapped in a specialized array subclass.

There are two ways of populating a structured array (especially with a diverse dtype) - with a list of tuples as you did with x, and field by field.
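The two ways of populating a structured array can be compared side by side. This is a small sketch that reuses the dtype from the question; the variable names dt, a and b are illustrative.

```python
import numpy as np

dt = np.dtype([('recod', '<u8'), ('count', '<u4')])

# 1. From a list of tuples, one tuple per record.
a = np.array([(25, 1), (37, 5)], dtype=dt)

# 2. Field by field, into a preallocated array.
b = np.empty(2, dtype=dt)
b['recod'] = [25, 37]
b['count'] = [1, 5]

# Both approaches yield the same records.
print(a.tolist() == b.tolist())
```

Filling field by field tends to be the more convenient route when each column is already available as a separate array, as with unqA and cnt above.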

edited Aug 16, 2015 at 19:29

answered Aug 16, 2015 at 19:23

hpaulj


2 Comments

krlk89 Over a year ago

Thanks a lot for your help!

Divakar Over a year ago

Thanks for helping out OP on this! I wasn't really familiar with the structured arrays thing.

Warren Weckesser · Accepted Answer · 2015-08-14 14:06:15Z

2

pandas makes this type of "group-by" operation trivial:

In [285]: import pandas as pd

In [286]: x = [(25, 1), (37, 3), (37, 2), (47, 1), (59, 2)]

In [287]: df = pd.DataFrame(x)

In [288]: df
Out[288]: 
    0  1
0  25  1
1  37  3
2  37  2
3  47  1
4  59  2

In [289]: df.groupby(0).sum()
Out[289]: 
    1
0    
25  1
37  5
47  1
59  2

You probably won't want the dependency on pandas if this is the only operation you need from it, but once you get started, you might find other useful bits in the library.

edited Aug 14, 2015 at 14:06

answered Aug 14, 2015 at 14:01

Warren Weckesser


3 Comments

krlk89 Over a year ago

Thanks for your help! I tried this and got an error message: pastebin.com/mA6fDT3u

Warren Weckesser Over a year ago

I see you changed the format of your array. In that case, use df.groupby('recod').sum().

krlk89 Over a year ago

Thanks! It works now, but how can I get back the structure of my initial array?
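One possible way to get back to the original structure, sketched here as an assumption rather than as the answerer's own reply: group with as_index=False so that 'recod' stays a regular column, then copy each column into a new structured array with the original dtype. The names grouped and merged are illustrative.

```python
import numpy as np
import pandas as pd

x = np.array([(25, 1), (37, 3), (37, 2), (47, 1), (59, 2)],
             dtype=[('recod', '<u8'), ('count', '<u4')])

# A DataFrame built from a structured array takes its column names
# from the field names.
df = pd.DataFrame(x)

# as_index=False keeps 'recod' as a column instead of the index.
grouped = df.groupby('recod', as_index=False)['count'].sum()

# Copy each column back into a structured array with the original dtype.
merged = np.empty(len(grouped), dtype=x.dtype)
for name in x.dtype.names:
    merged[name] = grouped[name].to_numpy()

print(merged)
```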

behzad.nouri · Accepted Answer · 2015-08-14 10:59:41Z

1

You can use np.add.reduceat. You just need the indices where x[:, 0] changes, which are the nonzero indices of np.diff(x[:, 0]) shifted by one, plus the initial index 0:

>>> i = np.r_[0, 1 + np.nonzero(np.diff(x[:,0]))[0]]
>>> a, b = x[i, 0], np.add.reduceat(x[:, 1], i)
>>> np.vstack((a, b)).T
array([[25,  1],
       [37,  5],
       [47,  1],
       [59,  2]])
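The snippet above indexes x as a 2D array. For the structured array from the question, the same reduceat idea might look like the sketch below, which assumes a non-empty array sorted by 'recod'; the names keys, starts and merged are illustrative.

```python
import numpy as np

x = np.array([(25, 1), (37, 3), (37, 2), (47, 1), (59, 2)],
             dtype=[('recod', '<u8'), ('count', '<u4')])

keys = x['recod']

# Start index of each run of equal keys: index 0, plus every position
# where the key differs from its predecessor.
starts = np.r_[0, 1 + np.flatnonzero(keys[1:] != keys[:-1])]

# reduceat sums 'count' over each slice [starts[i], starts[i + 1]).
merged = np.empty(len(starts), dtype=x.dtype)
merged['recod'] = keys[starts]
merged['count'] = np.add.reduceat(x['count'], starts)

print(merged)
```

Unlike np.bincount with weights, np.add.reduceat keeps the integer dtype of the 'count' field, so there is no detour through float64.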

answered Aug 14, 2015 at 10:59

behzad.nouri

