Count occurrences of unique arrays in array

Question

I have a NumPy array made up of one-hot encoded rows, e.g.:

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])

I would like to count the occurrences of each unique one-hot vector, e.g.:

{[1, 0, 0]: 2, [0, 0, 1]: 1}

What have you tried? Stack Overflow generally frowns upon questions where one hasn't shown an attempt to solve their own problem.
– TemporalWolf, Commented Jul 18, 2017 at 20:26
A list is unhashable, so you cannot use it as a key in a dictionary.
– Mr Tarsa, Commented Jul 18, 2017 at 20:36

Divakar · Accepted Answer · 2017-07-18 20:51:39Z

11

Approach #1

This looks like a perfect fit for the axis argument of numpy.unique (added in v1.13), which lets us find unique slices along an axis of a NumPy array:

# Unique rows (axis=0) and the number of times each one occurs
unq_rows, count = np.unique(x, axis=0, return_counts=True)
out = {tuple(i): j for i, j in zip(unq_rows, count)}

Sample outputs -

In [289]: unq_rows
Out[289]: 
array([[0, 0, 1],
       [1, 0, 0]])

In [290]: count
Out[290]: array([1, 2])

In [291]: {tuple(i):j for i,j in zip(unq_rows,count)}
Out[291]: {(0, 0, 1): 1, (1, 0, 0): 2}
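
A side note on the dictionary above: tuple(i) on a NumPy row keeps NumPy scalar types as keys and values, and recent NumPy releases print those as np.int64(...) rather than as bare integers. The following is a minimal sketch, assuming plain Python ints are wanted; the function name count_rows is made up for illustration and is not part of the original answer.

```python
import numpy as np

def count_rows(x):
    """Count identical rows of a 2-D array; keys and values are plain Python ints."""
    unq_rows, counts = np.unique(x, axis=0, return_counts=True)
    # .tolist() converts NumPy scalars into native Python ints
    return {tuple(row): n for row, n in zip(unq_rows.tolist(), counts.tolist())}

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
result = count_rows(x)
print(result)  # {(0, 0, 1): 1, (1, 0, 0): 2}
```

The keys come out in the sorted row order that np.unique returns, not in order of first appearance.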

Approach #2

For NumPy versions older than v1.13, we can exploit the fact that the input is a one-hot encoded array, so each row is fully identified by the position of its single 1:

# argmax gives each row's hot position; idx is the first row showing each position
_, idx, count = np.unique(x.argmax(1), return_counts=True, return_index=True)
out = {tuple(i): j for i, j in zip(x[idx], count)}  # x[idx] is unq_rows

edited Jul 18, 2017 at 20:51

answered Jul 18, 2017 at 20:32

Divakar


4 Comments

TemporalWolf Over a year ago

Note that axis was added in numpy 1.13, so previous versions can't use this method.

Daniel F Over a year ago

Approach #2 might work better with the np.eye trick for one-hot arrays. u, count = np.unique(x.argmax(1), return_counts=1) ,i = np.eye(np.max(u)), out = {i[u]:j for i, j in zip(u, count)}. That way you don't need return_index or to index the big one-hot vector in your loop. Also hooray for np.unique(...,axis)!

Daniel F Over a year ago

And then we can go down to the answer by @TemporalWolf and realize we're wasting our time doing np.unique at all with one-hots when we could just be summing over the second axis.

Divakar Over a year ago

@DanielF Thanks! Yes, would work with np.eye(np.max(u)+1). Also, summing along the first axis would work, just the additional work of masking out the zero counts and corresponding the counts to their respective rows if I got that correctly.
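
The np.eye variant discussed in these comments could look roughly like the sketch below. It is an illustration rather than the commenters' exact code: it sizes the identity matrix with x.shape[1] instead of np.max(u)+1, on the assumption that the rebuilt vectors should have the same width as the input rows even when the last positions are never hot. The names hot_idx and eye are chosen here for readability.

```python
import numpy as np

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])

# Hot position of each row, then the count of each distinct position
hot_idx, counts = np.unique(x.argmax(axis=1), return_counts=True)

# Row k of the identity matrix is the one-hot vector with position k hot
eye = np.eye(x.shape[1], dtype=int)

out = {tuple(eye[k].tolist()): int(n) for k, n in zip(hot_idx, counts)}
print(out)  # {(1, 0, 0): 2, (0, 0, 1): 1}
```

Only the small identity matrix is indexed here, so return_index and indexing back into x are not needed.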

Eric Duminil · Accepted Answer · 2017-07-18 20:29:38Z

4

You could convert your arrays to tuples and use a Counter:

import numpy as np
from collections import Counter
x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
Counter([tuple(a) for a in x])
# Counter({(1, 0, 0): 2, (0, 0, 1): 1})

answered Jul 18, 2017 at 20:29

Eric Duminil


1 Comment

Sapiens Over a year ago

This is the only one that works with strings (or object type) in python 3.
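
The Counter approach only needs each row to be convertible to a hashable tuple, so it is not tied to numeric dtypes. A small sketch with made-up string and object-dtype data (the arrays words and mixed are hypothetical examples, not from the question):

```python
import numpy as np
from collections import Counter

# Hypothetical string-valued rows
words = np.array([["a", "b"], ["c", "d"], ["a", "b"]])
word_counts = Counter(map(tuple, words))

# Hypothetical object-dtype rows mixing ints and strings
mixed = np.array([[1, "x"], [2, "y"], [1, "x"]], dtype=object)
mixed_counts = Counter(map(tuple, mixed))

print(word_counts[("a", "b")])  # 2
print(mixed_counts[(1, "x")])   # 2
```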

TemporalWolf · Accepted Answer · 2017-07-18 22:19:47Z

3

The fastest way given your data format is:

x.sum(axis=0)

which gives:

array([2, 0, 1])

Here the i-th entry of the result is the number of rows whose i-th element is hot:

[1, 0, 0] [2
[0, 1, 0]  0
[0, 0, 1]  1]

This exploits the fact that only one can be on at a time, so we can decompose the direct sum.

If you absolutely need it expanded to the same format, it can be converted via:

sums = x.sum(axis=0)
{tuple(int(k == i) for k in range(len(sums))): e for i, e in enumerate(sums)}

or, similarly to tarashypka:

{tuple(row): count for row, count in zip(np.eye(len(sums), dtype=np.int64), sums)}

yields:

{(1, 0, 0): 2, (0, 1, 0): 0, (0, 0, 1): 1}
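
The column-sum shortcut is only meaningful when every row really is one-hot. If that is not guaranteed, a check along the following lines could be run first. This is a sketch under that assumption; the helper name is_one_hot is invented for illustration.

```python
import numpy as np

def is_one_hot(x):
    """Return True if every row of x holds exactly one 1 and zeros elsewhere."""
    x = np.asarray(x)
    only_zeros_and_ones = np.isin(x, (0, 1)).all()
    one_per_row = (x.sum(axis=1) == 1).all()
    return bool(only_zeros_and_ones and one_per_row)

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
assert is_one_hot(x)
print(x.sum(axis=0))  # [2 0 1]

# A row with two hot positions breaks the assumption behind the shortcut
not_one_hot = np.array([[1, 1, 0], [0, 0, 1]])
print(is_one_hot(not_one_hot))  # False
```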

edited Jul 18, 2017 at 22:19

answered Jul 18, 2017 at 20:46

TemporalWolf


6 Comments

TemporalWolf Over a year ago

@EricDuminil Does that edit help? If not, I'll add more explanation... the index of the sum is also the index of the hot element.

Eric Duminil Over a year ago

Okay, got it. I didn't know what "one hot" means. Nice.

TemporalWolf Over a year ago

@EricDuminil I'm assuming the OP is talking about hot wires, but I honestly don't know.

Divakar Over a year ago

@TemporalWolf Well, just to clarify the problem: in the given context, each row is all zeros except for one element, which is called a one-hot encoded vector. The task is to count the number of identical vectors and pair each count with its vector. So, e.g. for x = np.array([[1, 0, 0, 0, 0], [0, 0, 0, 0, 1], [1, 0, 0, 0, 0]]), we would have: {(0, 0, 0, 0, 1): 1, (1, 0, 0, 0, 0): 2}.

TemporalWolf Over a year ago

@Divakar This encodes the same information and can be expanded to that format if the OP requires it. I suspect he does not, that it's a mild XY Problem.


Mr Tarsa · Accepted Answer · 2017-07-18 22:10:09Z

2

Here is another solution based on sum, which keeps only the vectors that actually occur:

>> {tuple(v): n for v, n in zip(np.eye(x.shape[1], dtype=int), np.sum(x, axis=0)) 
                if n > 0}
{(0, 0, 1): 1, (1, 0, 0): 2}

edited Jul 18, 2017 at 22:10

answered Jul 18, 2017 at 20:42

Mr Tarsa


2 Comments

TemporalWolf Over a year ago

Nice. I think our answers are converging... x.sum(axis=0). Additionally, [1]*len(x[0]) makes it work on any size.

TemporalWolf Over a year ago

I like your version better than the one I did if it's necessary that the tuples be provided in a dict. np.diag() is perfect for this use.

Marc · Accepted Answer · 2017-07-18 21:46:02Z

1

Lists and NumPy arrays are unhashable, i.e. they can't be keys of a dictionary. So your precise desired output, a dictionary with keys that look like [1, 0, 0], is not possible in Python. To deal with this you need to map your vectors to tuples.

from collections import Counter
import numpy as np

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
counts = Counter(map(tuple, x))

That will get you:

In [12]: counts
Out[12]: Counter({(0, 0, 1): 1, (1, 0, 0): 2})
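
To make the hashability point concrete, here is a short sketch showing that both a list and a NumPy array are rejected as dictionary keys, while a tuple with the same values is accepted. The exact wording of the error message depends on the Python version, so it is printed rather than assumed.

```python
import numpy as np

row = np.array([1, 0, 0])
counts = {}

# Both a list and an ndarray raise TypeError when used as a dict key
for key in ([1, 0, 0], row):
    try:
        counts[key] = 2
    except TypeError as err:
        print(type(key).__name__, "->", err)

# A tuple holds the same values and is hashable
counts[tuple(row.tolist())] = 2
print(counts)  # {(1, 0, 0): 2}
```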

edited Jul 18, 2017 at 21:46

answered Jul 18, 2017 at 20:35

Marc

