2

I have a NumPy array whose rows are one-hot encoded vectors, e.g.:

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])

I would like to count the occurrences of each unique one-hot vector, e.g.:

{[1, 0, 0]: 2, [0, 0, 1]: 1}
2
  • 1
    What have you tried? Stack Overflow generally frowns upon questions where one hasn't shown an attempt to solve their own problem. Commented Jul 18, 2017 at 20:26
  • 2
    list is unhashable, you cannot use it as a key in a dictionary. Commented Jul 18, 2017 at 20:36

5 Answers

11

Approach #1

Seems like a perfect setup to use the new functionality of numpy.unique (v1.13 and newer) that lets us work along an axis of a NumPy array -

unq_rows, count = np.unique(x, axis=0, return_counts=True)
out = {tuple(i):j for i,j in zip(unq_rows,count)}

Sample outputs -

In [289]: unq_rows
Out[289]: 
array([[0, 0, 1],
       [1, 0, 0]])

In [290]: count
Out[290]: array([1, 2])

In [291]: {tuple(i):j for i,j in zip(unq_rows,count)}
Out[291]: {(0, 0, 1): 1, (1, 0, 0): 2}

Approach #2

For NumPy versions older than v1.13, we can make use of the fact that the input array is one-hot encoded, like so -

_, idx, count = np.unique(x.argmax(1), return_counts=True, return_index=True)
out = {tuple(i):j for i,j in zip(x[idx],count)} # x[idx] is unq_rows
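As a quick sanity check, the two approaches can be compared on the sample array. This is a minimal sketch, assuming NumPy 1.13 or newer so that `axis=0` is available; the names `out1` and `out2` are only for illustration, and the counts are cast to plain `int` so the keys and values print cleanly.

```python
import numpy as np

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])

# Approach #1: unique rows and their counts (needs NumPy >= 1.13 for axis=0)
unq_rows, count = np.unique(x, axis=0, return_counts=True)
out1 = {tuple(row.tolist()): int(c) for row, c in zip(unq_rows, count)}

# Approach #2: reduce each one-hot row to the index of its hot element first
_, idx, count2 = np.unique(x.argmax(1), return_index=True, return_counts=True)
out2 = {tuple(row.tolist()): int(c) for row, c in zip(x[idx], count2)}

print(out1 == out2)
```

Both dictionaries hold the same key/value pairs; only the insertion order differs, because Approach #1 sorts the rows while Approach #2 sorts the hot indices.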

4 Comments

Note that axis was added in numpy 1.13, so previous versions can't use this method.
Approach #2 might work better with the np.eye trick for one-hot arrays. u, count = np.unique(x.argmax(1), return_counts=1) ,i = np.eye(np.max(u)), out = {i[u]:j for i, j in zip(u, count)}. That way you don't need return_index or to index the big one-hot vector in your loop. Also hooray for np.unique(...,axis)!
And then we can go down to the answer by @TemporalWolf and realize we're wasting our time doing np.unique at all with one-hots when we could just be summing over the second axis.
@DanielF Thanks! Yes, would work with np.eye(np.max(u)+1). Also, summing along the first axis would work, just the additional work of masking out the zero counts and corresponding the counts to their respective rows if I got that correctly.
4

You could convert your arrays to tuples and use a Counter:

import numpy as np
from collections import Counter
x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
Counter([tuple(a) for a in x])
# Counter({(1, 0, 0): 2, (0, 0, 1): 1})
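One detail worth noting: iterating over `x` yields NumPy arrays, so the tuple keys hold NumPy integer scalars rather than Python ints. Those hash and compare equal to plain ints, so lookups work either way. If plain Python values are preferred, a sketch along these lines, going through `tolist()` first, should do it (the name `counts` is just illustrative):

```python
import numpy as np
from collections import Counter

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])

# tolist() converts to nested Python lists, so the tuple keys hold plain ints
counts = Counter(map(tuple, x.tolist()))
print(counts[(1, 0, 0)])
```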

1 Comment

This is the only one that works with strings (or object type) in python 3.
3

The fastest way given your data format is:

x.sum(axis=0)

which gives:

array([2, 0, 1])

Here the i-th entry is the number of rows whose i-th element is hot, so the first entry counts the [1, 0, 0] rows:

[1, 0, 0] [2
[0, 1, 0]  0
[0, 0, 1]  1]

This exploits the fact that only one can be on at a time, so we can decompose the direct sum.

If you absolutely need it expanded to the same format, it can be converted via:

sums = x.sum(axis=0)
{tuple(int(k == i) for k in range(len(sums))): e for i, e in enumerate(sums)}

or, similarly to tarashypka:

{tuple(row): count for row, count in zip(np.eye(len(sums), dtype=np.int64), sums)}

yields:

{(1, 0, 0): 2, (0, 1, 0): 0, (0, 0, 1): 1}
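The column sum only equals the per-vector count when every row really is one-hot. Below is a minimal sketch, under that assumption, that checks the input first and then builds the dictionary. `one_hot_counts` and its `drop_zero` flag are hypothetical names used only for illustration, and `np.isin` assumes NumPy 1.13 or newer.

```python
import numpy as np

def one_hot_counts(x, drop_zero=False):
    """Count one-hot rows of a 2-D array via a column sum (illustrative helper)."""
    x = np.asarray(x)
    # Each row must contain only 0s and 1s, with exactly one 1
    if not (np.isin(x, (0, 1)).all() and (x.sum(axis=1) == 1).all()):
        raise ValueError("every row must be one-hot")
    sums = x.sum(axis=0)
    eye = np.eye(x.shape[1], dtype=int)  # row i is the one-hot vector for index i
    return {tuple(row.tolist()): int(n)
            for row, n in zip(eye, sums)
            if n > 0 or not drop_zero}

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
print(one_hot_counts(x))
print(one_hot_counts(x, drop_zero=True))
```

With `drop_zero=True` the vectors that never occur are left out, which matches the format asked for in the question.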

6 Comments

@EricDuminil Does that edit help? If not, I'll add more explanation... the index of the sum is also the index of the hot.
Okay, got it. I didn't know what "one hot" means. Nice.
@EricDuminil I'm assuming the OP is talking about hot wires, but I honestly don't know.
@TemporalWolf Well, just to clarify the problem: in the given context, each row is all zeros except for one element, which is what is called a one-hot encoded vector. The task is to count the number of identical vectors and report the counts alongside their vectors. So, e.g. for x = np.array([[1, 0, 0, 0, 0], [0, 0, 0, 0, 1], [1, 0, 0, 0, 0]]), we would have: {(0, 0, 0, 0, 1): 1, (1, 0, 0, 0, 0): 2}.
@Divakar This encodes the same information and can be expanded to that format if the OP requires it. I suspect he does not, that it's a mild XY Problem.
2

Here is another interesting solution with sum

>>> {tuple(v): n for v, n in zip(np.eye(x.shape[1], dtype=int), np.sum(x, axis=0))
...  if n > 0}
{(0, 0, 1): 1, (1, 0, 0): 2}

2 Comments

Nice. I think our answers are converging... x.sum(axis=0). Additionally, [1]*len(x[0]) makes it work on any size.
I like your version better than the one I did if it's necessary that the tuples be provided in a dict. np.diag() is perfect for this use.
1

Lists and NumPy arrays are unhashable, i.e. they can't be keys of a dictionary. So your precise desired output, a dictionary with keys that look like [1, 0, 0], is not possible in Python. To deal with this you need to convert your vectors to tuples.

from collections import Counter
import numpy as np

x = np.array([[1, 0, 0], [0, 0, 1], [1, 0, 0]])
counts = Counter(map(tuple, x))

That will get you:

In [12]: counts
Out[12]: Counter({(0, 0, 1): 1, (1, 0, 0): 2})
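The hashability point is easy to check directly. A small sketch (the variable names are illustrative):

```python
import numpy as np

row = np.array([1, 0, 0])

for key in (row.tolist(), row):      # a list and a NumPy array
    try:
        hash(key)
    except TypeError as err:
        print(type(key).__name__, "->", err)

d = {tuple(row.tolist()): 2}         # a tuple of ints is hashable
print(d[(1, 0, 0)])
```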
