
I have the following type of arrays:

a = array([[1,1,1],
           [1,1,1],
           [1,1,1],
           [2,2,2],
           [2,2,2],
           [2,2,2],
           [3,3,0],
           [3,3,0],
           [3,3,0]])

I would like to count the number of occurrences of each distinct row, such as

[1,1,1]: 3, [2,2,2]: 3, and [3,3,0]: 3

How could I achieve this in Python? Is it possible without using a for loop and counting into a dictionary? It has to be fast and should take less than 0.1 seconds or so. I looked into Counter, numpy's bincount, etc., but those work on individual elements, not on whole rows.

Thanks.


5 Answers


If you don't mind mapping to tuples just to get the count, you can use a Counter dict, which runs in 28.5 µs on my machine using Python 3 and is well below your threshold:

In [5]: timeit Counter(map(tuple, a))
10000 loops, best of 3: 28.5 µs per loop

In [6]: c = Counter(map(tuple, a))

In [7]: c
Out[7]: Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

1 Comment

No prob, it is actually even faster in Python 2 just using map

collections.Counter can do this conveniently, and almost like the example given.

>>> from collections import Counter
>>> c = Counter()
>>> for x in a:
...   c[tuple(x)] += 1
...
>>> c
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

This converts each sub-array to a tuple, which can be used as a dictionary key since tuples are immutable. Lists are mutable, so they can't be used as dict keys.
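As a quick illustration of that difference (a minimal sketch; d is just a throwaway dict):

```python
# Tuples are hashable, so they work as dict keys
d = {}
d[(1, 1, 1)] = 3
print(d[(1, 1, 1)])  # 3

# Lists are unhashable, so using one as a key raises TypeError
try:
    d[[1, 1, 1]] = 3
except TypeError as exc:
    print(exc)  # unhashable type: 'list'
```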

Why do you want to avoid using for loops?

And similar to @padraic-cunningham's much cooler answer:

>>> Counter(tuple(x) for x in a)
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})
>>> Counter(map(tuple, a))
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

6 Comments

Python loops over numpy arrays should be avoided, as they can be many times slower than a numpy solution; if you are using numpy with python loops you are likely "doing it wrong". In both of the python answers you are copying the entire array into a much less compact datatype, which is especially worrying.
@Ophion. do you have a faster numpy solution?
@PadraicCunningham Divakar's solution will be faster for reasonably sized problems. This problem has been asked, answered, and benchmarked many times in the past. I think the canonical answer is here.
@Ophion Re: "copying the entire array to a much less compact datatype". That's a good point. Though, I'd thought of taking it one further level of worrying by using a hash. But then also figured it to be unnecessarily complicated. One edit/addition I was going to put in the answer was retrieving the count of a certain elem, which is essentially a dict lookup in this case. So to print (for example) one of the counts, instead of print c[(3, 3, 0)], one would put print c.getcounts(3, 3, 0) (or override __getitem__ with the code for getcounts) and achieve a similar result.
Added a benchmark for a decently large input array in my solution. That benchmarking trend continued for medium-sized inputs as well. I think @Ophion made sense when he talked about NumPy solutions being optimized for such cases, which goes with its philosophy.

You could convert those rows to a 1D array of linear indices, using each row's elements as multi-dimensional indices with np.ravel_multi_index. Then, use np.unique to get the positions of the first occurrence of each unique row; its optional return_counts argument also gives us the counts. Thus, the implementation would look something like this -

def unique_rows_counts(a):

    # Calculate linear indices using rows from a
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)

    # Get the unique indices and their counts
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)

    # Return the unique rows from a and their respective counts
    return a[unq_idx], counts

Sample run -

In [64]: a
Out[64]: 
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [3, 3, 0],
       [3, 3, 0],
       [3, 3, 0]])

In [65]: unqrows, counts = unique_rows_counts(a)

In [66]: unqrows
Out[66]: 
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 0]])
In [67]: counts
Out[67]: array([3, 3, 3])

Benchmarking

Assuming you are okay with either numpy arrays or collections as outputs, one can benchmark the solutions provided thus far, like so -

Function definitions:

import numpy as np
from collections import Counter

def unique_rows_counts(a):
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    return a[unq_idx], counts

def map_Counter(a):
    return Counter(map(tuple, a))    

def forloop_Counter(a):      
    c = Counter()
    for x in a:
        c[tuple(x)] += 1
    return c

Timings:

In [53]: a = np.random.randint(0,4,(10000,5))

In [54]: %timeit map_Counter(a)
10 loops, best of 3: 31.7 ms per loop

In [55]: %timeit forloop_Counter(a)
10 loops, best of 3: 45.4 ms per loop

In [56]: %timeit unique_rows_counts(a)
1000 loops, best of 3: 1.72 ms per loop

19 Comments

Amazingly, all the answers, including the question, got downvotes; makes one wonder!
@PadraicCunningham I could add runtime tests, but then OP might be looking to have a dict as output, so that won't be fair :)
I will have to wait for the downvoter to release their superior solution so ;)
@PadraicCunningham Not tested for correctness, but something like - base = a.min(0); lidx = np.ravel_multi_index(a.T- base.T[:,None],a.max(0)-base+1 ).
@PadraicCunningham Ah I see, well good to know it's still being used!
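The base-offset variant from the comments above was posted untested, so here is a sketch of it packaged as a standalone function (the name unique_rows_counts_signed is made up here). Shifting each column by its minimum makes the linear-index trick work even when the array contains negative values:

```python
import numpy as np

def unique_rows_counts_signed(a):
    # Hypothetical helper sketching the comment's base-offset idea:
    # shift each column so its minimum becomes 0, so all entries are
    # valid non-negative multi-dimensional indices
    base = a.min(0)
    lidx = np.ravel_multi_index((a - base).T, a.max(0) - base + 1)
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    return a[unq_idx], counts

b = np.array([[-1, 2], [-1, 2], [0, -3]])
rows, counts = unique_rows_counts_signed(b)
# rows -> [[-1, 2], [0, -3]], counts -> [2, 1]
```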

The numpy_indexed package (disclaimer: I am its author) contains efficient vectorized functionality for these kinds of operations:

import numpy_indexed as npi
unique_rows, row_count = npi.count(a, axis=0)

Note that this works for arrays of any dimensionality or datatype.

2 Comments

Perfect answer, but how can we combine it into one output, for example [[[1,1,9],10],[[1,1,0],2]]?
zip(*npi.count(..)) would give that; but that wouldn't be very numpythonic. Alternatively, you could make a structured array with a composite dtype and assign the result to that, if you insist. But more likely than not you will end up with a more efficient solution if you stick to the way numpy likes to organise things natively.
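A plain-NumPy sketch of that pairing, using np.unique (numpy >= 1.13) in place of npi.count, since both return unique rows alongside their counts:

```python
import numpy as np

a = np.array([[1, 1, 9], [1, 1, 9], [1, 1, 0]])
rows, counts = np.unique(a, axis=0, return_counts=True)

# Zip the unique rows and counts into one [[row, count], ...] structure
paired = [[row.tolist(), int(n)] for row, n in zip(rows, counts)]
# paired -> [[[1, 1, 0], 1], [[1, 1, 9], 2]]
```

As noted above, this is not very numpythonic; keeping the rows and counts as two separate arrays is usually the more efficient layout.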

Since numpy-1.13.0, np.unique can be used with axis argument:

>>> np.unique(a, axis=0, return_counts=True)

(array([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 0]]), array([3, 3, 3]))

