Fastest way of finding and multiplying repeated values in array

Question

What is the fastest way of finding and multiplying repeated values between them in an array?

Example:

a = [ 2 2 3 5 11 11 17 ]

Result:

a = [ 4 3 5 121 17 ]

I can think of iterative ways (by finding the hist, iterating through bins, ...) , but is there a vectorized/fast approach?

As I just had a look at the other answers: What is the expected output for a = [2 2 3 3 2 2]? Divakar's answer yields [4,9,4], whereas thewaywewalk's answer yields [16,9]. — knedlsepp
– knedlsepp, Commented Mar 31, 2015 at 15:44
@knedlsepp To be honest, it wont happen. But well, chose the one you prefer. I would say 2nd option fits more in what I was doing, but it doesn't matter. Nice you noticed it though. — Ander Biguri
– Ander Biguri, Commented Mar 31, 2015 at 15:49

Community · Accepted Answer · 2017-05-23 12:22:15Z

6

Using histc and unique:

ua = unique(a)
out = ua.^histc(a,ua)

out =

     4     3     5   121    17

Considering the case, that the vector a is not monotonically increasing, it gets a little more complicated:

%// non monotonically increasing vector
a = [ 2 2 3 5 11 11 17 4 4 1 1 1 7 7]

[ua, ia] = unique(a)             %// get unique values and sort as required for histc  
[~, idx] = ismember(sort(ia),ia) %// get original order
hc = histc(a,ua)                 %// count occurences
prods = ua.^hc                   %// calculate products
out = prods(idx)                 %// reorder to original order

or:

ua = unique(a,'stable')          %// get unique values in original order
uas = unique(a)                  %// get unique values sorted as required for histc  
[~,idx] = ismember(ua,uas)       %// get indices of original order
hc = histc(a,uas)                %// count occurences
out = ua.^hc(idx)                %// calculate products and reorder

out =

     4     3     5   121    17    16     1    49

Seems still a good solution as accumarray also doesn't offer a stable version by default.

edited May 23, 2017 at 12:22

CommunityBot

11 silver badge

answered Mar 30, 2015 at 14:12

Robert Seifert

25.3k13 gold badges75 silver badges117 bronze badges

Sign up to request clarification or add additional context in comments.

10 Comments

Ander Biguri Over a year ago

OH, well, Divakar's answer is cool, but this one is short and clear! Very nice.

Ander Biguri Over a year ago

I am going to accept this answer for the length and clarity of the code, however both of them are really good.

Robert Seifert Over a year ago

@AnderBiguri Have a look at my edit, I don't know if it applies to your case.

Robert Seifert Over a year ago

@Divakar thanks for your advice! I misused sort.... I stole your ismember approach now. My upvote you already had anyway :)

knedlsepp Over a year ago

Oh. I guess the problem is the following: Multiple non-consecutive occurrences of the same integer: a = [2 2 3 3 2 2].

|

Community · Accepted Answer · 2020-06-20 09:12:55Z

6

Prospective approach and Solution Code

Seems like the posted problem would be a good fit for accumarray -

%// Starting indices of each "group"
start_ind = find(diff([0 ; a(:)]))

%// Setup IDs for each group
id = zeros(1,numel(a)) %// Or id(numel(a))=0 for faster pre-allocation
id(start_ind) = 1

%// Use accumarray to get the products of elements within the same group
out = accumarray(cumsum(id(:)),a(:),[],@prod)

For a non monotonically increasing input, you need to add two more lines of code -

[~,sorted_idx] = ismember(sort(start_ind),start_ind)
out = out(sorted_idx)

Sample run -

>> a
a =
     2     2     3     5    11    11    17     4     4     1     1     1     7     7
>> out.'
ans =
     4     3     5   121    17    16     1    49

Tweaky-Squeaky

Now, one can make use of logical indexing to remove find and also use the faster pre-allocation scheme to give the proposed approach a super-boost and give us a tweaked code -

id(numel(a))=0;
id([true ; diff(a(:))~=0])=1;
out = accumarray(cumsum(id(:)),a(:),[],@prod);

Benchmarking

Here's the benchmarking code that compares all the proposed approaches posted thus far for the stated problem for runtimes -

%// Setup huge random input array
maxn = 10000;
N = 100000000;
a = sort(randi(maxn,1,N));

%// Warm up tic/toc.
for k = 1:100000
    tic(); elapsed = toc();
end

disp('------------------------- With UNIQUE')
tic
ua = unique(a);
out = ua.^histc(a,ua);
toc, clear ua out

disp('------------------------- With ACCUMARRAY')
tic
id(numel(a))=0;
id([true ; diff(a(:))~=0])=1;
out = accumarray(cumsum(id(:)),a(:),[],@prod);
toc, clear out id

disp('------------------------- With FOR-LOOP')
tic
b = a(1);
for k = 2:numel(a)
    if a(k)==a(k-1)
        b(end) = b(end)*a(k);
    else
        b(end+1) = a(k);
    end
end
toc

Runtimes

------------------------- With UNIQUE
Elapsed time is 3.050523 seconds.
------------------------- With ACCUMARRAY
Elapsed time is 1.710499 seconds.
------------------------- With FOR-LOOP
Elapsed time is 1.811323 seconds.

Conclusions: The runtimes it seems, support the idea of accumarray over the two other approaches!

edited Jun 20, 2020 at 9:12

CommunityBot

11 silver badge

answered Mar 30, 2015 at 14:08

Divakar

222k19 gold badges273 silver badges374 bronze badges

7 Comments

Ander Biguri Over a year ago

I was expecting a nice answer like this from you Divakar!

Divakar Over a year ago

@AnderBiguri Ah me too! ;)

knedlsepp Over a year ago

@Divakar: What version of MATLAB are you running? Copy pasting your code I get entirely different results. (Macbook Pro 4GB RAM, R2014a): unique: 12.3 seconds, accumarray: 9.8 seconds, for: 2.9 seconds.

knedlsepp Over a year ago

It seems N=1e8 is too much memory for my machine for accumarray. N=1e7 results are more similar, yet still the for loop beats accumarray here: unique: 0.94s, accumarray: 0.31s, for: 0.26s.

Divakar Over a year ago

@knedlsepp I am on 2015A, 16GB RAM. Make it N = 1000000 and maxn = 100? The idea is to use such datasizes for which tic-toc results in more than 1 sec runtimes, that lessens the effect of variations that come up with using tic-toc.

|

knedlsepp · Accepted Answer · 2015-03-31 19:00:29Z

3

You might be surprised how well a simple for-loop compares in terms of speed:

b = a(1);
for k = 2:numel(a)
    if a(k)==a(k-1)
        b(end) = b(end)*a(k);
    else
        b(end+1) = a(k);
    end
end

Even without doing any preallocation this performs on par with the accumarray solution.

edited Mar 31, 2015 at 19:00

answered Mar 31, 2015 at 15:22

knedlsepp

6,1043 gold badges22 silver badges43 bronze badges

1 Comment

Divakar Over a year ago

Added few tweaks and benchmarks for more surprises :) Good alternative solution though!

Collectives™ on Stack Overflow

Fastest way of finding and multiplying repeated values in array

3 Answers 3

10 Comments

Prospective approach and Solution Code

Tweaky-Squeaky

Benchmarking

7 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

10 Comments

Prospective approach and Solution Code

Tweaky-Squeaky

Benchmarking

7 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related