3

What is the fastest way of finding and multiplying repeated values between them in an array?

Example:

a = [ 2 2 3 5 11 11 17 ]

Result:

a = [ 4 3 5 121 17 ]

I can think of iterative ways (by finding the hist, iterating through bins, ...) , but is there a vectorized/fast approach?

2
  • As I just had a look at the other answers: What is the expected output for a = [2 2 3 3 2 2]? Divakar's answer yields [4,9,4], whereas thewaywewalk's answer yields [16,9]. Commented Mar 31, 2015 at 15:44
  • 1
    @knedlsepp To be honest, it wont happen. But well, chose the one you prefer. I would say 2nd option fits more in what I was doing, but it doesn't matter. Nice you noticed it though. Commented Mar 31, 2015 at 15:49

3 Answers 3

6

Using histc and unique:

ua = unique(a)
out = ua.^histc(a,ua)

out =

     4     3     5   121    17

Considering the case, that the vector a is not monotonically increasing, it gets a little more complicated:

%// non monotonically increasing vector
a = [ 2 2 3 5 11 11 17 4 4 1 1 1 7 7]

[ua, ia] = unique(a)             %// get unique values and sort as required for histc  
[~, idx] = ismember(sort(ia),ia) %// get original order
hc = histc(a,ua)                 %// count occurences
prods = ua.^hc                   %// calculate products
out = prods(idx)                 %// reorder to original order

or:

ua = unique(a,'stable')          %// get unique values in original order
uas = unique(a)                  %// get unique values sorted as required for histc  
[~,idx] = ismember(ua,uas)       %// get indices of original order
hc = histc(a,uas)                %// count occurences
out = ua.^hc(idx)                %// calculate products and reorder 

out =

     4     3     5   121    17    16     1    49

Seems still a good solution as accumarray also doesn't offer a stable version by default.

Sign up to request clarification or add additional context in comments.

10 Comments

OH, well, Divakar's answer is cool, but this one is short and clear! Very nice.
I am going to accept this answer for the length and clarity of the code, however both of them are really good.
@AnderBiguri Have a look at my edit, I don't know if it applies to your case.
@Divakar thanks for your advice! I misused sort.... I stole your ismember approach now. My upvote you already had anyway :)
Oh. I guess the problem is the following: Multiple non-consecutive occurrences of the same integer: a = [2 2 3 3 2 2].
|
6

Prospective approach and Solution Code

Seems like the posted problem would be a good fit for accumarray -

%// Starting indices of each "group"
start_ind = find(diff([0 ; a(:)]))

%// Setup IDs for each group
id = zeros(1,numel(a)) %// Or id(numel(a))=0 for faster pre-allocation
id(start_ind) = 1

%// Use accumarray to get the products of elements within the same group
out = accumarray(cumsum(id(:)),a(:),[],@prod)

For a non monotonically increasing input, you need to add two more lines of code -

[~,sorted_idx] = ismember(sort(start_ind),start_ind)
out = out(sorted_idx)

Sample run -

>> a
a =
     2     2     3     5    11    11    17     4     4     1     1     1     7     7
>> out.'
ans =
     4     3     5   121    17    16     1    49

Tweaky-Squeaky

Now, one can make use of logical indexing to remove find and also use the faster pre-allocation scheme to give the proposed approach a super-boost and give us a tweaked code -

id(numel(a))=0;
id([true ; diff(a(:))~=0])=1;
out = accumarray(cumsum(id(:)),a(:),[],@prod);

Benchmarking

Here's the benchmarking code that compares all the proposed approaches posted thus far for the stated problem for runtimes -

%// Setup huge random input array
maxn = 10000;
N = 100000000;
a = sort(randi(maxn,1,N));

%// Warm up tic/toc.
for k = 1:100000
    tic(); elapsed = toc();
end

disp('------------------------- With UNIQUE')
tic
ua = unique(a);
out = ua.^histc(a,ua);
toc, clear ua out

disp('------------------------- With ACCUMARRAY')
tic
id(numel(a))=0;
id([true ; diff(a(:))~=0])=1;
out = accumarray(cumsum(id(:)),a(:),[],@prod);
toc, clear out id

disp('------------------------- With FOR-LOOP')
tic
b = a(1);
for k = 2:numel(a)
    if a(k)==a(k-1)
        b(end) = b(end)*a(k);
    else
        b(end+1) = a(k);
    end
end
toc

Runtimes

------------------------- With UNIQUE
Elapsed time is 3.050523 seconds.
------------------------- With ACCUMARRAY
Elapsed time is 1.710499 seconds.
------------------------- With FOR-LOOP
Elapsed time is 1.811323 seconds.

Conclusions: The runtimes it seems, support the idea of accumarray over the two other approaches!

7 Comments

I was expecting a nice answer like this from you Divakar!
@AnderBiguri Ah me too! ;)
@Divakar: What version of MATLAB are you running? Copy pasting your code I get entirely different results. (Macbook Pro 4GB RAM, R2014a): unique: 12.3 seconds, accumarray: 9.8 seconds, for: 2.9 seconds.
It seems N=1e8 is too much memory for my machine for accumarray. N=1e7 results are more similar, yet still the for loop beats accumarray here: unique: 0.94s, accumarray: 0.31s, for: 0.26s.
@knedlsepp I am on 2015A, 16GB RAM. Make it N = 1000000 and maxn = 100? The idea is to use such datasizes for which tic-toc results in more than 1 sec runtimes, that lessens the effect of variations that come up with using tic-toc.
|
3

You might be surprised how well a simple for-loop compares in terms of speed:

b = a(1);
for k = 2:numel(a)
    if a(k)==a(k-1)
        b(end) = b(end)*a(k);
    else
        b(end+1) = a(k);
    end
end

Even without doing any preallocation this performs on par with the accumarray solution.

1 Comment

Added few tweaks and benchmarks for more surprises :) Good alternative solution though!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.