Matlab: searching arrays for similar values and create a new array containing all values

Question

I have three column vectors:

A = [1;2;5;9;15]
B = [2;3;5;11;15]
C = [5;7;11;20;25]

I want to create a new column vector D that contains every value found in A, B and C, without repeating any value in D.

I want D to be:

D = [1;2;3;5;7;9;11;15;20;25]

How to do this?
Thanks!

Possible duplicate of faster way to achieve unique() in matlab if assumed 1d pre-sorted vector? – GameOfThrows, Commented Aug 15, 2016 at 10:06
Thanks. Is there any other way than using MATLAB's built-in unique? – user5916581, Commented Aug 15, 2016 at 11:02
Look at the suggested duplicate post; it contains a method that does not use unique. – GameOfThrows, Commented Aug 15, 2016 at 11:05

EBH · Accepted Answer · 2016-08-16 08:40:50Z

3

Here is another (super-fast) way that uses neither unique nor loops, provided you are dealing only with integers:

A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
tmp = [A;B;C]; % concat the vectors
R = min(tmp):max(tmp)+1; % bin edges: one unit-wide bin per integer in the range
ind = histcounts(tmp,R)>0; % true for every integer that occurs in tmp
D = R(ind).' % keep those integers, as a column vector
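Here histcounts only acts as a presence test: a bin is non-empty exactly when its integer occurs in tmp. Under the same integer-only assumption, the idea can be sketched with a logical mask and plain indexing, which also runs on releases that predate histcounts (introduced in R2014b). This variant is not part of the original answer, and the names offset and present are chosen here for illustration.

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
tmp = [A;B;C];                          % concatenate the vectors
offset = min(tmp) - 1;                  % shift so the smallest value maps to index 1
present = false(max(tmp) - offset, 1);  % one flag per integer in the range
present(tmp - offset) = true;           % mark every value that occurs (repeats re-mark the same flag)
D = find(present) + offset              % positions of the marked flags, shifted back to values
```

Memory and time grow with max(tmp) - min(tmp), so, like the histcounts version, this only pays off when the integer range is not much larger than the number of elements. Non-integer values would fail as indices.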

This method could be generalized for doubles:

A = [1.2;2.62;5.74;9.29;15.31];
B = [2.3;3;5;9.29;15.31];
C = [1.2;2.62;11;20;25];
tmp = sort([A;B;C]); % concat and sort the vectors
R = [tmp; max(tmp)+1]; % use the sorted values themselves as bin edges
ind = histcounts(tmp,R)>0; % non-empty bins mark the last copy of each value
D = tmp(ind) % extract the unique values

However, the need to sort the values first (in tmp) makes it slower than the other methods.
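Why the doubles version works may not be obvious: because R contains the sorted values themselves, each repeated value creates a zero-width bin [x, x) that counts nothing, and only the last copy of each value gets a non-empty bin. A minimal sketch on a three-element input (the values are made up for illustration):

```matlab
tmp = sort([2; 1; 1]);          % sorted input with one repeated value: [1;1;2]
R = [tmp; max(tmp)+1];          % edges [1;1;2;3] give bins [1,1), [1,2) and [2,3]
counts = histcounts(tmp, R)     % [0 2 1]: the zero-width bin [1,1) stays empty
ind = counts > 0;               % true only for the last copy of each value
D = tmp(ind)                    % [1;2]
```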

edited Aug 16, 2016 at 8:40

answered Aug 15, 2016 at 11:28

EBH


4 Comments

EBH Over a year ago

@user5916581 You may find some other techniques here

patrik Over a year ago

This method seems to require integer values. However, if that restriction is acceptable, the method seems to perform well.

EBH Over a year ago

@patrik I have added a generalized method, but it seems to outperform the other methods only with integers.

patrik Over a year ago

I would assume this has to do with the call to sort(). Sorting is relatively expensive: a general comparison sort needs on the order of n*log(n) comparisons, and some sorting algorithms degrade to O(n^2) in the worst case.

deanshanahan · Accepted Answer · 2016-08-15 11:23:10Z

1

This code should do what you want:

% Your sample arrays
A=[1;2;5;9;15]
B=[2;3;5;11;15]
C=[5;7;11;20;25]

% [A;B;C] concatenates the arrays into one column vector
% unique finds the unique values in the input array
[D, IA, ID] = unique([A;B;C]);

disp(D);

% D = sorted array of the unique values

% IA = indices into the input such that D = input(IA)

% ID = indices into D such that input = D(ID), i.e. each element of the
% input is labelled with the position of its value in D

% IA and ID can therefore be used to recreate the original array
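The relationship between the three outputs can be checked directly. With the concatenated input stored in X, unique guarantees D = X(IA) and X = D(ID). A minimal sketch (the name X is introduced here for illustration):

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
X = [A;B;C];                  % the concatenated input
[D, IA, ID] = unique(X);      % D is sorted, without repeats
isequal(D, X(IA))             % IA picks one occurrence of each unique value
isequal(X, D(ID))             % ID maps every element of X to its value in D
```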

Solution without using unique (this is probably less efficient):

% SOLUTION WITHOUT USING UNIQUE

% Your variables
A=[1;2;5;9;15];
B=[2;3;5;11;15];
C=[5;7;11;20;25];

% Sort the concatenated arrays so that repeated values are adjacent
temp = sort([A;B;C]);
rep_count = 0; % Count number of repeated values

% Preallocate the output array
D = zeros(length(temp),1);
D(1) = temp(1); % Initialise the first element (it is always unique)

% Iterate through temp and copy each new value to D
for i = 2:length(temp)
    % The last value written to D is at index i-1-rep_count
    if (temp(i) == D(i-1-rep_count))
        rep_count = rep_count+1;
    else
        D(i-rep_count) = temp(i);
    end
end

% Remove the unused zeros at the end of D
D = D(1:length(D)-rep_count);

disp(D)

edited Aug 15, 2016 at 11:23

answered Aug 15, 2016 at 10:57

deanshanahan


2 Comments

user5916581 Over a year ago

Thanks. Is there any other way than using matlab built-in 'unique'?

deanshanahan Over a year ago

@user5916581 I edited my solution above with an alternative for you. It is probably slower than unique...

patrik · Accepted Answer · 2016-08-16 06:55:42Z

1

It is possible to sort the data and then keep only the values that differ from their predecessor. In my test this is about as fast as the function unique(), possibly with a slight advantage for sort() and diff(). This may however depend on the hardware, and the difference is fairly insignificant given the simplicity of D = unique([A;B;C]);.

function test()

% A=[1;2;5;9;15];
% B=[2;3;5;11;15];
% C=[5;7;11;20;25];

A = 500*rand(10000000,1);
B = 500*rand(10000000,1);
C = 500*rand(10000000,1);

f1 = @() testA(A,B,C);
f2 = @() testB(A,B,C);

time1 = timeit(f1,1);
time2 = timeit(f2,1);
disp(time1);
disp(time2);

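function D = testA(A,B,C)
d = sort([A;B;C]);   % repeated values become neighbours
idx = diff(d);       % 0 where an element equals its predecessor
D = d([1;idx]>0);    % keep the first element and every new value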

function D = testB(A,B,C)
D = unique([A;B;C]);

>> test
    1.9085
    1.9968
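To see what testA computes without the 10-million-element benchmark, here is the same sort/diff mask applied to the sample vectors from the question. This is a sketch for illustration; the names d, steps and keep are chosen here.

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
d = sort([A;B;C]);          % repeated values become neighbours
steps = diff(d);            % 0 wherever an element equals its predecessor
keep = [true; steps > 0];   % always keep the first element, then each new value
D = d(keep)                 % [1;2;3;5;7;9;11;15;20;25]
```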

edited Aug 16, 2016 at 6:55

answered Aug 15, 2016 at 13:37

patrik


1 Comment

EBH Over a year ago

I have tested this on my computer in comparison with the histcounts approach (testC). The results: testA = 1.6110, testB = 1.5125, testC = 0.1835
