
I have three column vectors:

A = [1;2;5;9;15]
B = [2;3;5;11;15]
C = [5;7;11;20;25]

I want to create a new column vector D that contains every value appearing in A, B and C, with no value repeated.

I want D to be:

D = 
    1
    2
    3
    5
    7
    9
    11
    15
    20
    25

How can I do this?
Thanks!

  • unique([A;B;C])? Commented Aug 15, 2016 at 10:02
  • Possible duplicate of faster way to achieve unique() in matlab if assumed 1d pre-sorted vector? Commented Aug 15, 2016 at 10:06
  • Thanks. Is there any other way than using matlab built-in 'unique'? Commented Aug 15, 2016 at 11:02
  • Look at the suggested duplicate post - it contains a method which does not use unique Commented Aug 15, 2016 at 11:05

3 Answers

3

Here is another (super-fast) way, not using unique, and no loops, if you are dealing only with integers:

A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
tmp = [A;B;C]; % concat the vectors
R = min(tmp):max(tmp)+1; % the range of the values
ind = histcounts(tmp,R)>0; % find all elements within tmp
D = R(ind).' % extract the relevant values

This method could be generalized for doubles:

A = [1.2;2.62;5.74;9.29;15.31];
B = [2.3;3;5;9.29;15.31];
C = [1.2;2.62;11;20;25];
tmp = sort([A;B;C]); % concat and sort the vectors
R = [tmp; max(tmp)+1]; % the range of the values
ind = histcounts(tmp,R)>0; % find all elements within tmp
D = tmp(ind) % extract the relevant values

However, the need to sort the values first (in tmp) makes it slower than the other methods.
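As a quick sanity check, the integer variant can be run on the vectors from the question and compared with unique. This is only a sketch: the names `edges` and `present` are illustrative, and it assumes integer-valued input so that unit-width bins separate every distinct value.

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
tmp = [A;B;C];

% One unit-width bin per candidate integer. The extra edge max(tmp)+1
% closes the last bin, so max(tmp) is counted in the bin that starts at
% max(tmp) rather than being merged into the previous bin.
edges = min(tmp):max(tmp)+1;
present = histcounts(tmp, edges) > 0;   % true where a bin holds at least one value

% edges has one more element than present; the left edge of each
% occupied bin is the value itself.
D = edges(present).';

disp(isequal(D, unique(tmp)));
```

Without the `+1`, histcounts would treat the final bin as closed on both sides, so the largest value would be reported under the left edge of that bin instead of under its own value.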


4 Comments

@user5916581 You may find some other techniques here
This method seems to require integer values. However if this is wanted, then the method seems to have good performance.
@patrik I have added a generalized method, but it seems to outperform the other methods only with integers.
I would assume this has to do with the call to sort(). Normally sorting methods are heavy. In best cases you might come down to a O(n*log(n)) operation, but worst case is many times O(n^2).
1

This code should do what you want:

% Your sample arrays
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];

% [A;B;C] stacks the three arrays into a single column vector
% unique returns the distinct values of its input, sorted in ascending order
[D, IA, ID] = unique([A;B;C]);

disp(D);

% D = array with the unique values

% IA = indices into the concatenated array, such that indexing it with IA
% gives D

% ID = for each element of the concatenated array, the index of its value
% in D

% D and ID can be used to recreate the concatenated array
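The relationship between the three outputs of unique can be checked directly. A minimal sketch, where `X`, `fromIA` and `rebuilt` are illustrative names for the stacked input and the two reconstructions:

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];
X = [A;B;C];                 % stacked input (illustrative name)

[D, IA, ID] = unique(X);

% IA selects one occurrence of each unique value from X
fromIA = X(IA);

% ID maps every element of X to the position of its value in D
rebuilt = D(ID);

disp(isequal(fromIA, D));
disp(isequal(rebuilt, X));
```

If only the unique values are needed, the second and third outputs can simply be omitted: `D = unique(X);`.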

Here is a solution that does not use "unique". It is probably less efficient:

% SOLUTION WITHOUT USING UNIQUE

% Your variables
A=[1;2;5;9;15];
B=[2;3;5;11;15];
C=[5;7;11;20;25];

% Allocate a temporary array with your arrays concatenated
temp = sort([A;B;C]);
rep_count = 0; % Count number of repeat values

% Allocate a blank array for your output
D = zeros(length(temp),1);
D(1) = temp(1); % Initialise first element (is always unique)

% Iterate through temp and output unique values to D
for i = 2:length(temp)
    if (temp(i) == D(i-1-rep_count))
        rep_count = rep_count+1;
    else
        D(i-rep_count) =  temp(i);
    end
end

% Remove zeros at the end of D
D = D(1:length(D)-rep_count);

disp(D)

2 Comments

Thanks. Is there any other way than using matlab built-in 'unique'?
@user5916581 I edited my solution above with an alternative for you. It is probably slower than unique...
1

It is possible to sort the data and keep each value that differs from its predecessor. This seems to be about as efficient as calling unique(), possibly with a slight advantage for sort() and diff(). This may, however, depend on hardware, and the difference is fairly insignificant given the simplicity of D = unique([A;B;C]);.

function test()

% A=[1;2;5;9;15];
% B=[2;3;5;11;15];
% C=[5;7;11;20;25];

A = 500*rand(10000000,1);
B= 500*rand(10000000,1);
C = 500*rand(10000000,1);

f1 = @() testA(A,B,C);
f2 = @() testB(A,B,C);

time1 = timeit(f1,1);
time2 = timeit(f2,1);
disp(time1);
disp(time2);

function D = testA(A,B,C)
d = sort([A;B;C]);
idx = diff(d);
D = d([1;idx]>0);

function D = testB(A,B,C)
D = unique([A;B;C]);

test

1.9085

1.9968
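On the small vectors from the question, the sort and diff approach can be checked against unique. This is a sketch that assumes real input without NaN values (a NaN would be dropped by the comparison, whereas unique keeps it); `isNew` is an illustrative name.

```matlab
A = [1;2;5;9;15];
B = [2;3;5;11;15];
C = [5;7;11;20;25];

d = sort([A;B;C]);             % equal values end up next to each other

% The first element is always new; any later element is new only if it
% is strictly larger than the one before it.
isNew = [true; diff(d) > 0];
D = d(isNew);

disp(isequal(D, unique([A;B;C])));
```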

1 Comment

I tested this on my computer against the histcounts approach (testC). The results: testA = 1.6110, testB = 1.5125, testC = 0.1835
