
I would like to create a 4-dimensional array with a random number of consecutive ones in each row. The ones should always start in the first column and end in a random column. Example:

array(:,:,1,1) = [ 1 1 1 0 0 0;
                   1 1 0 0 0 0;
                   1 1 1 1 1 0;
                   ...         ]

One could do that with three for loops, but it is inefficient:

array = zeros(n,n,n,n);
for i = 1:n
   for j = 1:n
      for k = 1:n
         numOnes = ceil(n*rand());      % random length of the run of ones
         array(k,1:numOnes,j,i) = 1;    % fill the first numOnes columns of this row
      end
   end
end

Can somebody find a better solution? Thanks!!

  • What's your target data size? Thousands/millions/billions of elements?

2 Answers


Simple and straight-up approach (note that the number of ones per row is not uniformly random here; see the comments below)

rows = 8; %// Number of rows
cols = 7; %// Number of columns
ch3 = 3;  %// Number of elements in the 3rd dimension
ch4 = 2;  %// Number of elements in the 4th dimension

array = sort(round(rand(rows,cols,ch3,ch4)),2,'descend')
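
To see why this is not uniformly random, here is a quick check (my addition, not part of the original answer): the number of ones in each row is a sum of cols independent fair coin flips, i.e. binomially distributed, so it clusters around cols/2 instead of being uniform over the possible run lengths.

counts = sum(array, 2);     %// number of ones in every row
hist(counts(:), 0:cols)     %// peaked around cols/2 rather than flat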

Bsxfun approach (much faster, as shown in the benchmark results below, and truly random)

%// Sizes
rows = 8; %// Number of rows
cols = 7; %// Number of columns
ch3 = 3;  %// Number of elements in the 3rd dimension
ch4 = 2;  %// Number of elements in the 4th dimension

%// Get a 2D array with every row starting with a one
rows2 = rows*ch3*ch4;                       %// total number of rows across dims 1, 3 and 4
a1 = reshape(1:rows2*cols,rows2,[]);        %// linear index of every element, one row at a time
col = randi(cols,[1 rows2]);                %// random last column of the run of ones in each row
b1 = bsxfun(@plus,(col-1)*rows2,1:rows2)';  %// linear index of that last one in each row
out = bsxfun(@le,a1,b1);                    %// true up to (and including) that last one

%// Rearrange those to 4D
a2 = reshape(out',[rows*cols ch3 ch4]);
a3 = reshape(a2,cols,rows,ch3,ch4);
array = permute(a3,[2 1 3 4]);
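
A quick sanity check (my addition, not from the answer) that every row really is a run of ones starting in column 1, followed only by zeros:

flat = reshape(permute(array,[2 1 3 4]), cols, []).';  %// one row per (row, ch3, ch4) combination
assert(all(flat(:,1) == 1));                           %// every run starts in the first column
assert(all(all(diff(double(flat),1,2) <= 0)));         %// never a 0 followed by a 1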

Benchmark results

Here we compare the two approaches above along with Rody's approach.
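
The timings below were presumably produced with a tic/toc harness along these lines (a sketch under that assumption, shown for the SORT approach only; it is not the authors' exact benchmarking script):

%// Sketch of an assumed tic/toc timing harness, SORT approach only
datasizes = {[80 70 30 2], [80 70 30 20], [800 70 30 20], [800 140 30 20]};
for s = 1:numel(datasizes)
    d = datasizes{s};
    [rows, cols, ch3, ch4] = deal(d(1), d(2), d(3), d(4));
    tic
    array = sort(round(rand(rows,cols,ch3,ch4)),2,'descend'); %#ok<NASGU>
    fprintf('Datasize %d: Elapsed time with SORT approach is %.5f sec\n', s, toc);
end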

Datasize I:
rows = 80;  %// Number of rows
cols = 70;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 2; %// Number of elements in the 4th dimension

Results:
Elapsed time with SORT approach is:   0.0083445sec
Elapsed time with BSXFUN approach is: 0.0021sec
Elapsed time with RODY approach is:   0.0063026sec

Datasize II:
rows = 80;  %// Number of rows
cols = 70;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 20; %// Number of elements in the 4th dimension

Results:
Elapsed time with SORT approach is:   0.07875sec
Elapsed time with BSXFUN approach is: 0.012329sec
Elapsed time with RODY approach is:   0.055937sec


Datasize III:
rows = 800;  %// Number of rows
cols = 70;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 20; %// Number of elements in the 4th dimension

Results:
Elapsed time with SORT approach is:   0.87257sec
Elapsed time with BSXFUN approach is: 0.17624sec
Elapsed time with RODY approach is:   0.57786sec


Datasize IV:
rows = 800;  %// Number of rows
cols = 140;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 20; %// Number of elements in the 4th dimension

Results:
Elapsed time with SORT approach is:   1.8508sec
Elapsed time with BSXFUN approach is: 0.35349sec
Elapsed time with RODY approach is:   0.71918sec

In conclusion, bsxfun looks like the way to go, unless you need to process billions of elements, as the memory benchmark results in Rody's answer suggest.


13 Comments

(you already had my +1, so...+2!) Anyway, memory consumption was my primary concern, keeping performance acceptable. Compare mine and your bsxfun approaches with profile -memory, and look at the peak memory column; that's where the largest difference will be. Your first bsxfun creates a potentially huge double array, which I avoid. You'll indeed be faster, but only if you have the RAM.
A warning about all these solutions: they randomise each element, not the count of ones per row, so the number of ones in each row follows a binomial (roughly normal) distribution rather than the uniform distribution of the version presented by the OP. I used hist(builtin('_paren',sum(array,2),:)) to test this quickly.
Too late to edit, but it is only the sort method that has this issue.
@RodyOldenhuis I just increased ch3 from 30 to 45 in Datasize IV; the bsxfun approach ran fine and your method ran out of memory. In any case, is the OP really looking for 200 million elements? Appreciate the +1! :)
@Divakar: You'd be surprised how often that occurs nowadays; big data is a big thing. Granted, it can often be prevented, but certainly not always. I for one find myself thinking a lot more in terms of how to handle the giant version of something efficiently, rather than just getting the absolute best performance for the small cases. And by giant, I mean efficiently storing a sparse matrix of several billion elements in each direction, and about solving linear systems with such beasts. Distributed storage and computing, great fun :)

A "big-data" alternative:

rows = 80;  %// Number of rows
cols = 70;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 200; %// Number of elements in the 4th dimension

%// number of 1's in each row. Select the appropriate class to make 
%// sure peak memory remains acceptably small
intClasses = {'uint8' 'uint16' 'uint32' 'uint64'}; 
maxSizes   = cellfun(@(x) double(intmax(x)), intClasses);
numOnes    = randi(cols, rows*ch3*ch4,1, ...
                   intClasses{find(cols <= maxSizes, 1,'first')});

clear intClasses maxSizes   

%// Loop through all rows and flip the appropriate number of bits
array = false(rows*ch3*ch4, cols);
for ii  = 1:numel(numOnes)
    array(ii,1:numOnes(ii)) = true; end

clear ii numOnes    

%// Reshape into desired dimensions
array = reshape(array.', cols,rows,ch3,ch4);
array = permute(array, [2 1 3 4]);

clear rows cols ch3 ch4

(You could also use a sparse matrix, but MATLAB only supports sparse storage for double and logical data, and in all probability the dense logical array consumes less memory here anyway. If you're really desperate: the numOnes array alone contains sufficient information to retrieve any 1 or 0, so there's no need to construct the actual array at all. You could write a small wrapper class that implements subsref in this way, basically creating your own problem-specific sparse. :)
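
A minimal sketch of what such a wrapper class could look like (my own illustration of the idea, not part of the original answer; the class name and interface are made up), assuming numOnes has been reshaped to rows-by-ch3-by-ch4:

%// Illustrative sketch only: stores just numOnes and answers element
%// queries on demand, so the full logical array never exists in memory.
classdef LeadingOnesArray
    properties
        numOnes  %// rows-by-ch3-by-ch4: number of leading ones in each row
        cols     %// number of columns of the virtual 4D array
    end
    methods
        function obj = LeadingOnesArray(numOnes, cols)
            obj.numOnes = numOnes;
            obj.cols    = cols;
        end
        function val = subsref(obj, S)
            %// Handles only plain A(r,c,k,l) indexing in this sketch
            [r, c, k, l] = deal(S.subs{:});
            val = c <= obj.numOnes(r, k, l);  %// true iff element (r,c,k,l) is a 1
        end
    end
end

Usage would then be something like A = LeadingOnesArray(reshape(numOnes,rows,ch3,ch4), cols); A(3,5,1,2), with numOnes grabbed before the clear in the code above.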

Note that this is a lot slower than Divakar's solution when the array is small. However, as the array sizes increase, it becomes progressively faster than Divakar's solution, with a much smaller peak memory footprint.

EDIT: Divakar's bsxfun approach is the fastest of them all; however, it has O(N²) memory complexity. That is, it only works if you can fit

(cols-1)·rows²·ch3²·ch4² + (cols+1)·rows·ch3·ch4

(plus a few) 8-byte double values in your RAM, as opposed to just cols·rows·ch3·ch4 1-byte booleans for my method ^_^

As a simple comparison: re-running the same test as Divakar above, but within profile -memory:

Divakar's method: (profile -memory screenshot)

My method: (profile -memory screenshot)

As you can see, my method's peak memory equals the size of the output array (~130MB) while the overall speed doesn't really suffer all that much, whereas Divakar's method has a memory footprint of about 10× the output size (~1.4GB).

1 Comment

Good inputs with the memory benchmarking. I am guessing it's all because of bsxfun with its JIT. If so, that's good I guess, because it indicates it is using more memory to compute something quicker than it otherwise could, until it hits memory limits. Maybe in the future bsxfun will automatically rearrange its inputs and use some internal looping to account for the system resources available, because it doesn't look like it does now? Would be nice!
