
I would like to create a 4-dimensional array with a random number of consecutive ones in each row. The ones should always start in the first column and end in a random column. Example:

array(:,:,1,1) = [ 1 1 1 0 0 0;
                   1 1 0 0 0 0;
                   1 1 1 1 1 0;
                   ...         ]

One could do that with three for loops, but it is inefficient:

array = zeros(n,n,n,n);
for i = 1:n
   for j = 1:n
      for k = 1:n
         numOnes = ceil(n*rand());      % random length of the run of ones
         array(k,1:numOnes,j,i) = 1;    % fill the first numOnes columns of this row
      end
   end
end

Can somebody find a better solution? Thanks!!

  • What's your target data size? Thousands/millions/billions of elements?

2 Answers


Simple and straight-up approach (note that the number of ones per row is not uniformly random here; see the comments below)

rows = 8; %// Number of rows
cols = 7; %// Number of columns
ch3 = 3;  %// Number of elements in the 3rd dimension
ch4 = 2;  %// Number of elements in the 4th dimension

array = sort(round(rand(rows,cols,ch3,ch4)),2,'descend')
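
To see why this is not uniformly random, here is a quick check (my addition, not part of the original answer): the number of ones in each row is a sum of cols independent fair coin flips, i.e. binomially distributed, so it clusters around cols/2 instead of being uniform over the possible run lengths.

counts = sum(array, 2);     %// number of ones in every row
hist(counts(:), 0:cols)     %// peaked around cols/2 rather than flat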

Bsxfun approach (much faster, as shown in the benchmark results below, and truly random)

%// Sizes
rows = 8; %// Number of rows
cols = 7; %// Number of columns
ch3 = 3;  %// Number of elements in the 3rd dimension
ch4 = 2;  %// Number of elements in the 4th dimension

%// Get a 2D array with every row starting with a one
rows2 = rows*ch3*ch4;                       %// total number of rows across dims 1, 3 and 4
a1 = reshape(1:rows2*cols,rows2,[]);        %// linear index of every element, one row at a time
col = randi(cols,[1 rows2]);                %// random last column of the run of ones in each row
b1 = bsxfun(@plus,(col-1)*rows2,1:rows2)';  %// linear index of that last one in each row
out = bsxfun(@le,a1,b1);                    %// true up to (and including) that last one

%// Rearrange those to 4D
a2 = reshape(out',[rows*cols ch3 ch4]);
a3 = reshape(a2,cols,rows,ch3,ch4);
array = permute(a3,[2 1 3 4]);
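
A quick sanity check (my addition, not from the answer) that every row really is a run of ones starting in column 1, followed only by zeros:

flat = reshape(permute(array,[2 1 3 4]), cols, []).';  %// one row per (row, ch3, ch4) combination
assert(all(flat(:,1) == 1));                           %// every run starts in the first column
assert(all(all(diff(double(flat),1,2) <= 0)));         %// never a 0 followed by a 1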

Benchmark results

Here we compare the two approaches above along with Rody's approach.
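
The timings below were presumably produced with a tic/toc harness along these lines (a sketch under that assumption, shown for the SORT approach only; it is not the authors' exact benchmarking script):

%// Sketch of an assumed tic/toc timing harness, SORT approach only
datasizes = {[80 70 30 2], [80 70 30 20], [800 70 30 20], [800 140 30 20]};
for s = 1:numel(datasizes)
    d = datasizes{s};
    [rows, cols, ch3, ch4] = deal(d(1), d(2), d(3), d(4));
    tic
    array = sort(round(rand(rows,cols,ch3,ch4)),2,'descend'); %#ok<NASGU>
    fprintf('Datasize %d: Elapsed time with SORT approach is %.5f sec\n', s, toc);
end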

Datasize I:
rows = 80;  %// Number of rows
cols = 70;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 2; %// Number of elements in the 4th dimension

Results:
Elapsed time with SORT approach is:   0.0083445sec
Elapsed time with BSXFUN approach is: 0.0021sec
Elapsed time with RODY approach is:   0.0063026sec

Datasize II:
rows = 80;  %// Number of rows
cols = 70;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 20; %// Number of elements in the 4th dimension

Results:
Elapsed time with SORT approach is:   0.07875sec
Elapsed time with BSXFUN approach is: 0.012329sec
Elapsed time with RODY approach is:   0.055937sec


Datasize III:
rows = 800;  %// Number of rows
cols = 70;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 20; %// Number of elements in the 4th dimension

Results:
Elapsed time with SORT approach is:   0.87257sec
Elapsed time with BSXFUN approach is: 0.17624sec
Elapsed time with RODY approach is:   0.57786sec


Datasize IV:
rows = 800;  %// Number of rows
cols = 140;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 20; %// Number of elements in the 4th dimension

Results:
Elapsed time with SORT approach is:   1.8508sec
Elapsed time with BSXFUN approach is: 0.35349sec
Elapsed time with RODY approach is:   0.71918sec

In conclusion, bsxfun looks like the way to go, unless you need to process billions of elements, as the memory benchmark results in Rody's answer suggest.


13 Comments

(you already had my +1, so...+2!) Anyway, memory consumption was my primary concern, keeping performance acceptable. Compare mine and your bsxfun approaches with profile -memory, and look at the peak memory column; that's where the largest difference will be. Your first bsxfun creates a potentially huge double array, which I avoid. You'll indeed be faster, but only if you have the RAM.
A warning about all these solutions: they randomise each element, not the count of ones per row, so the number of ones in each row follows a binomial (roughly normal) distribution rather than the uniform distribution of the version presented by the OP. I used hist(builtin('_paren',sum(array,2),:)) to test this quickly.
Too late to edit, but it is only the sort method that has this issue.
@RodyOldenhuis I just increased ch3 from 30 to 45 in Datasize IV; the bsxfun approach ran fine and your method ran out of memory. In any case, is the OP really looking for 200 million elements? Appreciate the +1! :)
@Divakar: You'd be surprised how often that occurs nowadays; big data is a big thing. Granted, it can often be prevented, but certainly not always. I for one find myself thinking a lot more in terms of how to handle the giant version of something efficiently, rather than just getting the absolute best performance for the small cases. And by giant, I mean efficiently storing a sparse matrix of several billion elements in each direction, and about solving linear systems with such beasts. Distributed storage and computing, great fun :)

A "big-data" alternative:

rows = 80;  %// Number of rows
cols = 70;  %// Number of columns
ch3  = 30;  %// Number of elements in the 3rd dimension
ch4  = 200; %// Number of elements in the 4th dimension

%// number of 1's in each row. Select the appropriate class to make 
%// sure peak memory remains acceptably small
intClasses = {'uint8' 'uint16' 'uint32' 'uint64'}; 
maxSizes   = cellfun(@(x) double(intmax(x)), intClasses);
numOnes    = randi(cols, rows*ch3*ch4,1, ...
                   intClasses{find(cols <= maxSizes, 1,'first')});

clear intClasses maxSizes   

%// Loop through all rows and flip the appropriate number of bits
array = false(rows*ch3*ch4, cols);
for ii  = 1:numel(numOnes)
    array(ii,1:numOnes(ii)) = true; end

clear ii numOnes    

%// Reshape into desired dimensions
array = reshape(array.', cols,rows,ch3,ch4);
array = permute(array, [2 1 3 4]);

clear rows cols ch3 ch4

(You could also use a sparse matrix, but MATLAB only supports sparse storage for double and logical data, and in all probability the dense logical array consumes less memory here anyway. If you're really desperate: the numOnes array alone contains sufficient information to retrieve any 1 or 0, so there's no need to construct the actual array at all. You could write a small wrapper class that implements subsref in this way, basically creating your own problem-specific sparse. :)
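
A minimal sketch of what such a wrapper class could look like (my own illustration of the idea, not part of the original answer; the class name and interface are made up), assuming numOnes has been reshaped to rows-by-ch3-by-ch4:

%// Illustrative sketch only: stores just numOnes and answers element
%// queries on demand, so the full logical array never exists in memory.
classdef LeadingOnesArray
    properties
        numOnes  %// rows-by-ch3-by-ch4: number of leading ones in each row
        cols     %// number of columns of the virtual 4D array
    end
    methods
        function obj = LeadingOnesArray(numOnes, cols)
            obj.numOnes = numOnes;
            obj.cols    = cols;
        end
        function val = subsref(obj, S)
            %// Handles only plain A(r,c,k,l) indexing in this sketch
            [r, c, k, l] = deal(S.subs{:});
            val = c <= obj.numOnes(r, k, l);  %// true iff element (r,c,k,l) is a 1
        end
    end
end

Usage would then be something like A = LeadingOnesArray(reshape(numOnes,rows,ch3,ch4), cols); A(3,5,1,2), with numOnes grabbed before the clear in the code above.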

Note that this is a lot slower than Divakar's solution when the array is small. However, as the array sizes increase, it becomes progressively faster than Divakar's solution, with a much smaller peak memory footprint.

EDIT: Divakar's bsxfun approach is the fastest of them all; however, it has O(N²) memory complexity. That is, it only works if you can fit

(cols-1)·rows²·ch3²·ch4² + (cols+1)·rows·ch3·ch4

(plus a few) 8-byte double values in your RAM, as opposed to just cols·rows·ch3·ch4 1-byte booleans for my method ^_^

As a simple comparison: re-running the same test as Divakar above, but within profile -memory:

Divakar's method: (profile -memory screenshot)

My method: (profile -memory screenshot)

As you can see, my method's peak memory equals the size of the output array (~130MB) while the overall speed doesn't really suffer all that much, whereas Divakar's method has a memory footprint of about 10× the output size (~1.4GB).

1 Comment

Good inputs with the memory benchmarking. I am guessing it's all because of bsxfun with its JIT. If so, that's good I guess, because it indicates it is using more memory to compute something quicker than it otherwise could, until it hits memory limits. Maybe in the future bsxfun will automatically rearrange its inputs and use some internal looping to account for the system resources available, because it doesn't look like it does now? Would be nice!
