5

How can I take a sample of n random points from a matrix populated with 1s and 0s?

a=rep(0:1,5)
b=rep(0,10)
c=rep(1,10)
dataset=matrix(cbind(a,b,c),nrow=10,ncol=3)

dataset
      [,1] [,2] [,3]
 [1,]    0    0    1
 [2,]    1    0    1
 [3,]    0    0    1
 [4,]    1    0    1
 [5,]    0    0    1
 [6,]    1    0    1
 [7,]    0    0    1
 [8,]    1    0    1
 [9,]    0    0    1
[10,]    1    0    1

I want to be sure that the positions (row, col) from where I take the n samples are random.

I know about sample {base}, but it doesn't seem to allow me to do that. The other methods I know are spatial methods, which would force me to add x,y coordinates, convert the matrix to a spatial object, and then convert it back to a normal matrix.

More information

By random I also mean spread out inside the "matrix space", e.g. if I sample 4 points I don't want to get 4 neighboring points as a result; I want them spread across the "matrix space".

Knowing the position (row, col) in the matrix where each random point was taken from would also be important.

4
  • Why doesn't sample seem to do what you want? Commented Feb 2, 2012 at 9:45
  • I don't see any option for "random". Maybe this is implicit in the function sample {base}. What I want to be sure of is that the selected points are spread out, not clustered inside the matrix. If I take a sample of 10 points, the 10 points should be random in the matrix space. Commented Feb 2, 2012 at 9:50
  • I agree that the documentation for sample is not really clear about it being random, although it is. If you want spread, then random sampling is not a guarantee. Commented Feb 2, 2012 at 9:53
  • I added a more philosophical discussion on sampling to my answer. Commented Feb 2, 2012 at 10:07

2 Answers

12

There is a very easy way to sample a matrix, which works because R stores a matrix internally as a vector (in column-major order).

This means you can use sample directly on your matrix. For example, let's assume you want to sample 10 points with replacement:

n <- 10
replace=TRUE

Now just use sample on your matrix:

set.seed(1)
sample(dataset, n, replace=replace)
 [1] 1 0 0 1 0 1 1 0 0 1

To demonstrate how this works, let's decompose it into two steps. Step 1 is to generate an index of sampling positions, and step 2 is to find those positions in your matrix:

set.seed(1)
mysample <- sample(length(dataset), n, replace=replace)
mysample
 [1]  8 12 18 28  7 27 29 20 19  2

dataset[mysample]
 [1] 1 0 0 1 0 1 1 0 0 1

And, hey presto, the results of the two methods are identical.
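The question also asks for the (row, col) position of each sampled point. A minimal sketch of one way to recover them, assuming the dataset from the question: base R's arrayInd converts linear indices back to row and column numbers. The names idx, positions and values are illustrative only, and replace = FALSE is an assumption chosen here so that no cell is drawn twice.

```r
# Rebuild the matrix from the question
col_a <- rep(0:1, 5)
col_b <- rep(0, 10)
col_c <- rep(1, 10)
dataset <- matrix(cbind(col_a, col_b, col_c), nrow = 10, ncol = 3)

set.seed(1)
n <- 10

# Sample linear (column-major) indices; without replacement no cell repeats
idx <- sample(length(dataset), n, replace = FALSE)

# Convert the linear indices to (row, col) pairs
positions <- arrayInd(idx, dim(dataset))
colnames(positions) <- c("row", "col")

# Indexing with a two-column matrix returns one value per (row, col) pair
values <- dataset[positions]
```

The exact indices drawn depend on the seed and on the R version, so no output is shown here. Printing cbind(positions, value = values) gives a table of each sampled position with its value.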


5

sample seems like your best bet. To get 1000 random positions you can do something like:

rows = sample(1:nrow(dataset), 1000, replace = TRUE)
columns = sample(1:ncol(dataset), 1000, replace = TRUE)

I think this gives what you want, but of course I could be mistaken.

Extracting the items from the matrix can be done like this:

random_sample = mapply(function(row, col) 
                           return(dataset[row,col]), 
                    row = rows, col = columns)
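As an alternative to mapply, base R allows indexing a matrix with a two-column matrix of (row, column) pairs, which returns one element per pair. The sketch below assumes the dataset from the question and compares both approaches; via_mapply is an illustrative name. Note that drawing rows and columns independently with replacement can select the same cell more than once.

```r
# Rebuild the matrix from the question
dataset <- matrix(cbind(rep(0:1, 5), rep(0, 10), rep(1, 10)), nrow = 10, ncol = 3)

set.seed(1)
rows    <- sample(nrow(dataset), 1000, replace = TRUE)
columns <- sample(ncol(dataset), 1000, replace = TRUE)

# Each row of cbind(rows, columns) is one (row, col) pair
random_sample <- dataset[cbind(rows, columns)]

# The element-by-element mapply approach, for comparison
via_mapply <- mapply(function(row, col) dataset[row, col], rows, columns)

# Both approaches pick the same 1000 elements
all(random_sample == via_mapply)
```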

Sampling strategies

In the comments you mention that your sample needs to be spread out. A random sample gives no guarantee that there will be no clusters, because of its random nature. There are several other sampling schemes that might be interesting to explore:

  • Regular sampling: skip the randomness and sample at regular intervals. This covers the entire matrix space evenly, but there is no randomness.
  • Stratified random sampling: divide your matrix space into regular subsets, and then sample randomly within each subset. This is a mix between random and regular sampling.
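A minimal sketch of stratified random sampling on the dataset from the question. The stratum layout is an assumption made for illustration: five bands of two rows each, with one random cell drawn per band. The names rows_per_stratum, stratum_of_row and positions are hypothetical.

```r
# Rebuild the matrix from the question
dataset <- matrix(cbind(rep(0:1, 5), rep(0, 10), rep(1, 10)), nrow = 10, ncol = 3)

set.seed(1)
rows_per_stratum <- 2  # assumption: strata are bands of 2 consecutive rows

# Rows 1-2 belong to stratum 1, rows 3-4 to stratum 2, and so on
stratum_of_row <- ceiling(seq_len(nrow(dataset)) / rows_per_stratum)
n_strata <- max(stratum_of_row)

# Draw one random (row, col) cell inside each stratum
positions <- t(sapply(seq_len(n_strata), function(s) {
  rows_in_s <- which(stratum_of_row == s)
  c(row = rows_in_s[sample.int(length(rows_in_s), 1)],
    col = sample.int(ncol(dataset), 1))
}))

# One sampled value per stratum
values <- dataset[positions]
```

Because every band contributes exactly one point, the sample cannot collapse into a single cluster of rows, while the position inside each band stays random. Other layouts (for example blocks that also split the columns) would follow the same pattern.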

To check whether your random sampling produces good results, I'd repeat the random sampling a few times and compare the results (I assume the sample will be the input for another analysis?).
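One way to repeat the sampling and compare the results is sketched below, assuming the dataset from the question. The number of repeats (200) and the statistic compared (the proportion of 1s in each sample) are arbitrary choices for illustration.

```r
# Rebuild the matrix from the question
dataset <- matrix(cbind(rep(0:1, 5), rep(0, 10), rep(1, 10)), nrow = 10, ncol = 3)

set.seed(1)
n <- 10
n_repeats <- 200  # arbitrary number of repeated samples

# Proportion of 1s in each repeated random sample of n cells
sample_means <- replicate(n_repeats, mean(sample(dataset, n)))

# Spread of the sample proportions across repeats
summary(sample_means)

# Proportion of 1s in the full matrix, for comparison
mean(dataset)
```

If the sample proportions vary widely around the full-matrix value, a larger n or a stratified scheme may be worth considering.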

5 Comments

This is correct, but then you still have the challenge of extracting the elements from your matrix. If you were to do dataset[rows, columns], it would result in a 1000*1000 matrix, not a vector of 1000 elements. I gave up on this approach after two minutes, but I'd be interested to see how you solve it.
+1 Nice use of mapply (although I think using sample directly on the matrix is much simpler).
Yes, this will be used for other analysis. I will check multiple samples to be sure how it is sampling the matrix. In the end I will probably try something like stratified random sampling, which seems more appropriate.
A more theoretical question on sampling strategies could fit well at stats.stackexchange.com
@Andrie, I got stuck in the mindflow of getting random rows and numbers :).
