
With the help of people on this site I have a matrix y that looks similar to this (but much more simplified).

1,3
1,3
1,3
7,1
8,2
8,2

I have created a third column of random numbers, intended to be drawn without replacement within each of the repeating chunks, using this code: j=cbind(y,sample(1:99999,y[,2],replace=FALSE)).

Matrix j looks like this:

1,3,4520
1,3,7980
1,3,950
7,1,2
8,3,4520
8,3,7980
8,3,950

How do I obtain truly random numbers for my third column such that, within each block of repeating rows (of length 3, then 1, then 2), no random number is repeated inside that block (replace = FALSE)?

  • Sorry, you seem to have left a comment and deleted it. Could you write it again? I understand that the answer I've given doesn't seem to be what you want. Commented Mar 1, 2013 at 18:54

3 Answers

Why this happens:

The problem is that the structure of the sample command is:

sample(vector of values, how many?, replace = FALSE or TRUE)

Here, "how many?" is supposed to be ONE value. Since you provide the whole second column of y, it just picks the first value, which is 3, so the call is read as:

set.seed(45) # just for reproducibility
sample(1:99999, 3, replace = FALSE)

And for this seed, the values are:

# [1] 63337 31754 24092

And since there are only 3 values and you're binding them to your matrix with 6 rows, it "recycles" the values (meaning it repeats the values in the same order). So, you get:

#      [,1] [,2]  [,3]
# [1,]    1    3 63337
# [2,]    1    3 31754
# [3,]    1    3 24092
# [4,]    7    1 63337
# [5,]    8    2 31754
# [6,]    8    2 24092

See that the values repeat. For the matrix you've shown, I've no idea how the 7,1,2 row occurs, as the first value of y[,2] in your matrix is 3.

What you should do instead:

y <- cbind(y, sample(1:99999, nrow(y), replace = FALSE))

This asks sample to generate nrow(y) values (6 here) without replacement. The result is 6 distinct values, which are then bound to your matrix y.
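
For anyone who wants to try this end to end, here is a minimal sketch. The construction of y is my reading of the simplified matrix in the question, and the seed is arbitrary; neither comes from the original code.

```r
# Rebuild the question's example matrix (assumed layout: 6 rows, 2 columns).
y <- matrix(c(1, 3,
              1, 3,
              1, 3,
              7, 1,
              8, 2,
              8, 2), ncol = 2, byrow = TRUE)

set.seed(45)  # arbitrary seed, only for reproducibility

# One draw per row, without replacement over the whole column.
j <- cbind(y, sample(1:99999, nrow(y), replace = FALSE))

# anyDuplicated() returns 0 when no value repeats, so this should be TRUE.
anyDuplicated(j[, 3]) == 0
```

Because the draw covers all rows at once, no value can repeat anywhere in the column, which also means none repeats inside a block.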

This should get you what you want:

j <- cbind(y, unlist(sapply(unique(y[,2]), function(n) sample(1:99999, n))))

Edit: there was an error in the code; the unique function is of course needed.
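
One caveat with the one-liner: unique(y[,2]) would merge two different groups that happen to have the same size. A possible variant, sketched here under the assumption that identical rows are always contiguous, takes the group sizes from runs of identical rows instead. The names key, runs and draws are only illustrative, and y is rebuilt from my reading of the question.

```r
# Example matrix as read from the question (assumed layout).
y <- matrix(c(1, 3,
              1, 3,
              1, 3,
              7, 1,
              8, 2,
              8, 2), ncol = 2, byrow = TRUE)

# One key per row; consecutive identical keys form one group.
key  <- paste(y[, 1], y[, 2])
runs <- rle(key)$lengths  # lengths of the consecutive runs

# Sample each run independently, without replacement inside the run.
# Values may still coincide between different runs.
draws <- unlist(lapply(runs, function(n) sample(1:99999, n, replace = FALSE)))

j <- cbind(y, draws)
```

Using lapply rather than sapply keeps the result a list even when all runs have the same length, so unlist always returns a plain vector with one value per row.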

7 Comments

I'm sorry but I don't understand this. I get 6 unique values here as well. How is this different from sample(1:99999, 6) exactly?
With the example data provided, unique(y[,2]) is c(3,1,2). Now with sapply you first sample 3 values from 1:99999, then 1 value, and finally 2 values. The same value can appear in more than one of these groups; for example, if you sample from 1:10 and use set.seed(1), you get c(3, 4, 5, 10, 3, 9). I agree that the code is a bit cryptic; hopefully this clears up the issue.
It seems a bit far fetched to try to get repeating values within groups with a range of 1:99999 and picking a few. But I get your point.
True, but that was part of the question, and it was said that the example data is much simplified from the actual case.
I also failed to read it that way at first, but the comment the OP deleted below your answer was about this issue. No idea why it was deleted.

I can't get this without a loop; maybe someone else can find a more elegant solution. As I read it, the problem is to sample with repetition within a group and without repetition between groups.

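## dat is assumed to be the question's matrix as a data frame, e.g. as.data.frame(y)
ll <- split(dat, paste(dat$V1, dat$V2, sep = ''))  ## one data frame per group
z <- rep(0, nrow(dat))                             ## result, filled group by group

SET <- seq(1, 100)  ## we can change 100 by 99999 for example
v <- 1              ## index of the first row of the current group
for (i in seq_along(ll)) {
  SET <- setdiff(SET, z)   ## drop values already used by earlier groups
  nn  <- nrow(ll[[i]])
  z[v:(v + nn - 1)] <- sample(SET, nn, replace = TRUE)  ## repeats allowed within a group
  v <- v + nn
}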

 z
[1]  35  77  94 100  23  59
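
To check a result like this, a small helper can confirm that no value is shared between groups while still allowing repeats inside one. This is only a sketch: disjoint_across_groups is a hypothetical name, and the group labels below are assumed from the example matrix.

```r
# Hypothetical helper: TRUE when every value belongs to exactly one group.
disjoint_across_groups <- function(values, groups) {
  # For each distinct value, count how many different groups it appears in.
  owners <- tapply(groups, values, function(g) length(unique(g)))
  all(owners == 1)
}

z <- c(35, 77, 94, 100, 23, 59)             # the output shown above
g <- c("13", "13", "13", "71", "82", "82")  # group key of each row (assumed)

disjoint_across_groups(z, g)
```

A repeat inside one group leaves the result unchanged, whereas a value shared by two groups makes the helper return FALSE.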

1 Comment

This seems to be the opposite of what Hemmo has given. His seems to be intra-group without repetition and inter-group with (possible) repetition.
