I want to sample one column for each row of a data frame, using different weights for each row. I have tried a few approaches and looked through similar questions without success. A mock data frame and the expected output are shown below.

library(plyr)
set.seed(12345)
df1 <- mdply(data.frame(mean=c(10, 15, 12, 24)), rnorm, n = 5, sd = 1)
df1

I want a vectorized solution (hopefully) to sample one column from V1 to V5 for every row. The weights for the sampling are the values in each cell from V1 to V5 for the row in question. The actual dataframe could have a couple million rows. A sample output is shown below.

f_col <- c(10,15,12,24)
sampled_column <- c("V3", "V1", "V5", "V5")

output_df1 <- data.frame("mean" = f_col, "result" = sampled_column)
output_df1
  • Just use sample(names(df1)[2:6], nrow(df1), replace = TRUE). If the sampled columns need to be different, use replace = FALSE. Commented May 21, 2019 at 4:32
  • Thanks Akrun. Can you please clarify where the weights vector is declared and how I set the number of samples? Commented May 21, 2019 at 4:43
  • You say that you want one value per row, but next you talk about number of samples. Could you please clarify? You want for the fourth row 4 samples for instance? Does the row need to be replicated for each sample? And about the weights, you mean that for the first row V2 should be more likely to be extracted than V4? Commented May 21, 2019 at 4:50
  • One value per row refers to one column for each row. Yes, the fourth row should have four samples. I can just repeat the row as you suggest. Weights refer to the values in columns V1 to V5 for each row. I have edited my question to address the confusion and make it clearer. Thanks. Commented May 21, 2019 at 5:11
  • @MD_1977 - shouldn't "sample one column from V1 to V5 for every row." be removed from the question then? Commented May 21, 2019 at 5:14

1 Answer

sample() accepts a prob argument that weights the probability of each element being drawn. To apply this to every row, call it through apply() with the row's values as the weights.

output_df1 <- data.frame("mean"=df1$mean, "result"=apply(df1[,-1], 1, function(x) {sample(names(x), 1, prob=x)}))
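
Since the question mentions a couple million rows, here is a rough sketch of a different technique that avoids calling sample() once per row: draw one uniform number per row and locate it among the row's cumulative weights (inverse-CDF sampling). The helper name sample_col_by_row and the demo data frame are hypothetical, and the sketch assumes all weights are non-negative and every row has a positive total. It is not the method from the answer above, and it has not been benchmarked here.

```r
# Hypothetical helper: draw one column name per row, with probability
# proportional to that row's values in `cols`.
# Assumes non-negative weights and a positive total in every row.
sample_col_by_row <- function(df, cols) {
  w <- as.matrix(df[, cols, drop = FALSE])
  # One uniform draw per row, scaled to that row's total weight
  u <- runif(nrow(w)) * rowSums(w)
  idx <- rep(1L, nrow(w))
  running <- w[, 1]
  # Loop over columns (few), not rows (many): move one column to the
  # right for every cumulative sum that the draw exceeds
  for (j in seq_len(ncol(w) - 1)) {
    idx <- idx + (u > running)
    running <- running + w[, j + 1]
  }
  cols[idx]
}

# Mock data shaped like df1 in the question, built with base R only
set.seed(12345)
means <- c(10, 15, 12, 24)
vals <- matrix(rnorm(20, mean = rep(means, times = 5), sd = 1), nrow = 4)
colnames(vals) <- paste0("V", 1:5)
demo <- data.frame(mean = means, vals)

output_demo <- data.frame(
  mean = demo$mean,
  result = sample_col_by_row(demo, paste0("V", 1:5))
)
```

The loop runs once per weight column rather than once per row, so each iteration is a vectorized operation over all rows. Whether it is actually faster than the apply() version for a given data set would need to be measured.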

2 Comments

Thank you. This works. I will test it for run times, but this is great.
I hope it is fast enough for your data. On my PC it takes about 0.1 seconds for 10,000 rows.
