R: How to sample a different column for each row of a dataframe?

Question

I want to sample a different column for each row of a data frame, using different weights for each row. I have tried a few approaches and searched for similar questions, without success. A mock data frame and the expected output are shown below.

library(plyr)
set.seed(12345)
df1 <- mdply(data.frame(mean=c(10, 15, 12, 24)), rnorm, n = 5, sd = 1)
df1

I want a vectorized solution (hopefully) to sample one column from V1 to V5 for every row. The weights for the sampling are the values in each cell from V1 to V5 for the row in question. The actual dataframe could have a couple million rows. A sample output is shown below.

f_col <- c(10,15,12,24)
sampled_column <- c("V3", "V1", "V5", "V5")

output_df1 <- data.frame("mean" = f_col, "result" = sampled_column)
output_df1

Just use sample(names(df1)[2:6], nrow(df1), replace = TRUE). If it needs to be different, use replace = FALSE.
– akrun, Commented May 21, 2019 at 4:32
Thanks, akrun. Can you please clarify where the weights vector is declared and how I declare the number of samples?
– MD_1977, Commented May 21, 2019 at 4:43
You say that you want one value per row, but then you talk about the number of samples. Could you please clarify? For instance, do you want 4 samples for the fourth row? Does the row need to be replicated for each sample? And about the weights, do you mean that for the first row V2 should be more likely to be extracted than V4?
– nicola, Commented May 21, 2019 at 4:50
One value per row refers to one column for each row. Yes, the fourth row should have four samples. I can just repeat the row as you suggest. Weights refer to the values in columns V1 to V5 for each row. I have edited my question to address the confusion and make it clearer. Thanks.
– MD_1977, Commented May 21, 2019 at 5:11
@MD_1977 Shouldn't "sample one column from V1 to V5 for every row." be removed from the question then?
– thelatemail, Commented May 21, 2019 at 5:14

GKi · Accepted Answer · 2019-05-21 09:02:53Z

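In sample() you can use the prob argument to weight the sampling probabilities. The weights need not sum to one, so the raw cell values of each row can be passed directly. To do this for every row, you can use apply().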

output_df1 <- data.frame("mean"=df1$mean, "result"=apply(df1[,-1], 1, function(x) {sample(names(x), 1, prob=x)}))
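
Since the question asks for a vectorized solution, here is a rough sketch of one possible alternative. It is not part of the original answer. It draws one uniform number per row and walks the row-wise cumulative weights, looping only over the columns rather than the rows. The function name sample_col_by_row and the toy data are hypothetical, and the sketch assumes all weights are non-negative with a positive row total.

```r
# Hypothetical helper (not from the original answer): pick one column per row,
# with probability proportional to the non-negative weights in that row.
sample_col_by_row <- function(w) {
  w <- as.matrix(w)
  stopifnot(all(w >= 0), all(rowSums(w) > 0))
  u   <- runif(nrow(w)) * rowSums(w)  # one uniform draw per row, scaled to the row total
  idx <- rep(1L, nrow(w))             # start every row at the first column
  acc <- w[, 1]                       # running cumulative weight per row
  for (j in seq_len(ncol(w) - 1)) {
    idx <- idx + (u >= acc)           # step right while the draw is past the cumulative weight
    acc <- acc + w[, j + 1]
  }
  colnames(w)[idx]
}

# Toy data shaped like df1 in the question (made-up values, base R only)
set.seed(12345)
means   <- c(10, 15, 12, 24)
weights <- matrix(rnorm(20, mean = means, sd = 1), nrow = 4,
                  dimnames = list(NULL, paste0("V", 1:5)))
toy_df  <- data.frame(mean = means, weights)

result <- sample_col_by_row(toy_df[, -1])
output_toy <- data.frame(mean = toy_df$mean, result = result)
output_toy
```

The loop runs once per weight column (four iterations here), so the cost per row stays in vectorized arithmetic. Whether it is actually faster than apply() on a couple of million rows would need to be measured.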

2 Comments

MD_1977 Over a year ago

Thank you. This works. I will test it for run times, but this is great.

GKi Over a year ago

I hope it is also fast enough. For 10,000 rows it takes about 0.1 seconds on my PC.
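
To check the run time on your own machine, a minimal sketch along these lines may help. The 10,000-row data frame big is synthetic stand-in data (an assumption, not the asker's real data), and the elapsed time will vary by machine.

```r
set.seed(1)
n <- 10000

# Synthetic stand-in for df1: positive weights in columns V1..V5 (made-up data)
means <- rep(c(10, 15, 12, 24), length.out = n)
big <- data.frame(
  mean = means,
  matrix(rnorm(n * 5, mean = means, sd = 1), nrow = n,
         dimnames = list(NULL, paste0("V", 1:5)))
)

# Time the apply() approach from the answer
timing <- system.time(
  res <- apply(big[, -1], 1, function(x) sample(names(x), 1, prob = x))
)
print(timing["elapsed"])
```

Increasing n toward the real row count gives a rough idea of how the approach scales.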
