
I want to create a sub-sample of a data frame df that depends on the frequency of each category in one of its columns, e.g. a.

Let's assume we have a data frame like this:

df <- data.frame(a = rep(1:4, c(3, 9, 4, 8)),
                 b = runif(24)) 

Then I want to draw a sub-sample of rows that is proportional to the categories in column a. First, in a purely random way:

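# One coin flip per row, category by category; the resulting logical vector
# lines up with the rows of df only because df is sorted by a
smpl <- unlist(lapply(1:4, \(x) sample(c(TRUE, FALSE),
                                       size = sum(x == df$a),
                                       replace = TRUE)))
df[smpl, ]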

Here, sample() has the intended effect: on average, half of the records of each category are returned. In any specific draw, however, a category may get more or fewer rows than that, or even none at all.
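To put a number on "even zero": with the coin-flip approach each row is kept independently with probability 0.5, so the number of rows kept from a category with N rows is binomially distributed, and the chance of keeping none is 0.5^N. A short check using the category sizes of the example df (the variable names are only illustrative):

```r
# Category sizes in the example: a = 1, 2, 3, 4 occur 3, 9, 4, 8 times
n_per_category <- c(3, 9, 4, 8)

# Each row is kept with probability 0.5, so the number kept from a category
# of size N is Binomial(N, 0.5); P(no row kept) = 0.5^N (0.125 for a = 1)
p_empty <- dbinom(0, size = n_per_category, prob = 0.5)

# Probability that at least one category comes back empty in a single draw
# (rows, and hence categories, are sampled independently): roughly 0.18
p_any_empty <- 1 - prod(1 - p_empty)

p_empty
p_any_empty
```

So with this small data set, roughly one draw in five loses a whole category, mostly because a = 1 has only three rows.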

I am also looking for a second, "more deterministic" approach, in which only the choice of rows is random, but which returns, for each category with N rows, exactly 50% of the rows when N is even, and either N %/% 2 or N %/% 2 + 1 rows when N is odd. The code should be easily readable.
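A minimal base-R sketch of this second approach, offered as one possible way to do it rather than the accepted solution: split the row indices by category, then draw N %/% 2 rows without replacement from each category, adding one more row at random when N is odd. The block recreates df from above, and set.seed() is only there for reproducibility; the names rows_by_cat, pick and half are hypothetical.

```r
set.seed(42)  # only to make the example reproducible
df <- data.frame(a = rep(1:4, c(3, 9, 4, 8)),
                 b = runif(24))

# Row indices of df, grouped by category of a (works whether or not df is sorted)
rows_by_cat <- split(seq_len(nrow(df)), df$a)

# For a category of size N, keep N %/% 2 rows; if N is odd, keep one extra row
# with probability 0.5, i.e. N %/% 2 + 1
pick <- lapply(rows_by_cat, \(i) {
  n <- length(i)
  k <- n %/% 2 + (n %% 2) * sample(0:1, 1)
  i[sample.int(n, k)]  # index into i; avoids sample()'s length-1 pitfall
})

half <- df[sort(unlist(pick)), ]
table(half$a)
```

Indexing with sample.int() rather than calling sample(i, k) directly matters here: if a category had a single row, sample(i, k) would treat i as "1:i" and could return row numbers from other categories.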

  • can you explain what you mean by "50% +/-1 of the corresponding rows"? Also, are you not satisfied with the approach you already have for the first case? Commented Mar 2, 2023 at 0:50
  • By "50% +/-1", I meant either integer division (%/%) or integer division + 1. The question was edited to improve clarity. The code is for a teaching project, where I am looking for elegant and clear solutions that beginners can understand. A tidyverse version would also be welcome. Commented Mar 2, 2023 at 6:16

1 Answer


In the meantime, I found a possible solution myself. First, I searched for "stratified" instead of "weighted" and changed the question title accordingly. That led me to the function slice_sample() in the dplyr package. It takes either of two optional arguments, n (a number of rows) or prop (a proportion of rows), so we can do:

Case 1 (fixed number of rows, n):

library(dplyr)
df |> slice_sample(n = nrow(df) %/% 2, weight_by = a)

Case 2 (fixed proportion of rows, prop):

df |> slice_sample(prop = 0.5, weight_by = a)
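Note that weight_by = a uses the numeric value of a as a sampling weight (a row with a = 4 is four times as likely to be drawn as a row with a = 1); it does not sample within each category, so the per-category counts still vary from draw to draw. A sketch of a stratified variant, assuming the dplyr package is installed, groups by a before calling slice_sample(). dplyr's documentation states that prop is rounded towards zero, so each category of size N should contribute N %/% 2 rows. The block recreates df from the question; set.seed() is only for reproducibility.

```r
library(dplyr)

set.seed(42)  # only to make the example reproducible
df <- data.frame(a = rep(1:4, c(3, 9, 4, 8)),
                 b = runif(24))

# Stratified: sample half of the rows *within* each category of a.
# prop * N is rounded towards zero, so each category gives N %/% 2 rows.
half <- df |>
  group_by(a) |>
  slice_sample(prop = 0.5) |>
  ungroup()

count(half, a)
```

Unlike the base-R coin-flip version, this never drops a category with at least two rows, and it reads as a single pipeline, which may suit the teaching context mentioned in the comments.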