Say I have a population of mixed ages and genders (and maybe other attributes), and I want to generate a random subsample (sampling with replacement is OK) with certain properties, e.g.:
- Sample size N
- 50% of the sample should be age<30
- 20% of the sample should be male
I could first randomly pick N/2 people with age<30 and N/2 with age>=30, but the result would likely not have the correct gender mix. I could also sub-select so that 20% of the age<30 people are male, but that is over-specified: I want the overall (marginal) distributions to match without specifying anything about the joint distribution of age and gender.
How do I generate this sample? What if I made it slightly more complicated and specified ranges:
- Sample size N
- 50-80% under age 30 (uniform probability in that range)
- 20-30% male (uniform probability in that range)
I imagine it might be possible to generate such a sample iteratively, alternately pruning it to match each requirement until convergence, but I'm not sure how to do it properly. The dumbest way, of course, would be to just generate random samples and reject any that don't match these requirements.
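To make the reject-and-retry idea concrete, here is a minimal Python sketch of what I have in mind. The population arrays `young` and `male`, the function name `rejection_sample`, and the made-up population proportions are all invented for illustration. Note that it only checks that each fraction lands inside its range; it does not make the fractions uniform within that range.

```python
import numpy as np

def rejection_sample(young, male, n, young_range, male_range, rng, max_tries=100_000):
    """Draw n indices with replacement; retry until both fractions are in range."""
    for _ in range(max_tries):
        idx = rng.integers(0, len(young), size=n)  # uniform draw with replacement
        frac_young = young[idx].mean()
        frac_male = male[idx].mean()
        if (young_range[0] <= frac_young <= young_range[1]
                and male_range[0] <= frac_male <= male_range[1]):
            return idx
    raise RuntimeError("no acceptable sample found within max_tries")

rng = np.random.default_rng(0)
# Hypothetical population: roughly 60% under 30 and roughly 25% male.
young = rng.random(10_000) < 0.6
male = rng.random(10_000) < 0.25

idx = rejection_sample(young, male, n=200,
                       young_range=(0.5, 0.8), male_range=(0.2, 0.3), rng=rng)
print(young[idx].mean(), male[idx].mean())
```

This works when the population is already close to the targets, but I would expect the acceptance rate to collapse when the targets are far from the population proportions or when exact percentages are required.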
The reweight package sounds like it might be helpful: cran.r-project.org/web/packages/reweight/reweight.pdf
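One way I can picture a reweighting approach is raking (iterative proportional fitting): alternately rescale per-person sampling weights so that each marginal matches its target, then draw with replacement using those weights. I have not checked whether this is what the reweight package does; the sketch below is plain NumPy, and the names `rake_weights`, `young` and `male` and the example population are made up. It assumes every age/gender combination occurs in the population.

```python
import numpy as np

def rake_weights(young, male, p_young, p_male, n_iter=100, tol=1e-10):
    """Sampling weights whose weighted age and gender marginals hit the targets."""
    w = np.full(len(young), 1.0 / len(young))
    for _ in range(n_iter):
        # Rescale so the weighted share of under-30s equals p_young.
        cur = w[young].sum()
        w = np.where(young, w * p_young / cur, w * (1 - p_young) / (1 - cur))
        # Rescale so the weighted share of males equals p_male.
        cur = w[male].sum()
        w = np.where(male, w * p_male / cur, w * (1 - p_male) / (1 - cur))
        # The gender margin is now exact; stop once the age margin is too.
        if abs(w[young].sum() - p_young) < tol:
            break
    return w / w.sum()

rng = np.random.default_rng(1)
# Hypothetical population in which age and gender are correlated.
young = rng.random(5_000) < 0.6
male = rng.random(5_000) < np.where(young, 0.3, 0.6)

w = rake_weights(young, male, p_young=0.5, p_male=0.2)
N = 1_000
idx = rng.choice(len(w), size=N, replace=True, p=w)
print(w[young].sum(), w[male].sum())      # weighted marginals
print(young[idx].mean(), male[idx].mean())  # realized sample fractions
```

As far as I understand, this only fixes the marginals in expectation, so any single sample will scatter around 50% and 20% rather than hit them exactly. For the range version, one option might be to draw the two target fractions uniformly from their ranges first and then rake to those.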