This is the data frame I am using. I am trying to subsample rows so that the values in column V2 (position; min: 1130, max: 4406748) are spread evenly across that range, while the final sample contains only one representative of each value in column V4 (lineage).
I have tried sorting and binning the data, but I cannot figure out how to sample evenly from the bins so that each lineage appears only once in the resulting data frame.
library(dplyr)
# sort the rows by position
sorted_barcodes <- tb_profiler_barcodes %>% arrange(V2)
# bin the sorted data into 150 equal-width bins along V2
binned_sorted <- sorted_barcodes %>%
  mutate(bin = cut(V2, breaks = 150, labels = FALSE))
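To make the goal concrete, here is a rough sketch of one possible greedy approach, not a tested solution for the real data. It assumes "evenly" means "spread as evenly as possible across equal-width position bins". The data frame `toy` is made up, and the names `n_bins`, `bin_counts`, `picked` and `subsample` are hypothetical. Each lineage is assigned the candidate row whose bin currently holds the fewest picks.

```r
library(dplyr)

set.seed(1)

# Made-up stand-in for tb_profiler_barcodes (assumption: only V2 and V4 matter)
toy <- data.frame(
  V2 = sample(1130:4406748, 200),
  V4 = sample(paste0("lineage", 1:40), 200, replace = TRUE)
)

n_bins <- 10  # hypothetical bin count; the question uses 150

# Sort by position and assign each row to an equal-width bin
binned <- toy %>%
  arrange(V2) %>%
  mutate(bin = cut(V2, breaks = n_bins, labels = FALSE))

bin_counts <- integer(n_bins)  # number of rows picked so far in each bin
picked <- integer(0)           # row indices of the chosen representatives

# Handle lineages with the fewest candidate rows first,
# since they have the least freedom in where they can land
lineage_order <- names(sort(table(binned$V4)))

for (lin in lineage_order) {
  rows <- which(binned$V4 == lin)
  # Among this lineage's rows, take the one in the least-filled bin so far
  best <- rows[which.min(bin_counts[binned$bin[rows]])]
  picked <- c(picked, best)
  bin_counts[binned$bin[best]] <- bin_counts[binned$bin[best]] + 1L
}

# One row per lineage, ordered by position
subsample <- binned[picked, ] %>% arrange(V2)
```

The result has exactly one row per lineage, so its size equals the number of distinct lineages. How even the positions end up depends on where each lineage's rows fall; `bin_counts` shows the resulting spread across bins.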
I would appreciate your help.
