My simplified Dataset looks like:
from typing import List

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self) -> None:
        super().__init__()
        self.images: torch.Tensor  # shape (n, w, h, c); n images in memory - specific use case
        self.labels: torch.Tensor  # shape (n, w, h, c); n images in memory - specific use case
        self.positive_idx: List[int]  # roughly 1 positive per 10000 negatives
        self.negative_idx: List[int]

    def __len__(self):
        return 10000  # fixed value for training

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]
ds = MyDataset()
dl = DataLoader(ds, batch_size=100, shuffle=False, sampler=...)
# Weighted sampler? shuffle=False because I guess the sampler is supposed to handle
# the shuffling (DataLoader errors out if shuffle=True and a sampler are both given).
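My current guess for filling that sampler= slot is a WeightedRandomSampler with per-sample weights chosen so that each draw is positive with probability 0.1. This is only a sketch: the 50-positive count and the is_positive flags are stand-ins for my real positive_idx/negative_idx lists, and as far as I can tell this gives ~10 positives per batch of 100 only in expectation, not exactly:

```python
import torch
from torch.utils.data import WeightedRandomSampler

n = 10000
# Stand-in flags: assume 50 positives, the rest negative
# (real code would derive this from self.positive_idx / self.negative_idx).
is_positive = torch.zeros(n, dtype=torch.bool)
is_positive[:50] = True

num_pos = int(is_positive.sum())
num_neg = n - num_pos

# Total probability mass 0.1 on positives, 0.9 on negatives,
# so each draw lands on a positive with probability 0.1.
weights = torch.full((n,), 0.9 / num_neg, dtype=torch.double)
weights[is_positive] = 0.1 / num_pos

# replacement=True lets the scarce positives be drawn more than once per epoch.
sampler = WeightedRandomSampler(weights, num_samples=n, replacement=True)
# dl = DataLoader(ds, batch_size=100, sampler=sampler)  # shuffle left at default
```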
What is the most "torch" way to balance the sampling for the DataLoader so that each batch in an epoch is built from 10 positives + 90 random negatives, duplicating positives when there are not enough of them?
For the purpose of this exercise I am not using augmentation to increase the number of positive samples.
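To frame what "exact" would mean: if the 10+90 split has to hold in every batch (not just in expectation), my fallback idea is a custom batch sampler passed via DataLoader's batch_sampler argument. The class name and parameters below are my own invention, not a torch API:

```python
import random
from torch.utils.data import Sampler

class FixedRatioBatchSampler(Sampler):
    """Yields batches of exactly n_pos positive + n_neg negative indices.

    Positives are drawn with replacement, so they get duplicated
    when fewer than n_pos distinct positives exist.
    """

    def __init__(self, positive_idx, negative_idx,
                 n_batches=100, n_pos=10, n_neg=90):
        self.positive_idx = list(positive_idx)
        self.negative_idx = list(negative_idx)
        self.n_batches = n_batches
        self.n_pos = n_pos
        self.n_neg = n_neg

    def __iter__(self):
        for _ in range(self.n_batches):
            pos = random.choices(self.positive_idx, k=self.n_pos)  # with replacement
            neg = random.sample(self.negative_idx, k=self.n_neg)   # without replacement
            batch = pos + neg
            random.shuffle(batch)  # mix positives through the batch
            yield batch

    def __len__(self):
        return self.n_batches

# Usage with the dataset from the question (batch_size / shuffle / sampler
# must NOT be passed alongside batch_sampler):
# dl = DataLoader(ds, batch_sampler=FixedRatioBatchSampler(ds.positive_idx, ds.negative_idx))
```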