
My simplified Dataset looks like:

import torch
from torch.utils.data import Dataset, DataLoader
from typing import List

class MyDataset(Dataset):
    def __init__(self) -> None:
        super().__init__()
        self.images: torch.Tensor       # shape (n, w, h, c) -- n images in memory, specific use case
        self.labels: torch.Tensor       # shape (n, w, h, c) -- n labels in memory, specific use case
        self.positive_idx: List[int]    # roughly 1 positive per 10000 negatives
        self.negative_idx: List[int]

    def __len__(self):
        return 10000  # fixed value for training

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]
    

ds = MyDataset()
dl = DataLoader(ds, batch_size=100, shuffle=False, sampler=...)
# WeightedRandomSampler? shuffle=False because I guess the sampler should handle the shuffling.

What is the most "torch" way of balancing the sampling for the DataLoader, so that in each epoch every batch is constructed as 10 positive + 90 random negative samples, duplicating positives when there are not enough of them?

For the purpose of this exercise I'm not implementing augmentation to increase the sample size of positives.
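For context, the closest built-in is `torch.utils.data.WeightedRandomSampler`, which balances classes in expectation (about 10 positives per batch of 100) rather than guaranteeing an exact 10/90 split. A minimal sketch, assuming a hypothetical pool of 5 positives out of 10000 samples:

```python
import torch
from torch.utils.data import WeightedRandomSampler

n_total = 10000
positive_idx = list(range(5))  # hypothetical: only 5 positives

# give positives 10% of the total probability mass, negatives the remaining 90%
weights = torch.full((n_total,), 90.0 / (n_total - len(positive_idx)))
weights[positive_idx] = 10.0 / len(positive_idx)

# replacement=True lets the scarce positives be drawn repeatedly within one epoch
sampler = WeightedRandomSampler(weights, num_samples=n_total, replacement=True)
# dl = DataLoader(ds, batch_size=100, sampler=sampler)  # shuffle stays False
```

If the 10/90 split must hold exactly in every batch, a custom batch sampler like the one in the answer below is the way to go.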

  • Could you down-sample the negative samples? If it's about triplet loss or contrastive loss, I would use hard negative mining. Commented Mar 1, 2024 at 9:52
  • Hard negative mining would be kind of a workaround, but I'm sure it's easier to do it before training by specifying the pool (positive/negative) from which samples are selected. Commented Mar 2, 2024 at 11:11
  • And down-sampling is of course a way to do it. I just wanted to do it with a torch Sampler instead of iterating in a smart way over the indexes of the positive and negative lists. Commented Mar 2, 2024 at 14:00

1 Answer

I think you can implement a batch sampler to choose which data points will be yielded by your dataset's ```__getitem__```:

import random

class NegativeSampler:

    def __init__(self, positive_idx, negative_idx, n_batch=100, n_pos=10, n_neg=90):
        self.positive_idx = positive_idx
        self.negative_idx = negative_idx
        self.n_batch = n_batch  # number of batches per epoch
        self.n_pos = n_pos      # positives per batch
        self.n_neg = n_neg      # negatives per batch

    def __iter__(self):
        # yields one list of indices per batch; DataLoader passes them to ```__getitem__(self, idx)```
        for _ in range(self.n_batch):
            if len(self.positive_idx) >= self.n_pos:
                positive_idx_batch = random.sample(self.positive_idx, self.n_pos)
            else:
                # not enough positives: duplicate by sampling with replacement
                positive_idx_batch = random.choices(self.positive_idx, k=self.n_pos)
            negative_idx_batch = random.sample(self.negative_idx, self.n_neg)
            yield positive_idx_batch + negative_idx_batch

    def __len__(self):
        return self.n_batch
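Wired into a `DataLoader` via its `batch_sampler` argument, the 10/90 split holds exactly in every batch. A self-contained sketch (the sampler is repeated from above, and the toy dataset and index pools are assumptions for illustration):

```python
import random
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Stand-in dataset: item i is just the index i, so batches are easy to inspect."""
    def __len__(self):
        return 10000
    def __getitem__(self, idx):
        return idx

class NegativeSampler:
    """Batch sampler from above, repeated so this snippet runs on its own."""
    def __init__(self, positive_idx, negative_idx, n_batch=100, n_pos=10, n_neg=90):
        self.positive_idx = list(positive_idx)
        self.negative_idx = list(negative_idx)
        self.n_batch, self.n_pos, self.n_neg = n_batch, n_pos, n_neg

    def __iter__(self):
        for _ in range(self.n_batch):
            if len(self.positive_idx) >= self.n_pos:
                pos = random.sample(self.positive_idx, self.n_pos)
            else:
                # not enough positives: duplicate by sampling with replacement
                pos = random.choices(self.positive_idx, k=self.n_pos)
            yield pos + random.sample(self.negative_idx, self.n_neg)

    def __len__(self):
        return self.n_batch

positive_idx = list(range(5))          # hypothetical: only 5 positives
negative_idx = list(range(5, 10000))

sampler = NegativeSampler(positive_idx, negative_idx)
# batch_sampler yields complete index lists, so batch_size, shuffle and sampler stay unset
dl = DataLoader(ToyDataset(), batch_sampler=sampler)

batch = next(iter(dl))                 # tensor of 100 indices: 10 positive, 90 negative
```

Note that with `batch_sampler` the `DataLoader` must not also receive `batch_size`, `shuffle`, `sampler`, or `drop_last`; the batch sampler owns both the batching and the shuffling.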
