Hey guys! I might be mistaken, but I think the way the samplers are currently implemented, when using a distributed backend (such as ddp or ddp-sharded), they sample the same examples on all accelerators (GPUs).

Instead of inheriting from `torch.utils.data.Sampler`, I suggest inheriting from `torch.utils.data.distributed.DistributedSampler` and partitioning the data across accelerators, doing something like this:
```python
from torch.utils.data.distributed import DistributedSampler


class RandomSampler(DistributedSampler):
    r"""Implementation of a random sampler for sampling the dataset.

    Args:
        data_source (torch.utils.data.Dataset): dataset to sample from
        batch_size (int): size of a batch
        drop_last (bool): flag indicating whether to drop the last batch or not
    """

    def __init__(self, data_source, batch_size: int = 32, drop_last: bool = True) -> None:
        super(RandomSampler, self).__init__(data_source, drop_last=drop_last)
        self.data_source = data_source
        self.batch_size = batch_size
        ids = list(range(0, len(data_source)))
        # each replica owns the [start, end) slice of the dataset indices
        start = int(len(data_source) * self.rank / self.num_replicas)
        end = int(len(data_source) * (self.rank + 1) / self.num_replicas)
        # clamp each bin to `end` so neighbouring ranks do not share samples
        self.bins = [ids[i:min(i + batch_size, end)] for i in range(start, end, batch_size)]
        self.drop_last = drop_last

    def __iter__(self):
        for ids in self.bins:
            yield ids

    def __len__(self):
        return len(self.bins)
```
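To show what the partitioning above does, here is a minimal pure-Python sketch of the rank-based binning logic (the `partition_bins` helper is hypothetical, just extracted for illustration): each replica computes a disjoint `[start, end)` slice of the dataset indices and chunks its slice into batches, so no two GPUs see the same samples.

```python
def partition_bins(dataset_len, rank, num_replicas, batch_size):
    """Return the list of index batches ("bins") owned by this rank."""
    ids = list(range(dataset_len))
    # each replica owns the [start, end) slice of the dataset indices
    start = int(dataset_len * rank / num_replicas)
    end = int(dataset_len * (rank + 1) / num_replicas)
    # clamp each bin to `end` so neighbouring ranks do not share samples
    return [ids[i:min(i + batch_size, end)] for i in range(start, end, batch_size)]


# With 10 samples, 2 replicas, and batch size 3:
bins_rank0 = partition_bins(10, rank=0, num_replicas=2, batch_size=3)
bins_rank1 = partition_bins(10, rank=1, num_replicas=2, batch_size=3)
# rank 0 gets [[0, 1, 2], [3, 4]]; rank 1 gets [[5, 6, 7], [8, 9]]
```

Note that without the clamp on `end`, the last bin of each rank could spill into the next rank's slice whenever the slice length is not a multiple of `batch_size`, which would re-introduce duplicated samples across GPUs.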