A technique for fitting a large model across multiple GPUs.
Each GPU holds and processes only a slice of a tensor, and the full tensor is aggregated only for operations that require it.
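
The slicing idea can be illustrated with a minimal sketch that uses plain Python lists in place of GPU tensors. It assumes a column-wise split of a linear layer's weight matrix, which is one common way to slice a tensor but not the only one. The helper names (`matmul`, `split_columns`, `gather_columns`) are hypothetical and do not come from any library.

```python
# Simulate slicing a weight matrix across "devices" using plain Python lists.
# No real GPUs are involved; each shard stands in for one GPU's slice.

def matmul(x, w):
    """Multiply matrix x (n x k) by matrix w (k x m)."""
    return [
        [sum(x_row[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
        for x_row in x
    ]

def split_columns(w, num_shards):
    """Split w column-wise into num_shards equal slices (hypothetical helper)."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    width = cols // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]

def gather_columns(partials):
    """Concatenate per-shard outputs along the column axis (the aggregation step)."""
    return [sum((p[r] for p in partials), []) for r in range(len(partials[0]))]

x = [[1, 2], [3, 4]]                 # input activations, replicated on every shard
w = [[1, 0, 2, 1], [0, 1, 1, 3]]     # full weight matrix (2 x 4)

shards = split_columns(w, num_shards=2)            # each "GPU" keeps a 2 x 2 slice
partials = [matmul(x, shard) for shard in shards]  # independent per-shard compute
sharded_out = gather_columns(partials)             # aggregate only when needed
full_out = matmul(x, w)                            # unsharded reference result

print(sharded_out == full_out)
```

Each shard computes its part of the output without seeing the other slices. The gather step is the only point where the full tensor is assembled, and in a real multi-GPU setup it would be a communication operation between devices.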