etha.pg_utils
Process group utilities with caching.
Functions
| get_or_create_process_group(ranks) | Get or create a process group for the given ranks. |
Module Contents
- etha.pg_utils.get_or_create_process_group(ranks: list[int]) → torch.distributed.ProcessGroup
Get or create a process group for the given ranks.
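Uses a cache to avoid repeated dist.new_group() calls: the first request for a given set of ranks creates the group, and later requests return the cached group. The cache key is built from the sorted ranks, so [2, 0, 1] and [0, 1, 2] resolve to the same group. Process groups must be created in the same order on all ranks (a PyTorch requirement of dist.new_group()).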
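The caching pattern described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the etha implementation: the helper make_group_cache and the injected new_group factory are hypothetical, introduced so the caching logic runs without a distributed backend. In a real job the factory would be torch.distributed.new_group.

```python
from typing import Callable, TypeVar

G = TypeVar("G")


def make_group_cache(new_group: Callable[[list[int]], G]) -> Callable[[list[int]], G]:
    """Hypothetical helper: memoize process groups by their sorted ranks.

    `new_group` is injected (assumption) so this sketch can be exercised
    without torch.distributed; in practice it could be
    torch.distributed.new_group.
    """
    cache: dict[tuple[int, ...], G] = {}

    def get_or_create(ranks: list[int]) -> G:
        # Sorting makes the key independent of caller order:
        # [2, 0, 1] and [0, 1, 2] share one cache entry.
        key = tuple(sorted(ranks))
        if key not in cache:
            # Only the first request for a rank set reaches new_group(),
            # so all ranks must issue first requests in the same order.
            cache[key] = new_group(list(key))
        return cache[key]

    return get_or_create


# Demo with a fake factory that records each creation request.
created: list[list[int]] = []


def fake_new_group(ranks: list[int]) -> str:
    created.append(ranks)
    return f"group{ranks}"


get_or_create = make_group_cache(fake_new_group)
a = get_or_create([2, 0, 1])
b = get_or_create([0, 1, 2])
print(a is b, created)  # prints: True [[0, 1, 2]]
```

With PyTorch, one could build the cached function as make_group_cache(torch.distributed.new_group) after calling torch.distributed.init_process_group(). Injecting the factory keeps the cache logic testable; the trade-off is that the ordering requirement still rests entirely on the callers.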