etha.pg_utils

Process group utilities with caching.

Functions

get_or_create_process_group(...)

Get or create a process group for the given ranks.

Module Contents

etha.pg_utils.get_or_create_process_group(ranks: list[int]) -> torch.distributed.ProcessGroup

Get or create a process group for the given ranks.

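Results are cached to avoid repeated dist.new_group() calls. PyTorch requires process groups to be created in the same order on all ranks, so every rank must call this function with the same sequence of rank lists. The cache key is built from the sorted ranks, so the same set of ranks maps to the same cache entry regardless of the order in which the ranks are passed.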