etha.comm.ir#
Intermediate Representation for tensor transfer operations.
Attributes#
Classes#
Bucket | Bucket for transfer operations (byte-based buffer).
BucketEntry | Bucket offset entry (byte-based).
Chunk | Unified chunk for all transfer operations.
Module Contents#
- class etha.comm.ir.Bucket#
Bases: etha.comm.transfer.Transferable
Bucket for transfer operations (byte-based buffer).
- launch() bool#
Launch communication operation.
- Returns:
True if launched, False if still waiting for buffer to be ready.
- prepare() None#
Prepare buffer for communication.
The source-side Partial reduction and the dtype cast both happen inside
Chunk.prepare; this method only assembles entries into the bucket buffer.
- buffer_ready_event: torch.cuda.Event | None = None#
- device: torch.device | None = None#
- entries: list[BucketEntry]#
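The non-blocking contract of launch() (return False while the buffer is not ready, True once the operation has been issued) can be illustrated with a small polling loop. This is only a sketch under stated assumptions: `FakeEvent`, `ToyBucket`, and `drain` are hypothetical names, and the use of an event `query()` call to test readiness is an assumption about how buffer_ready_event might be consulted, not the module's confirmed implementation.

```python
class FakeEvent:
    """Hypothetical stand-in for an event object exposing query() -> bool."""

    def __init__(self, polls_until_ready: int) -> None:
        self._remaining = polls_until_ready

    def query(self) -> bool:
        # Report "not ready" for the first few polls, then "ready".
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


class ToyBucket:
    """Hypothetical bucket that mirrors only the launch() return contract."""

    def __init__(self, buffer_ready_event=None) -> None:
        self.buffer_ready_event = buffer_ready_event
        self.launched = False

    def launch(self) -> bool:
        event = self.buffer_ready_event
        # Assumption: no event means the buffer is already usable.
        if event is not None and not event.query():
            return False  # still waiting for the buffer
        self.launched = True  # a real bucket would issue the transfer here
        return True


def drain(pending: list) -> int:
    """Retry launch() until every bucket has launched; return the pass count."""
    passes = 0
    while pending:
        pending = [bucket for bucket in pending if not bucket.launch()]
        passes += 1
    return passes
```

A caller written this way never blocks on a single bucket: buckets whose buffers are ready launch immediately, and the rest are retried on the next pass.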
- class etha.comm.ir.BucketEntry#
Bucket offset entry (byte-based).
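To make "byte-based" concrete, the following sketch packs several payloads into one flat byte buffer and records where each one lives. The fields of BucketEntry are not documented here, so `Entry.offset`, `Entry.nbytes`, the `pack` helper, and the alignment parameter are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical offset entry: where one payload sits in the flat buffer."""

    offset: int  # byte offset of the payload within the buffer
    nbytes: int  # payload length in bytes


def align_up(n: int, alignment: int) -> int:
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def pack(payloads: list, alignment: int = 16):
    """Copy payloads into one bytearray, each starting at an aligned offset."""
    entries = []
    cursor = 0
    for payload in payloads:
        cursor = align_up(cursor, alignment)
        entries.append(Entry(offset=cursor, nbytes=len(payload)))
        cursor += len(payload)

    buffer = bytearray(cursor)  # zero-filled, so padding bytes stay 0
    for entry, payload in zip(entries, payloads):
        buffer[entry.offset:entry.offset + entry.nbytes] = payload
    return buffer, entries
```

Tracking offsets in bytes rather than elements lets payloads of different dtypes share one buffer, at the cost of the receiver having to reinterpret each slice with the right dtype.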
- class etha.comm.ir.Chunk#
Bases: etha.comm.transfer.Transferable
Unified chunk for all transfer operations.
- prepare(contiguous: bool = True) None#
Prepare source/target buffer.
Source side performs (in order): slice → in-place all-reduce on Partial sub-groups (in source dtype) → cast to transfer_dtype. Reducing before the cast matches DTensor Partial → Replicate semantics; running the all-reduce in the (possibly lower-precision) wire dtype would change numerical results.
- property bucket_key: tuple#
Return bucket grouping key.
cell_key is added for SHADOW Partial chunks only: they all share dst_ranks=() and would otherwise bundle across cells, making the bucket's all_reduce sequence per-rank-specific and out of sync with peer ranks whose matching cells live in separate buckets. PRIMARY chunks already differ by dst_ranks across cells.
- tensor: torch.Tensor | None = None#
- transfer_dtype: torch.dtype | None = None#
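The ordering rule in Chunk.prepare (reduce in the source dtype, then cast to transfer_dtype) can be checked numerically without any tensor library. The sketch below emulates a float16 wire dtype with the standard `struct` module's half-precision format; the helper names are hypothetical, and plain Python floats stand in for the higher-precision source dtype.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (emulated wire dtype)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def reduce_then_cast(partials: list) -> float:
    """Sum in source precision, then cast once (the order described above)."""
    return to_half(sum(partials))


def cast_then_reduce(partials: list) -> float:
    """Cast each partial first and accumulate in the wire dtype."""
    acc = 0.0
    for p in partials:
        # Every intermediate sum is rounded back to half precision.
        acc = to_half(acc + to_half(p))
    return acc


partials = [2048.0, 1.0, 1.0]
print(reduce_then_cast(partials))  # 2050.0
print(cast_then_reduce(partials))  # 2048.0
```

Near 2048 the spacing between adjacent float16 values is 2, so each `+ 1.0` performed in the wire dtype is rounded away, while the sum computed before the cast survives intact.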
- etha.comm.ir.logger#
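The role of cell_key in Chunk.bucket_key can be illustrated with a toy grouping. Everything in this sketch is hypothetical: the real key likely contains more fields, and `ToyChunk`, its `shadow_partial` flag, and `group_into_buckets` are invented purely to show why chunks that all share dst_ranks=() need an extra discriminator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyChunk:
    """Hypothetical chunk carrying only the fields needed for grouping."""

    name: str
    dst_ranks: tuple
    cell_key: tuple
    shadow_partial: bool = False

    @property
    def bucket_key(self) -> tuple:
        key = (self.dst_ranks,)
        # Assumption: only shadow Partial chunks append cell_key, because
        # they all have dst_ranks=() and would otherwise collide.
        if self.shadow_partial:
            key += (self.cell_key,)
        return key


def group_into_buckets(chunks: list) -> dict:
    """Group chunk names by bucket_key, preserving input order per bucket."""
    buckets = defaultdict(list)
    for chunk in chunks:
        buckets[chunk.bucket_key].append(chunk.name)
    return dict(buckets)


chunks = [
    ToyChunk("primary_a", dst_ranks=(1,), cell_key=(0,)),
    ToyChunk("primary_b", dst_ranks=(2,), cell_key=(1,)),
    ToyChunk("shadow_a", dst_ranks=(), cell_key=(0,), shadow_partial=True),
    ToyChunk("shadow_b", dst_ranks=(), cell_key=(1,), shadow_partial=True),
]
groups = group_into_buckets(chunks)
```

With cell_key in the key, `shadow_a` and `shadow_b` land in separate buckets; keyed on dst_ranks alone they would share one bucket, which is the cross-cell bundling the bucket_key description warns about.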