etha.comm.ir
============

.. py:module:: etha.comm.ir

.. autoapi-nested-parse::

   Intermediate Representation for tensor transfer operations.


Attributes
----------

.. autoapisummary::

   etha.comm.ir.logger


Classes
-------

.. autoapisummary::

   etha.comm.ir.Bucket
   etha.comm.ir.BucketEntry
   etha.comm.ir.Chunk


Module Contents
---------------

.. py:class:: Bucket

   Bases: :py:obj:`etha.comm.transfer.Transferable`


   Byte-based buffer that groups several chunks into one transfer operation.


   .. py:method:: finalize() -> None

      Finalize the communication and clean up.


   .. py:method:: is_complete() -> bool

      Check if communication is complete.

      :returns: True if complete, False otherwise.


   .. py:method:: launch() -> bool

      Launch communication operation.

      :returns: True if the operation was launched, False if the buffer is not ready yet.


   .. py:method:: prepare() -> None

      Prepare buffer for communication.

      The source-side Partial reduction and the dtype cast are both
      performed in ``Chunk.prepare``; this method only assembles the
      entries into the bucket buffer.


   .. py:attribute:: buffer_ready_event
      :type:  torch.cuda.Event | None
      :value: None


   .. py:attribute:: device
      :type:  torch.device | None
      :value: None


   .. py:attribute:: entries
      :type:  list[BucketEntry]


   .. py:attribute:: key
      :type:  tuple


   .. py:attribute:: total_bytes
      :type:  int
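
The four ``Bucket`` methods suggest a prepare, launch, poll, finalize lifecycle. The sketch below shows one way a caller might drive it. ``FakeTransferable`` and ``drive`` are hypothetical stand-ins written for this example and are not part of ``etha``; the real ``Bucket`` presumably waits on ``buffer_ready_event`` rather than a counter, and the actual scheduler may differ.

```python
class FakeTransferable:
    """Hypothetical stand-in exposing the same four methods as Bucket."""

    def __init__(self, ready_after: int = 2) -> None:
        # Assumption: a counter simulates a buffer that is not ready yet.
        self._polls_until_ready = ready_after
        self.prepared = False
        self.launched = False
        self.finalized = False

    def prepare(self) -> None:
        self.prepared = True

    def launch(self) -> bool:
        # Mirror the documented contract: False while the buffer is not ready.
        if self._polls_until_ready > 0:
            self._polls_until_ready -= 1
            return False
        self.launched = True
        return True

    def is_complete(self) -> bool:
        return self.launched

    def finalize(self) -> None:
        self.finalized = True


def drive(items):
    """Prepare everything, retry launch until accepted, then finalize."""
    for item in items:
        item.prepare()
    pending = list(items)
    attempts = 0
    while pending:
        attempts += 1
        # Keep only the items whose launch() reported "not ready yet".
        pending = [item for item in pending if not item.launch()]
    for item in items:
        while not item.is_complete():
            pass  # a real caller would yield or sleep here
        item.finalize()
    return attempts


buckets = [FakeTransferable(ready_after=0), FakeTransferable(ready_after=2)]
attempts = drive(buckets)
```

Retrying ``launch()`` instead of blocking inside it lets other buckets make progress while one buffer is still being filled.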


.. py:class:: BucketEntry

   Entry recording the byte offset and byte size of one chunk within a bucket.


   .. py:attribute:: chunk
      :type:  Chunk


   .. py:attribute:: nbytes
      :type:  int


   .. py:attribute:: offset
      :type:  int
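
To illustrate what byte-based ``offset`` and ``nbytes`` bookkeeping can look like, the sketch below packs several payloads back to back into one flat buffer. ``Entry`` and ``pack`` are hypothetical names; the payloads are plain ``bytes`` rather than ``Chunk`` objects, and the real bucket may add alignment padding or use a different layout.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical analogue of BucketEntry, holding raw bytes instead of a Chunk."""
    payload: bytes
    offset: int
    nbytes: int


def pack(payloads):
    """Lay payloads out back to back and return (entries, buffer)."""
    entries = []
    offset = 0
    for payload in payloads:
        entries.append(Entry(payload, offset, len(payload)))
        offset += len(payload)
    buffer = bytearray(offset)  # offset now equals the total byte count
    for entry in entries:
        buffer[entry.offset:entry.offset + entry.nbytes] = entry.payload
    return entries, buffer


entries, buffer = pack([b"abcd", b"xy", b"123456"])
offsets = [(e.offset, e.nbytes) for e in entries]
```

Tracking sizes in bytes rather than element counts allows payloads of different dtypes to share one buffer.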


.. py:class:: Chunk

   Bases: :py:obj:`etha.comm.transfer.Transferable`


   Unified chunk for all transfer operations.


   .. py:method:: finalize() -> None

      Finalize the communication and clean up.


   .. py:method:: prepare(contiguous: bool = True) -> None

      Prepare source/target buffer.

      Source side performs (in order): slice → in-place all-reduce on
      Partial sub-groups (in source dtype) → cast to ``transfer_dtype``.
      Reducing before the cast matches DTensor ``Partial → Replicate``
      semantics; running the all-reduce in the (possibly lower-precision)
      wire dtype would change numerical results.


   .. py:property:: bucket_key
      :type: tuple


      Return bucket grouping key.

      ``cell_key`` is added for SHADOW Partial chunks only. These chunks all
      share ``dst_ranks=()`` and would otherwise be bundled across cells,
      which would make the bucket's all_reduce sequence rank-specific and
      out of sync with peer ranks whose matching cells live in separate
      buckets. PRIMARY chunks already differ by ``dst_ranks`` across cells.


   .. py:attribute:: chunk_shape
      :type:  tuple[int, ...]


   .. py:attribute:: dst_idx
      :type:  tuple | None
      :value: None


   .. py:attribute:: slice_tuples
      :type:  tuple[slice, ...]
      :value: ()


   .. py:attribute:: source_partial_groups
      :type:  list[tuple[torch.distributed.ProcessGroup, str]] | None
      :value: None


   .. py:attribute:: src_idx
      :type:  tuple


   .. py:attribute:: src_slice_tuples
      :type:  tuple[slice, ...] | None
      :value: None


   .. py:attribute:: tensor
      :type:  torch.Tensor | None
      :value: None


   .. py:attribute:: transfer_dtype
      :type:  torch.dtype | None
      :value: None
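
The ordering described for ``Chunk.prepare`` (reduce in the source dtype, then cast to ``transfer_dtype``) matters numerically. The sketch below simulates it with the standard library only: Python floats stand in for the higher-precision source dtype, ``struct``'s half-precision format stands in for a lower-precision wire dtype, and a sequential sum stands in for the all-reduce. The helper names are hypothetical, and a real all-reduce may combine values in a different order.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def reduce_then_cast(partials):
    # Sum in the higher-precision source dtype, then cast once.
    return to_fp16(sum(partials))


def cast_then_reduce(partials):
    # Cast each partial first and round after every addition, as a
    # reduction carried out in the wire dtype would.
    total = 0.0
    for p in partials:
        total = to_fp16(total + to_fp16(p))
    return total


partials = [2048.0, 0.5, 0.5, 0.5, 0.5]
a = reduce_then_cast(partials)  # 2050.0
b = cast_then_reduce(partials)  # 2048.0
```

Half precision has a spacing of 2 between 2048 and 4096, so each ``0.5`` is rounded away when added in the wire dtype, while the reduce-first path keeps the full sum.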


.. py:data:: logger
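
The grouping behaviour described for ``Chunk.bucket_key`` can be illustrated with a toy model. The fields and the ``make_key`` helper below are assumptions made for this example; the actual contents of ``bucket_key`` are not documented here. The point is only that SHADOW chunks, which all carry ``dst_ranks=()``, collapse into one bucket unless a per-cell component is added to the key.

```python
from collections import defaultdict


def make_key(chunk, include_cell_key):
    """Hypothetical grouping key; the real bucket_key contents may differ."""
    key = (chunk["dst_ranks"],)
    if include_cell_key and chunk["role"] == "SHADOW":
        key += (chunk["cell_key"],)
    return key


def group(chunks, include_cell_key):
    """Group chunk names by key, preserving insertion order."""
    buckets = defaultdict(list)
    for chunk in chunks:
        buckets[make_key(chunk, include_cell_key)].append(chunk["name"])
    return dict(buckets)


chunks = [
    {"name": "p0", "role": "PRIMARY", "dst_ranks": (1,), "cell_key": "A"},
    {"name": "p1", "role": "PRIMARY", "dst_ranks": (2,), "cell_key": "B"},
    {"name": "s0", "role": "SHADOW", "dst_ranks": (), "cell_key": "A"},
    {"name": "s1", "role": "SHADOW", "dst_ranks": (), "cell_key": "B"},
]

without_cell = group(chunks, include_cell_key=False)
with_cell = group(chunks, include_cell_key=True)
```

Without the cell component, ``s0`` and ``s1`` land in the same bucket; with it, each cell's SHADOW chunk gets its own bucket, while the PRIMARY chunks are unaffected.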