etha.tensor_bus.bootstrap#
Bootstrap utilities.
Attributes#
logger
Classes#
BootstrapInfo | Information about the bootstrap process.
Functions#
bootstrap_client | Bootstrap TensorBusClient with automatic agent rank resolution.
setup_ptrace | Set up ptrace authorization for the agent process.
Module Contents#
- class etha.tensor_bus.bootstrap.BootstrapInfo#
Information about the bootstrap process.
- Variables:
agent_rank – The agent rank this worker is connected to
global_rank – Global rank within the worker process group (from torchrun)
rank_offset – Offset used in the offset-based calculation of agent_rank (if applicable)
device – CUDA device string (e.g., “cuda:0”)
command_queue_path – Path to Agent’s CommandQueue LMDB
state_path – Path to Agent’s State LMDB
method – How agent_rank was determined (“direct” or “offset”)
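As a reading aid, the fields above can be pictured as a small record. The sketch below is hypothetical: the class name `BootstrapInfoSketch`, the field types, and the `describe` helper are assumptions made for illustration, not the actual definition of `BootstrapInfo`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BootstrapInfoSketch:
    """Hypothetical stand-in mirroring the documented variables (types are assumed)."""

    agent_rank: int
    global_rank: int
    rank_offset: Optional[int]  # assumed to be None when method == "direct"
    device: str                 # e.g. "cuda:0"
    command_queue_path: str
    state_path: str
    method: str                 # "direct" or "offset"


def describe(info) -> str:
    """Format a one-line summary from any object exposing the documented fields."""
    return (
        f"worker {info.global_rank} -> agent {info.agent_rank} "
        f"on {info.device} (method={info.method})"
    )
```

A summary like this is convenient to log once per worker right after bootstrap.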
- etha.tensor_bus.bootstrap.bootstrap_client(path_naming_fn: collections.abc.Callable[[int], tuple[str, str]] | None = None, connection_timeout: float = 30.0) tuple[etha.tensor_bus.client.TensorBusClient, BootstrapInfo]#
Bootstrap TensorBusClient with automatic agent rank resolution.
This function encapsulates the entire Worker-side bootstrap process:
1. Determines agent_rank from environment variables (AGENT_RANK, or LOCAL_RANK + AGENT_RANK_OFFSET)
2. Resolves the LMDB paths using the naming convention
3. Creates and returns a TensorBusClient
Environment Variables (priority order):
1. AGENT_RANK: Direct specification (highest priority)
2. LOCAL_RANK + AGENT_RANK_OFFSET: Offset-based calculation
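The priority order can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper name `resolve_agent_rank` is hypothetical, and it assumes that the offset is added to LOCAL_RANK and that no offset is recorded in the direct case.

```python
import os


def resolve_agent_rank(env=None):
    """Return (agent_rank, method, rank_offset) following the documented priority.

    Hypothetical helper; assumes agent_rank = LOCAL_RANK + AGENT_RANK_OFFSET
    in the offset-based case.
    """
    env = os.environ if env is None else env

    # 1. AGENT_RANK wins whenever it is set.
    if "AGENT_RANK" in env:
        return int(env["AGENT_RANK"]), "direct", None

    # 2. Otherwise both LOCAL_RANK and AGENT_RANK_OFFSET are needed.
    if "LOCAL_RANK" in env and "AGENT_RANK_OFFSET" in env:
        offset = int(env["AGENT_RANK_OFFSET"])
        return int(env["LOCAL_RANK"]) + offset, "offset", offset

    # Mirrors the documented ValueError for missing variables.
    raise ValueError(
        "Set AGENT_RANK, or both LOCAL_RANK and AGENT_RANK_OFFSET"
    )
```

Passing a plain dict instead of `os.environ` keeps the logic easy to check in isolation.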
- Parameters:
path_naming_fn – Optional custom function that returns (cmd_queue_path, state_path) for a given rank. If None, the default convention is used:
  - /tmp/agent_rank{N}_command.lmdb
  - /tmp/agent_rank{N}_state.lmdb
connection_timeout – Max time to wait for Agent connection (seconds, default 30.0)
- Returns:
Tuple of TensorBusClient and BootstrapInfo
- Return type:
(client, info)
- Raises:
ValueError – If required environment variables are missing
ConnectionError – If Agent is not found or not responding
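A custom `path_naming_fn` only needs to map a rank to a `(cmd_queue_path, state_path)` pair. The sketch below reproduces the documented default convention and shows one way to relocate the files. The names `default_paths`, `make_path_naming_fn`, and `connect`, as well as the example directory, are hypothetical; only `bootstrap_client` and its two parameters come from the signature above.

```python
def default_paths(rank: int) -> tuple[str, str]:
    """Documented default convention, written out for reference."""
    return (
        f"/tmp/agent_rank{rank}_command.lmdb",
        f"/tmp/agent_rank{rank}_state.lmdb",
    )


def make_path_naming_fn(base_dir: str):
    """Build a path_naming_fn that keeps the file names but changes the directory."""
    root = base_dir.rstrip("/")

    def path_naming_fn(rank: int) -> tuple[str, str]:
        return (
            f"{root}/agent_rank{rank}_command.lmdb",
            f"{root}/agent_rank{rank}_state.lmdb",
        )

    return path_naming_fn


def connect(base_dir: str):
    """Illustrative call; requires the etha package and a running Agent."""
    # Imported lazily so the helpers above work without the package installed.
    from etha.tensor_bus.bootstrap import bootstrap_client

    client, info = bootstrap_client(
        path_naming_fn=make_path_naming_fn(base_dir),
        connection_timeout=10.0,  # seconds; the documented default is 30.0
    )
    return client, info
```

The Agent has to create its LMDB files under the same directory, so a custom naming function is only useful when both sides agree on it.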
- etha.tensor_bus.bootstrap.setup_ptrace()#
Set up ptrace authorization for the agent process.
For simplicity, we use PR_SET_PTRACER_ANY in this prototype, which lets any process attach to the calling process via ptrace. In production, you’d authorize specific agent PIDs.
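For orientation, the underlying Linux call can be sketched with `ctypes`. This is an assumption about how such authorization is typically done, not the body of `setup_ptrace`: the helper name `allow_ptrace` is hypothetical, while the constants are those defined in `<linux/prctl.h>`. Passing a specific PID instead of `PR_SET_PTRACER_ANY` corresponds to the production approach mentioned above.

```python
import ctypes
import sys

PR_SET_PTRACER = 0x59616D61                     # from <linux/prctl.h>
PR_SET_PTRACER_ANY = ctypes.c_ulong(-1).value   # (unsigned long)-1


def allow_ptrace(tracer_pid=None) -> bool:
    """Ask the kernel to let tracer_pid (or any process) ptrace this process.

    Hypothetical helper. Returns False on non-Linux systems or when the
    kernel rejects the request (for example, when Yama is not enabled).
    """
    if not sys.platform.startswith("linux"):
        return False

    libc = ctypes.CDLL(None, use_errno=True)
    target = PR_SET_PTRACER_ANY if tracer_pid is None else tracer_pid

    # prctl(option, arg2, arg3, arg4, arg5); unused arguments are zero.
    rc = libc.prctl(
        PR_SET_PTRACER,
        ctypes.c_ulong(target),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    return rc == 0
```

Calling `allow_ptrace(agent_pid)` with a known agent PID narrows the permission to that process, whereas `allow_ptrace()` opens it to every process.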
- etha.tensor_bus.bootstrap.logger#