maxent_grpo.training.generation.vllm_utils

Shared vLLM helper utilities reused across generation modules.

Functions

`_env_flag`(name, default)
`_is_already_initialized_error`(exc)	Return True when vLLM reports an already-initialized weight-sync group.
`_is_loopback_host`(base_url)
`_resolve_async_mode`(async_mode, base_url)
`import_vllm_client_cls`([import_fn])	Return TRL's VLLMClient class if available.
`init_vllm_client_communicator`(client, *[, ...])	Initialize the vLLM weight-sync communicator with an async-safe handshake.
`zero3_gather_factory`(accelerator[, import_fn])	Return a callable that gathers parameters when ZeRO-3 is active.

maxent_grpo.training.generation.vllm_utils.import_vllm_client_cls(import_fn=None)

Return TRL’s VLLMClient class if available.

Parameters:

import_fn (Callable[[str], Any] | None) – Optional import helper to load TRL modules.

Returns:

VLLMClient class when import succeeds, otherwise None.

Return type:

type | None
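
The optional-import behavior described above can be sketched as follows. This is a minimal illustration, not the module's actual source: the function name `import_client_cls_sketch` is hypothetical, and the module path `trl.extras.vllm_client` is an assumption about where TRL exposes `VLLMClient`.

```python
import importlib
from typing import Any, Callable, Optional


def import_client_cls_sketch(
    import_fn: Optional[Callable[[str], Any]] = None,
) -> Optional[type]:
    """Return a VLLMClient class, or None when it cannot be imported."""
    # Fall back to the standard importer when no helper is injected.
    importer = import_fn or importlib.import_module
    try:
        # Assumed module path; adjust to the installed TRL version.
        module = importer("trl.extras.vllm_client")
    except Exception:
        # Treat any import failure as "TRL client not available".
        return None
    cls = getattr(module, "VLLMClient", None)
    return cls if isinstance(cls, type) else None
```

Passing a custom `import_fn` makes the lookup easy to stub in tests, since no real TRL installation is needed to exercise both the success and the failure path.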

maxent_grpo.training.generation.vllm_utils.init_vllm_client_communicator(client, *, async_mode=None, timeout_s=None, log=None, init_fn=None)

Initialize the vLLM weight-sync communicator with an async-safe handshake.

The TRL client performs a blocking POST before joining the NCCL group, which can deadlock when the server waits for the client to join before it responds. This helper instead sends the POST from a background thread and joins the NCCL group immediately, so neither side blocks the other.

Parameters:

client (Any) – TRL VLLMClient instance.
async_mode (bool | None) – Whether to use the async handshake. When None, the MAXENT_VLLM_ASYNC_INIT env var controls the behavior (default False).
timeout_s (float | None) – Timeout for the POST and join wait. Defaults to MAXENT_VLLM_INIT_TIMEOUT_S or 60 seconds.
log (Callable[[str], None] | None) – Optional logger callback for info messages.
init_fn (Callable[[], Any] | None)

Return type:

None
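
The handshake pattern described for this function can be sketched as below. It is a simplified illustration under stated assumptions, not the helper's real implementation: `async_handshake_sketch`, `post_fn`, and `join_fn` are hypothetical stand-ins for the client's POST request and its NCCL group join, and no TRL or NCCL call is made. The timeout fallback follows the documented `MAXENT_VLLM_INIT_TIMEOUT_S` default of 60 seconds.

```python
import os
import threading
from typing import Any, Callable, List, Optional


def async_handshake_sketch(
    post_fn: Callable[[], Any],
    join_fn: Callable[[], Any],
    timeout_s: Optional[float] = None,
    log: Optional[Callable[[str], None]] = None,
) -> None:
    """Run post_fn in a background thread while join_fn runs in the caller."""
    if timeout_s is None:
        timeout_s = float(os.environ.get("MAXENT_VLLM_INIT_TIMEOUT_S", "60"))

    errors: List[BaseException] = []

    def _post() -> None:
        # Capture failures so they can be re-raised in the calling thread.
        try:
            post_fn()
        except Exception as exc:
            errors.append(exc)

    thread = threading.Thread(target=_post, daemon=True)
    thread.start()  # the POST no longer blocks the join below
    join_fn()       # join the group right away so the server can proceed
    thread.join(timeout=timeout_s)

    if thread.is_alive():
        raise TimeoutError(f"init POST did not finish within {timeout_s}s")
    if errors:
        raise errors[0]
    if log is not None:
        log("weight-sync communicator initialized")
```

Because the POST runs off the main thread, a server that only answers once the client has joined no longer stalls the client; the timeout bounds how long the caller waits for the POST to complete afterwards.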

maxent_grpo.training.generation.vllm_utils.zero3_gather_factory(accelerator, import_fn=None)

Return a callable that gathers parameters when ZeRO-3 is active.

Parameters:

accelerator (Any) – Accelerate object exposing state.deepspeed_plugin.
import_fn (Callable[[str], Any] | None) – Optional import helper used to lazily import deepspeed.

Returns:

Callable that wraps a parameter sequence in a gather context manager, or a no-op nullcontext when ZeRO-3 is not active.

Return type:

Callable[[Sequence[Any]], AbstractContextManager[Any]]
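
A factory with this return contract might look like the sketch below. It is an illustration under assumptions rather than the module's source: `gather_factory_sketch` is a hypothetical name, the ZeRO stage is assumed to be readable from `deepspeed_plugin.zero_stage`, and the gather context is assumed to be `deepspeed.zero.GatheredParameters`.

```python
import importlib
from contextlib import AbstractContextManager, nullcontext
from typing import Any, Callable, Optional, Sequence


def gather_factory_sketch(
    accelerator: Any,
    import_fn: Optional[Callable[[str], Any]] = None,
) -> Callable[[Sequence[Any]], AbstractContextManager]:
    """Return a callable that yields a gather context for a parameter list."""
    state = getattr(accelerator, "state", None)
    plugin = getattr(state, "deepspeed_plugin", None)
    # Assumption: the plugin exposes the ZeRO stage as `zero_stage`.
    zero_stage = getattr(plugin, "zero_stage", 0) if plugin is not None else 0

    if zero_stage != 3:
        # Parameters are not partitioned, so nothing needs gathering.
        return lambda params: nullcontext()

    # Import deepspeed lazily, only when ZeRO-3 is actually active.
    importer = import_fn or importlib.import_module
    deepspeed = importer("deepspeed")
    return lambda params: deepspeed.zero.GatheredParameters(list(params))
```

Callers can then write `with gather(params): ...` unconditionally, and the same code path works whether or not parameters are sharded.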