maxent_grpo.training.generation.vllm_utils

Shared vLLM helper utilities reused across generation modules.

Functions

_env_flag(name, default)

_is_already_initialized_error(exc)

Return True when vLLM reports an already-initialized weight-sync group.

_is_loopback_host(base_url)

_resolve_async_mode(async_mode, base_url)

import_vllm_client_cls([import_fn])

Return TRL's VLLMClient class if available.

init_vllm_client_communicator(client, *[, ...])

Initialize the vLLM weight-sync communicator with an async-safe handshake.

zero3_gather_factory(accelerator[, import_fn])

Return a callable that gathers parameters when ZeRO-3 is active.
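The summary lists _is_loopback_host(base_url) without a description. Going only by its name, it most likely checks whether the vLLM server URL points at the local machine. The sketch below shows one way to write such a check with the standard library. The name is_loopback_host and its special-casing of "localhost" are assumptions, not the module's actual logic.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_host(base_url: str) -> bool:
    """Hypothetical check: does base_url point at this machine?"""
    # urlparse needs a scheme (e.g. "http://") to populate .hostname;
    # .hostname is lowercased and has IPv6 brackets stripped.
    host = urlparse(base_url).hostname
    if host is None:
        return False
    # Assumption: treat the conventional loopback name as local.
    if host == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name); do not resolve it here.
        return False
```

The sketch deliberately avoids DNS lookups, so a hostname that happens to resolve to 127.0.0.1 is reported as non-loopback.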

maxent_grpo.training.generation.vllm_utils.import_vllm_client_cls(import_fn=None)

Return TRL’s VLLMClient class if available.

Parameters:

import_fn (Callable[[str], Any] | None) – Optional import helper to load TRL modules.

Returns:

VLLMClient class when import succeeds, otherwise None.

Return type:

type | None
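Here is a minimal sketch of the optional-import pattern this entry describes. It assumes that TRL exposes the class as VLLMClient in trl.extras.vllm_client (check this against your installed TRL version) and that import_fn behaves like importlib.import_module. The _sketch suffix marks the function as illustrative rather than the module's own code.

```python
import importlib
from typing import Any, Callable, Optional


def import_vllm_client_cls_sketch(
    import_fn: Optional[Callable[[str], Any]] = None,
) -> Optional[type]:
    """Return TRL's VLLMClient class, or None when it cannot be imported."""
    # Fall back to the standard importer when no helper is injected.
    loader = import_fn or importlib.import_module
    try:
        # Assumed module path; verify against the installed TRL version.
        module = loader("trl.extras.vllm_client")
    except ImportError:
        # Includes ModuleNotFoundError, e.g. when TRL is not installed.
        return None
    return getattr(module, "VLLMClient", None)
```

Injecting import_fn keeps the helper testable without TRL installed. Callers then branch on None instead of wrapping every import in try/except.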

maxent_grpo.training.generation.vllm_utils.init_vllm_client_communicator(client, *, async_mode=None, timeout_s=None, log=None, init_fn=None)

Initialize the vLLM weight-sync communicator with an async-safe handshake.

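The TRL client sends a blocking POST to the server before it joins the NCCL group. If the server in turn waits for the client to join the group before answering that POST, neither side can make progress and the handshake deadlocks. This helper sends the POST from a background thread and joins the NCCL group immediately, so both steps can complete.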
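The ordering described above can be sketched with a worker thread. post_fn and join_fn are hypothetical stand-ins for the client's HTTP init request and its NCCL group join; they are not TRL APIs. The sketch is not the helper's actual implementation.

```python
import threading
from typing import Any, Callable, Dict


def async_handshake(
    post_fn: Callable[[], Any],
    join_fn: Callable[[], Any],
    timeout_s: float = 60.0,
) -> Any:
    """Send a blocking init POST from a worker thread while joining the group."""
    outcome: Dict[str, Any] = {}

    def _send_post() -> None:
        try:
            outcome["value"] = post_fn()
        except Exception as exc:  # re-raised in the caller's thread below
            outcome["error"] = exc

    # Daemon thread so a hung POST cannot keep the process alive on exit.
    worker = threading.Thread(target=_send_post, daemon=True)
    worker.start()

    # Join immediately; the server may only answer the POST after this.
    join_fn()

    # Wait for the POST to finish, bounded by the same timeout.
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"init POST did not complete within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

Against a server that only answers the POST once the client has joined, calling post_fn() and then join_fn() in sequence would block forever inside post_fn. The threaded ordering lets both sides proceed.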

Parameters:
  • client (Any) – TRL VLLMClient instance.

  • async_mode (bool | None) – Whether to use the async handshake. When None, the MAXENT_VLLM_ASYNC_INIT environment variable controls the behavior (default False).

  • timeout_s (float | None) – Timeout in seconds for the POST and the NCCL join wait. When None, falls back to the MAXENT_VLLM_INIT_TIMEOUT_S environment variable, or to 60 seconds if that variable is unset.

  • log (Callable[[str], None] | None) – Optional logger callback for info messages.

  • init_fn (Callable[[], Any] | None)

Return type:

None
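The async_mode and timeout_s parameters both fall back to environment variables. A hypothetical resolver for those fallbacks might look like this. Which strings count as true for MAXENT_VLLM_ASYNC_INIT is an assumption; the module's own _env_flag helper may accept a different set.

```python
import os
from typing import Optional, Tuple

# Assumption: accepted spellings of "true"; anything else (or unset) is False.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_init_options(
    async_mode: Optional[bool] = None,
    timeout_s: Optional[float] = None,
) -> Tuple[bool, float]:
    """Hypothetical: apply the documented env-var fallbacks."""
    if async_mode is None:
        # Unset or empty variable -> default False.
        raw = os.environ.get("MAXENT_VLLM_ASYNC_INIT", "")
        async_mode = raw.strip().lower() in _TRUTHY
    if timeout_s is None:
        # Unset or empty variable -> default 60 seconds.
        raw_timeout = os.environ.get("MAXENT_VLLM_INIT_TIMEOUT_S")
        timeout_s = float(raw_timeout) if raw_timeout else 60.0
    return async_mode, timeout_s
```

Explicit arguments always win over the environment, which keeps per-call overrides predictable in tests.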

maxent_grpo.training.generation.vllm_utils.zero3_gather_factory(accelerator, import_fn=None)

Return a callable that gathers parameters when ZeRO-3 is active.

Parameters:
  • accelerator (Any) – Accelerate object exposing state.deepspeed_plugin.

  • import_fn (Callable[[str], Any] | None) – Optional import helper used to lazily import deepspeed.

Returns:

Callable that wraps a parameter sequence in a gather context manager, or a no-op nullcontext when ZeRO-3 is not active.

Return type:

Callable[[Sequence[Any]], AbstractContextManager[Any]]
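One way to build such a factory is sketched below. It rests on two assumptions: that ZeRO-3 is detected through the zero_stage attribute of Accelerate's DeepSpeedPlugin, and that gathering uses DeepSpeed's deepspeed.zero.GatheredParameters context manager. The function name is hypothetical, and the real helper may differ.

```python
from __future__ import annotations

import importlib
from contextlib import AbstractContextManager, nullcontext
from typing import Any, Callable, Optional, Sequence


def zero3_gather_factory_sketch(
    accelerator: Any,
    import_fn: Optional[Callable[[str], Any]] = None,
) -> Callable[[Sequence[Any]], AbstractContextManager[Any]]:
    """Hypothetical re-creation; not the module's actual implementation."""
    state = getattr(accelerator, "state", None)
    plugin = getattr(state, "deepspeed_plugin", None)
    # Assumption: ZeRO-3 is detected from DeepSpeedPlugin.zero_stage.
    if plugin is None or getattr(plugin, "zero_stage", None) != 3:
        # Without ZeRO-3 the parameters are already whole on each rank.
        return lambda params: nullcontext()

    # Import deepspeed lazily so non-DeepSpeed runs never need it installed.
    loader = import_fn or importlib.import_module
    deepspeed = loader("deepspeed")

    def gather(params: Sequence[Any]) -> AbstractContextManager[Any]:
        # modifier_rank defaults to None: a read-only gather, and the
        # partitions are released again when the with-block exits.
        return deepspeed.zero.GatheredParameters(list(params))

    return gather
```

Callers can then write `with gather(list(model.parameters())):` unconditionally. The same code path works with and without ZeRO-3, and the no-op branch costs nothing.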