maxent_grpo.training.generation.vllm_utils
Shared vLLM helper utilities reused across generation modules.
Functions
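| | Return True when vLLM reports an already-initialized weight-sync group. |
| import_vllm_client_cls(import_fn=None) | Return TRL's VLLMClient class if available. |
| init_vllm_client_communicator(client, *, async_mode=None, timeout_s=None, log=None, init_fn=None) | Initialize the vLLM weight-sync communicator with an async-safe handshake. |
| zero3_gather_factory(accelerator, import_fn=None) | Return a callable that gathers parameters when ZeRO-3 is active. |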
- maxent_grpo.training.generation.vllm_utils.import_vllm_client_cls(import_fn=None)
Return TRL’s VLLMClient class if available.
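A minimal sketch of how such an optional import can be structured. The function name `import_vllm_client_cls_sketch`, the module path `trl.extras.vllm_client`, and the choice to return `None` when TRL is missing are assumptions made for illustration; the source only states that the class is returned "if available".

```python
import importlib
from typing import Any, Callable, Optional


def import_vllm_client_cls_sketch(
    import_fn: Optional[Callable[[str], Any]] = None,
) -> Optional[type]:
    """Hypothetical stand-in, not the library's implementation."""
    # Fall back to the standard importer when no helper is injected.
    importer = import_fn or importlib.import_module
    try:
        # Assumed module path; it may differ between TRL versions.
        module = importer("trl.extras.vllm_client")
    except ImportError:
        # Assumption: "not available" is signalled by returning None.
        return None
    return getattr(module, "VLLMClient", None)
```

Accepting an `import_fn` makes the lookup easy to exercise in tests, since a fake importer can stand in for an installed TRL.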
- maxent_grpo.training.generation.vllm_utils.init_vllm_client_communicator(client, *, async_mode=None, timeout_s=None, log=None, init_fn=None)
Initialize the vLLM weight-sync communicator with an async-safe handshake.
The TRL client performs a blocking POST before joining the NCCL group, which can deadlock when the server waits for the client to join. This helper sends the POST in a background thread, then joins the NCCL group immediately.
- Parameters:
client (Any) – TRL VLLMClient instance.
async_mode (bool | None) – Whether to use the async handshake. When None, the MAXENT_VLLM_ASYNC_INIT env var controls the behavior (default False).
timeout_s (float | None) – Timeout for the POST and join wait. Defaults to MAXENT_VLLM_INIT_TIMEOUT_S or 60 seconds.
log (Callable[[str], None] | None) – Optional logger callback for info messages.
- Return type:
None
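The handshake pattern described in this entry can be sketched with the standard library alone. Everything below is illustrative: `resolve_async_mode`, `resolve_timeout_s`, `async_handshake`, `post_init`, and `join_group` are hypothetical names, and the set of strings treated as "true" for MAXENT_VLLM_ASYNC_INIT is an assumption. The two callables stand in for the client's blocking POST and its NCCL group join.

```python
import os
import threading
from typing import Callable, List, Optional


def resolve_async_mode(async_mode: Optional[bool] = None) -> bool:
    """An explicit argument wins; otherwise consult the env var (default False)."""
    if async_mode is not None:
        return async_mode
    # Assumption: these spellings count as true; the real parser may differ.
    raw = os.environ.get("MAXENT_VLLM_ASYNC_INIT", "")
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_timeout_s(timeout_s: Optional[float] = None) -> float:
    """An explicit argument wins; otherwise the env var, else 60 seconds."""
    if timeout_s is not None:
        return timeout_s
    return float(os.environ.get("MAXENT_VLLM_INIT_TIMEOUT_S", "60"))


def async_handshake(
    post_init: Callable[[], None],
    join_group: Callable[[], None],
    timeout_s: float,
) -> None:
    """Send the POST in a background thread, join the group, then await the POST."""
    errors: List[BaseException] = []

    def _post() -> None:
        try:
            post_init()
        except Exception as exc:  # surfaced to the caller after the join
            errors.append(exc)

    worker = threading.Thread(target=_post, daemon=True)
    worker.start()
    # Join right away so a server waiting on the client can make progress.
    join_group()
    worker.join(timeout=timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"init POST did not finish within {timeout_s}s")
    if errors:
        raise errors[0]
```

If `post_init` only returns once the peer has seen the client join, calling it before `join_group` on the same thread would never finish; running it in the background removes that ordering dependency.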
- maxent_grpo.training.generation.vllm_utils.zero3_gather_factory(accelerator, import_fn=None)
Return a callable that gathers parameters when ZeRO-3 is active.
- Parameters:
accelerator (Any) – Accelerate object exposing state.deepspeed_plugin.
import_fn (Callable[[str], Any] | None) – Optional import helper used to lazily import deepspeed.
- Returns:
Callable that wraps a parameter sequence in a gather context manager, or a no-op nullcontext when ZeRO-3 is not active.
- Return type:
Callable[[Sequence[Any]], AbstractContextManager[Any]]
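One way a factory with this shape could look, as a sketch under stated assumptions: ZeRO-3 is detected through the plugin's `zero_stage` attribute, and the gather context is `deepspeed.zero.GatheredParameters`. The name `zero3_gather_factory_sketch` is hypothetical, and the real function may detect ZeRO-3 differently.

```python
from contextlib import nullcontext
from importlib import import_module
from typing import Any, Callable, ContextManager, Optional, Sequence


def zero3_gather_factory_sketch(
    accelerator: Any,
    import_fn: Optional[Callable[[str], Any]] = None,
) -> Callable[[Sequence[Any]], ContextManager[Any]]:
    """Hypothetical stand-in, not the library's implementation."""
    state = getattr(accelerator, "state", None)
    plugin = getattr(state, "deepspeed_plugin", None)
    # Assumption: stage 3 is identified by the plugin's zero_stage attribute.
    if getattr(plugin, "zero_stage", None) != 3:
        # Parameters are not partitioned, so there is nothing to gather.
        return lambda params: nullcontext()
    # Import deepspeed lazily, only once ZeRO-3 is known to be active.
    deepspeed = (import_fn or import_module)("deepspeed")
    gather_cls = deepspeed.zero.GatheredParameters
    return lambda params: gather_cls(list(params))
```

Callers can then write `with gather(params): ...` without branching on the DeepSpeed configuration, because both paths return a context manager.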