maxent_grpo.training.runtime.config

Configuration dataclasses for the training runtime.

Classes

`GenerationSamplingConfig`(max_prompt_len, ...)	Shared completion sampling knobs (HF + vLLM).
`MaxEntOptions`([tau, q_temperature, ...])	Lightweight knobs specific to MaxEnt sequence-level updates.
`VLLMClientConfig`(url, rounds_cfg, ...[, ...])	Configuration for vLLM-backed completion generation with all exposed knobs.

class maxent_grpo.training.runtime.config.GenerationSamplingConfig(max_prompt_len, max_completion_len, gen_temperature, gen_top_p, use_vllm, vllm, *, vllm_mode='server')

Bases: object

Shared completion sampling knobs (HF + vLLM).

Parameters:

max_prompt_len (int)
max_completion_len (int)
gen_temperature (float)
gen_top_p (float)
use_vllm (bool)
vllm (VLLMClientConfig)
vllm_mode (str)

max_prompt_len: int

max_completion_len: int

gen_temperature: float

gen_top_p: float

use_vllm: bool

vllm: VLLMClientConfig

vllm_mode: str = 'server'

property vllm_url: str

Backward-compatible accessor for the vLLM endpoint URL.

property vllm_rounds_cfg: int

Backward-compatible accessor for the maximum vLLM retry rounds.

property vllm_retry_sleep: float

Backward-compatible accessor for the per-round retry sleep.

property vllm_backfill_local: bool

Backward-compatible accessor for local fallback behavior.

property vllm_request_logprobs: bool

Backward-compatible accessor for whether to request logprobs.

property vllm_best_of: int | None

Backward-compatible accessor for the best-of sampling count.

property vllm_frequency_penalty: float

Backward-compatible accessor for the frequency penalty value.

property vllm_presence_penalty: float

Backward-compatible accessor for the presence penalty value.

property vllm_top_k: int | None

Backward-compatible accessor for the top-k sampling limit.

property vllm_stop_sequences: List[str] | None

Backward-compatible accessor for stop sequences.

property vllm_include_stop_str_in_output: bool

Whether vLLM should preserve matched stop strings in output text.

property vllm_timeout: float

Backward-compatible accessor for request timeout.

property vllm_max_retries: int

Backward-compatible accessor for maximum request retries.

property vllm_backoff: float

Backward-compatible accessor for exponential backoff factor.

property vllm_backoff_multiplier: float

Multiplier applied to the backoff delay after each attempt.

property vllm_guided_json: str | None

Backward-compatible accessor for JSON schema-guided decoding.

property vllm_guided_regex: str | None

Backward-compatible accessor for regex-guided decoding.

property vllm_logit_bias: Dict[str, float] | None

Backward-compatible accessor for logit bias configuration.

property vllm_request_id_prefix: str | None

Backward-compatible accessor for request-id prefixes.

property vllm_sync_weights: bool

Whether to push model weights to the vLLM server before generation.
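
The `vllm_*` properties above mirror fields of the nested `VLLMClientConfig`. A minimal sketch of how such flat accessors could forward to a nested config is shown below. The class names `ClientConfigSketch` and `SamplingConfigSketch`, the reduced field set, and the example URL are hypothetical stand-ins; the actual property bodies are not shown on this page and may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientConfigSketch:
    """Hypothetical stand-in for VLLMClientConfig (subset of fields)."""

    url: str
    rounds_cfg: int
    timeout: float = 120.0
    top_k: Optional[int] = None


@dataclass
class SamplingConfigSketch:
    """Hypothetical stand-in for GenerationSamplingConfig (subset of fields)."""

    max_prompt_len: int
    max_completion_len: int
    vllm: ClientConfigSketch

    # Assumed pattern: each flat accessor simply reads the nested field,
    # so older call sites using cfg.vllm_url keep working.
    @property
    def vllm_url(self) -> str:
        return self.vllm.url

    @property
    def vllm_rounds_cfg(self) -> int:
        return self.vllm.rounds_cfg

    @property
    def vllm_timeout(self) -> float:
        return self.vllm.timeout


cfg = SamplingConfigSketch(
    max_prompt_len=512,
    max_completion_len=256,
    # Placeholder endpoint, not a documented default.
    vllm=ClientConfigSketch(url="http://localhost:8000/generate", rounds_cfg=2),
)
```

Under this pattern, `cfg.vllm_url` and `cfg.vllm.url` always return the same value, since the property holds no state of its own.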

class maxent_grpo.training.runtime.config.MaxEntOptions(tau=<factory>, q_temperature=<factory>, q_epsilon=<factory>, length_normalize_ref=<factory>)

Bases: object

Lightweight knobs specific to MaxEnt sequence-level updates.

Parameters:

tau (float)
q_temperature (float)
q_epsilon (float)
length_normalize_ref (bool)

tau: float

q_temperature: float

q_epsilon: float

length_normalize_ref: bool
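
The `<factory>` placeholders in the `MaxEntOptions` signature are what Python prints when a dataclass field is declared with `field(default_factory=...)`: the default is computed when the instance is created, so no literal value appears in the signature. The sketch below reproduces that behavior with a hypothetical stand-in class. The factory return values (`0.1`, `1.0`, `1e-6`, `True`) are placeholders chosen for illustration, not the library's actual defaults.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class MaxEntOptionsSketch:
    """Hypothetical stand-in; the factory return values are placeholders."""

    tau: float = field(default_factory=lambda: 0.1)
    q_temperature: float = field(default_factory=lambda: 1.0)
    q_epsilon: float = field(default_factory=lambda: 1e-6)
    length_normalize_ref: bool = field(default_factory=lambda: True)


# The rendered signature shows "<factory>" instead of a concrete default.
signature_text = str(inspect.signature(MaxEntOptionsSketch))

# All arguments are optional; any subset can be overridden by keyword.
defaults = MaxEntOptionsSketch()
overridden = MaxEntOptionsSketch(tau=0.5)
```

To see the real defaults, instantiate `MaxEntOptions()` with no arguments and inspect the resulting fields.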

class maxent_grpo.training.runtime.config.VLLMClientConfig(url, rounds_cfg, retry_sleep, backfill_local, request_logprobs, best_of=None, frequency_penalty=0.0, presence_penalty=0.0, top_k=None, stop_sequences=None, include_stop_str_in_output=False, timeout=120.0, max_retries=3, backoff=1.0, backoff_multiplier=2.0, guided_json=None, guided_regex=None, logit_bias=None, request_id_prefix=None, sync_weights=False)

Bases: object

Configuration for vLLM-backed completion generation with all exposed knobs.

Parameters:

url (str)
rounds_cfg (int)
retry_sleep (float)
backfill_local (bool)
request_logprobs (bool)
best_of (int | None)
frequency_penalty (float)
presence_penalty (float)
top_k (int | None)
stop_sequences (List[str] | None)
include_stop_str_in_output (bool)
timeout (float)
max_retries (int)
backoff (float)
backoff_multiplier (float)
guided_json (str | None)
guided_regex (str | None)
logit_bias (Dict[str, float] | None)
request_id_prefix (str | None)
sync_weights (bool)
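
Several of these parameters take encoded values rather than rich objects: `guided_json` is typed `str | None` and `logit_bias` uses string keys. The sketch below prepares keyword arguments under two assumptions that this page does not confirm: that `guided_json` expects a JSON schema serialized to a string, and that `logit_bias` keys are token ids written as strings. The URL, schema, token id, and stop string are placeholders.

```python
import json

# Hypothetical schema used for illustration only.
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

client_kwargs = {
    # The five required positional parameters, in documented order.
    "url": "http://localhost:8000/generate",  # placeholder endpoint
    "rounds_cfg": 2,
    "retry_sleep": 0.5,
    "backfill_local": False,
    "request_logprobs": True,
    # Assumption: the schema is passed as a serialized JSON string.
    "guided_json": json.dumps(schema),
    # Assumption: keys are token ids as strings, values are bias amounts.
    "logit_bias": {"1234": -5.0},
    "stop_sequences": ["</answer>"],
}
```

With the package installed, these could be passed as `VLLMClientConfig(**client_kwargs)`, since every key matches a documented parameter name.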

url: str

rounds_cfg: int

retry_sleep: float

backfill_local: bool

request_logprobs: bool

best_of: int | None = None

frequency_penalty: float = 0.0

presence_penalty: float = 0.0

top_k: int | None = None

stop_sequences: List[str] | None = None

include_stop_str_in_output: bool = False

timeout: float = 120.0

max_retries: int = 3

backoff: float = 1.0

backoff_multiplier: float = 2.0

guided_json: str | None = None

guided_regex: str | None = None

logit_bias: Dict[str, float] | None = None

request_id_prefix: str | None = None

sync_weights: bool = False
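
The `max_retries`, `backoff`, and `backoff_multiplier` fields suggest a geometric retry schedule. The helper below is a sketch of one plausible reading, in which the first retry waits `backoff` seconds and each later wait is multiplied by `backoff_multiplier`. The function `retry_delays` is hypothetical, and the client's actual retry loop is not documented here, so its timing may differ (for example, it may add jitter or cap the delay).

```python
from typing import List


def retry_delays(
    backoff: float = 1.0,
    backoff_multiplier: float = 2.0,
    max_retries: int = 3,
) -> List[float]:
    """Return the assumed wait (in seconds) before each retry attempt."""
    delays: List[float] = []
    delay = backoff
    for _ in range(max_retries):
        delays.append(delay)
        # Grow the wait geometrically after every attempt.
        delay *= backoff_multiplier
    return delays


# Schedule implied by the documented defaults under this assumption.
schedule = retry_delays()
```

Under this reading, the documented defaults give waits of 1.0, 2.0, and 4.0 seconds, for 7.0 seconds of total sleep on top of the per-request `timeout`.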