maxent_grpo.training.runtime.config
Configuration dataclasses for the training runtime.
Classes
| GenerationSamplingConfig | Shared completion sampling knobs (HF + vLLM). |
| MaxEntOptions | Lightweight knobs specific to MaxEnt sequence-level updates. |
| VLLMClientConfig | Configuration for vLLM-backed completion generation with all exposed knobs. |
- class maxent_grpo.training.runtime.config.GenerationSamplingConfig(max_prompt_len, max_completion_len, gen_temperature, gen_top_p, use_vllm, vllm, *, vllm_mode='server')
Bases: object
Shared completion sampling knobs (HF + vLLM).
- Parameters:
- vllm: VLLMClientConfig
- property vllm_frequency_penalty: float
Backward-compatible accessor for the frequency penalty value.
- property vllm_include_stop_str_in_output: bool
Whether vLLM should preserve matched stop strings in output text.
- property vllm_backoff_multiplier: float
Multiplier applied to the backoff delay after each attempt.
- property vllm_guided_json: str | None
Backward-compatible accessor for JSON schema-guided decoding.
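The `vllm_*` properties above read like flat aliases for fields that live on the nested `vllm` config. The sketch below shows one way such accessors could work. It is an illustration only: `SketchVLLMConfig` and `SketchSamplingConfig` are hypothetical stand-ins, the field types are guesses, and the assumption that each property simply forwards to `self.vllm` is not confirmed by this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchVLLMConfig:
    """Hypothetical stand-in for VLLMClientConfig (small subset of fields)."""

    frequency_penalty: float = 0.0
    include_stop_str_in_output: bool = False
    backoff_multiplier: float = 2.0
    guided_json: Optional[str] = None


@dataclass
class SketchSamplingConfig:
    """Hypothetical stand-in for GenerationSamplingConfig."""

    max_prompt_len: int
    max_completion_len: int
    gen_temperature: float
    gen_top_p: float
    use_vllm: bool
    vllm: SketchVLLMConfig
    vllm_mode: str = "server"  # keyword-only in the documented signature

    # Assumed behaviour: each flat property forwards to the nested config,
    # so older call sites keep working after the vLLM knobs were grouped.
    @property
    def vllm_frequency_penalty(self) -> float:
        return self.vllm.frequency_penalty

    @property
    def vllm_guided_json(self) -> Optional[str]:
        return self.vllm.guided_json


cfg = SketchSamplingConfig(
    max_prompt_len=512,
    max_completion_len=256,
    gen_temperature=0.7,
    gen_top_p=0.95,
    use_vllm=True,
    vllm=SketchVLLMConfig(frequency_penalty=0.25),
)
print(cfg.vllm_frequency_penalty)  # same value as cfg.vllm.frequency_penalty
```

Under this assumption, callers can use either `cfg.vllm.frequency_penalty` or `cfg.vllm_frequency_penalty` and get the same value.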
- class maxent_grpo.training.runtime.config.MaxEntOptions(tau=<factory>, q_temperature=<factory>, q_epsilon=<factory>, length_normalize_ref=<factory>)
Bases: object
Lightweight knobs specific to MaxEnt sequence-level updates.
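The `<factory>` placeholders in the signature above are how Python renders dataclass fields declared with `default_factory`: the default is computed each time an instance is created rather than fixed at class definition. The sketch below reproduces that rendering with a hypothetical stand-in class. The field types and the placeholder values `0.0` and `False` are assumptions chosen only for illustration, not the library's real defaults.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class SketchMaxEntOptions:
    """Hypothetical stand-in; types and default values are placeholders."""

    # default_factory is called at construction time for every instance.
    tau: float = field(default_factory=lambda: 0.0)
    length_normalize_ref: bool = field(default_factory=lambda: False)


# The generated __init__ shows "<factory>" where a default_factory is used.
signature_text = str(inspect.signature(SketchMaxEntOptions))
print(signature_text)

# Explicit arguments still override the factory-provided defaults.
opts = SketchMaxEntOptions(tau=0.5)
```

To see the values the real class resolves to, instantiate `MaxEntOptions()` with no arguments and inspect the resulting fields.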
- class maxent_grpo.training.runtime.config.VLLMClientConfig(url, rounds_cfg, retry_sleep, backfill_local, request_logprobs, best_of=None, frequency_penalty=0.0, presence_penalty=0.0, top_k=None, stop_sequences=None, include_stop_str_in_output=False, timeout=120.0, max_retries=3, backoff=1.0, backoff_multiplier=2.0, guided_json=None, guided_regex=None, logit_bias=None, request_id_prefix=None, sync_weights=False)
Bases: object
Configuration for vLLM-backed completion generation with all exposed knobs.
- Parameters:
url (str)
rounds_cfg (int)
retry_sleep (float)
backfill_local (bool)
request_logprobs (bool)
best_of (int | None)
frequency_penalty (float)
presence_penalty (float)
top_k (int | None)
include_stop_str_in_output (bool)
timeout (float)
max_retries (int)
backoff (float)
backoff_multiplier (float)
guided_json (str | None)
guided_regex (str | None)
request_id_prefix (str | None)
sync_weights (bool)
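The `max_retries`, `backoff`, and `backoff_multiplier` parameters suggest an exponential backoff schedule for failed vLLM requests. The sketch below shows how such a schedule could be derived from the documented defaults. `backoff_delays` is a hypothetical helper, not part of the package, and it assumes the delay starts at `backoff` and is multiplied by `backoff_multiplier` after each attempt. How the runtime combines this with `retry_sleep` or `timeout` is not stated on this page.

```python
from typing import List


def backoff_delays(
    max_retries: int = 3,
    backoff: float = 1.0,
    backoff_multiplier: float = 2.0,
) -> List[float]:
    """Hypothetical helper: seconds to wait after each failed attempt."""
    delays: List[float] = []
    delay = backoff
    for _ in range(max_retries):
        delays.append(delay)
        # Grow the delay geometrically for the next attempt.
        delay *= backoff_multiplier
    return delays


default_delays = backoff_delays()
print(default_delays)  # [1.0, 2.0, 4.0]
```

With the documented defaults, this schedule waits 1, 2, and 4 seconds across the three retries, which stays well below the default `timeout` of 120 seconds.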