maxent_grpo.training.runtime¶
Runtime utilities split by concern for the MaxEnt-GRPO training stack.
This package separates setup/dependency loading, logging, and prompt handling so callers can import only what they need without pulling the full helper module.
- maxent_grpo.training.runtime.log_run_header(training_args=None)[source]¶
Log a consistent run header with recipe and resolved method identity.
- maxent_grpo.training.runtime.resolve_run_metadata(training_args=None)[source]¶
Return run-level metadata for logging consistency.
- class maxent_grpo.training.runtime.ChatTokenizer(*args, **kwargs)[source]¶
Bases: Protocol
Protocol for tokenizers with chat template capabilities.
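A structural protocol lets callers accept any tokenizer that offers chat templating without requiring a shared base class. The sketch below is illustrative only: the members of the real ChatTokenizer are not listed on this page, so `ChatTokenizerSketch`, `ToyTokenizer`, and `render_chat` are hypothetical names, and the single `apply_chat_template` method is an assumption modelled on the Hugging Face tokenizer method of the same name.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ChatTokenizerSketch(Protocol):
    """Hypothetical stand-in; the real ChatTokenizer members are not shown here."""

    def apply_chat_template(self, conversation: Any, **kwargs: Any) -> Any:
        ...


class ToyTokenizer:
    """Toy tokenizer that satisfies the protocol structurally (no inheritance)."""

    def apply_chat_template(self, conversation, **kwargs):
        # Render each message as "role: content", one per line.
        return "\n".join(f"{m['role']}: {m['content']}" for m in conversation)


def render_chat(tokenizer: Any, messages: list) -> str:
    # Use the chat template when the tokenizer provides one,
    # otherwise fall back to joining the raw message contents.
    if isinstance(tokenizer, ChatTokenizerSketch):
        return tokenizer.apply_chat_template(messages, tokenize=False)
    return "\n".join(m["content"] for m in messages)
```

Note that `runtime_checkable` only checks that the method exists, not that its signature matches, so the `isinstance` test is a coarse capability check.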
- class maxent_grpo.training.runtime.GenerationPenaltyPassthroughMixin[source]¶
Bases: object
Expose penalty overrides via legacy gen_* accessors.
- penalty: GenerationPenaltyConfig¶
- class maxent_grpo.training.runtime.GenerationSamplingConfig(max_prompt_len, max_completion_len, gen_temperature, gen_top_p, use_vllm, vllm, *, vllm_mode='server')[source]¶
Bases: object
Shared completion sampling knobs (HF + vLLM).
- Parameters:
- vllm: VLLMClientConfig¶
- property vllm_frequency_penalty: float¶
Backward-compatible accessor for the frequency penalty value.
- property vllm_include_stop_str_in_output: bool¶
Whether vLLM should preserve matched stop strings in output text.
- property vllm_backoff_multiplier: float¶
Multiplier applied to the backoff delay after each attempt.
- property vllm_guided_json: str | None¶
Backward-compatible accessor for JSON schema-guided decoding.
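The backward-compatible accessors above follow a common pattern: a flat, legacy attribute name is kept as a property that forwards to a nested config object. The following is a minimal sketch of that pattern, not the package's implementation. `PenaltySketch` and `SamplingSketch` are hypothetical classes, and their field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PenaltySketch:
    # Hypothetical nested config; field names are assumptions.
    frequency_penalty: float = 0.0
    guided_json: Optional[str] = None


@dataclass
class SamplingSketch:
    gen_temperature: float = 1.0
    penalty: PenaltySketch = field(default_factory=PenaltySketch)

    @property
    def vllm_frequency_penalty(self) -> float:
        # Legacy flat name forwards to the nested config.
        return self.penalty.frequency_penalty

    @vllm_frequency_penalty.setter
    def vllm_frequency_penalty(self, value: float) -> None:
        self.penalty.frequency_penalty = float(value)

    @property
    def vllm_guided_json(self) -> Optional[str]:
        # Read-only passthrough for the optional JSON schema string.
        return self.penalty.guided_json
```

Because the properties carry no class-level annotation, `dataclass` does not treat them as fields, so old call sites that read or write the flat name keep working while the nested object remains the single source of truth.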
- class maxent_grpo.training.runtime.MaxEntOptions(tau=<factory>, q_temperature=<factory>, q_epsilon=<factory>, length_normalize_ref=<factory>)[source]¶
Bases: object
Lightweight knobs specific to MaxEnt sequence-level updates.
- maxent_grpo.training.runtime.classify_vllm_startup_log(log_text, stall_threshold=3)[source]¶
Classify startup progress using marker patterns in log_text.
- Parameters:
- Return type:
- maxent_grpo.training.runtime.should_trigger_v0_fallback(log_text, attempt, min_attempts=20, stall_threshold=3)[source]¶
Return True when vLLM startup appears stuck and should be relaunched in V0 mode.
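One plausible way the two startup helpers could fit together is sketched below. Everything in it is an assumption: the marker strings are placeholders rather than real vLLM log lines, the three state names are invented for illustration, and the rule "fall back only after `min_attempts` polls and `stall_threshold` stall markers" is a guess at the intent, not the documented behaviour.

```python
def classify_startup_sketch(log_text, stall_threshold=3,
                            ready_marker="<READY>", stall_marker="<STALL>"):
    """Return 'ready', 'stalled', or 'pending' based on placeholder markers."""
    if ready_marker in log_text:
        return "ready"
    # Treat repeated stall markers as evidence that startup is stuck.
    if log_text.count(stall_marker) >= stall_threshold:
        return "stalled"
    return "pending"


def should_fallback_sketch(log_text, attempt, min_attempts=20, stall_threshold=3):
    """Return True only when enough attempts have passed and the log looks stalled."""
    if attempt < min_attempts:
        return False
    return classify_startup_sketch(log_text, stall_threshold) == "stalled"
```

Gating on the attempt count first avoids relaunching a server that is merely slow to start.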
- maxent_grpo.training.runtime.get_trl_prepare_deepspeed()[source]¶
Return TRL’s prepare_deepspeed helper when available.
- Return type:
Any | None
- maxent_grpo.training.runtime.require_accelerator(context)[source]¶
Return accelerate.Accelerator or raise a helpful RuntimeError.
- maxent_grpo.training.runtime.require_dataloader(context)[source]¶
Return torch.utils.data.DataLoader with a descriptive error on failure.
- maxent_grpo.training.runtime.require_deepspeed(context, module='deepspeed')[source]¶
Return a DeepSpeed module import or raise a contextual RuntimeError.
- maxent_grpo.training.runtime.require_torch(context)[source]¶
Return the torch module or raise a helpful RuntimeError.
- maxent_grpo.training.runtime.require_transformer_base_classes(context)[source]¶
Return (PreTrainedModel, PreTrainedTokenizer) with clear failure messages.
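The require_* helpers above share one idea: import an optional dependency lazily and turn a bare ImportError into an error that names the calling context. A minimal sketch of that idea follows, together with a "return None when unavailable" variant in the spirit of get_trl_prepare_deepspeed. The function names `require_module_sketch` and `optional_attr_sketch` and the error wording are hypothetical, not the package's own.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


def require_module_sketch(name: str, context: str) -> ModuleType:
    """Import `name` or raise a RuntimeError that mentions the caller's context."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible.
        raise RuntimeError(
            f"{name} is required for {context}; install it and retry."
        ) from exc


def optional_attr_sketch(module_name: str, attr: str) -> Optional[Any]:
    """Return module.attr, or None when the module or attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)
```

A caller would write something like `torch = require_module_sketch("torch", "training loop setup")`, so that a missing install fails with a message tied to the feature that needed it.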
- maxent_grpo.training.runtime.truncate_prompt(prompt, char_limit=None, *, tokenizer=None, max_tokens=None)[source]¶
Clamp prompt strings to a safe token length when possible.
- Parameters:
prompt (str) – Prompt string to clamp.
char_limit (int | None) – Optional character limit fallback. When None, the module-level PROMPT_CHAR_LIMIT is used.
tokenizer (Any | None) – Optional tokenizer used to enforce token limits.
max_tokens (int | None) – Optional token limit override (preferred when tokenizer is available).
- Returns:
The original prompt when under the limit, otherwise a truncated prefix.
- Return type:
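The parameter description for truncate_prompt suggests a two-tier strategy: prefer a token limit when a tokenizer is supplied, otherwise fall back to a character limit. The sketch below illustrates that strategy under stated assumptions: the value of `PROMPT_CHAR_LIMIT_SKETCH` is a placeholder (the real default is not given here), the tokenizer is assumed to expose `encode` and `decode`, and `WhitespaceTokenizer` is a toy class used only to make the example runnable.

```python
PROMPT_CHAR_LIMIT_SKETCH = 2048  # placeholder; the real module-level default is not stated


def truncate_prompt_sketch(prompt, char_limit=None, *, tokenizer=None, max_tokens=None):
    """Return the prompt unchanged when under the limit, else a truncated prefix."""
    # Preferred path: enforce a token budget when both pieces are available.
    if tokenizer is not None and max_tokens is not None:
        ids = tokenizer.encode(prompt)
        if len(ids) <= max_tokens:
            return prompt
        return tokenizer.decode(ids[:max_tokens])
    # Fallback path: clamp by characters.
    limit = PROMPT_CHAR_LIMIT_SKETCH if char_limit is None else char_limit
    return prompt if len(prompt) <= limit else prompt[:limit]


class WhitespaceTokenizer:
    """Toy tokenizer: one token per whitespace-separated word."""

    def encode(self, text):
        return text.split()

    def decode(self, tokens):
        return " ".join(tokens)
```

With a real subword tokenizer, decoding a truncated id list may not reproduce an exact character prefix of the input, so callers that need a strict prefix should treat the token path as approximate.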
Modules
Configuration dataclasses for the training runtime.
DeepSpeed and Accelerate integration helpers.
Dependency loading utilities used by the training runtime.
Logging utilities (primarily W&B) for the training stack.
Runtime operational helpers.
Prompt-related helpers and sampling penalties.
Setup utilities for loading runtime dependencies and accelerator plugins.