maxent_grpo.training.run_helpers
Shared helper utilities for the MaxEnt-GRPO training pipeline.
This module re-exports common runtime dependency/prompt helpers while keeping
lightweight tensor utilities used in scoring and loss computation. Logging
helpers now live in maxent_grpo.training.runtime.logging.
Functions
- Tokenize prompt+completion pairs and return tensors + prompt lengths.
- Numerically stable softmax with optional temperature and epsilon floor.
- Create labels tensor with prompt tokens masked as -100 for CE.
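The softmax and label-masking utilities summarized above can be illustrated with a minimal pure-Python sketch. The names `stable_softmax` and `mask_prompt_labels` are hypothetical (the summary does not show the real function names), plain lists stand in for tensors, and the epsilon-floor behaviour (clamp, then renormalize) is an assumption rather than the module's confirmed semantics.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100  # label value the summary gives for masked prompt tokens


def stable_softmax(
    scores: Sequence[float], temperature: float = 1.0, eps: float = 0.0
) -> List[float]:
    """Hypothetical stand-in: softmax over `scores` with temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    if eps > 0.0:
        # Assumed floor semantics: clamp each probability to >= eps,
        # then renormalize so the result still sums to 1.
        floored = [max(p, eps) for p in probs]
        norm = sum(floored)
        probs = [p / norm for p in floored]
    return probs


def mask_prompt_labels(input_ids: Sequence[int], prompt_len: int) -> List[int]:
    """Hypothetical stand-in: copy ids and mask the prompt prefix with -100."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels
```

Masking the prompt positions with -100 means a cross-entropy loss that ignores that index only scores the completion tokens.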
- maxent_grpo.training.run_helpers.truncate_prompt(prompt, char_limit=None, *, tokenizer=None, max_tokens=None)
Clamp prompt strings to a safe token length when possible.
- Parameters:
prompt (str) – Prompt string to clamp.
char_limit (int | None) – Optional character limit fallback. When None, the module-level PROMPT_CHAR_LIMIT is used.
tokenizer (Any | None) – Optional tokenizer used to enforce token limits.
max_tokens (int | None) – Optional token limit override (preferred when tokenizer is available).
- Returns:
The original prompt when under the limit, otherwise a truncated prefix.
- Return type:
str
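As a rough illustration of the behaviour documented for truncate_prompt, the sketch below prefers a token limit when both a tokenizer and max_tokens are supplied and otherwise falls back to a character limit. It is not the module's implementation: `truncate_prompt_sketch` and `WhitespaceTokenizer` are hypothetical, the `encode`/`decode` interface is an assumption, and the value given to `PROMPT_CHAR_LIMIT` here is a placeholder.

```python
from typing import Any, List, Optional

# Placeholder value; the real module-level constant is not shown in this page.
PROMPT_CHAR_LIMIT = 2048


def truncate_prompt_sketch(
    prompt: str,
    char_limit: Optional[int] = None,
    *,
    tokenizer: Optional[Any] = None,
    max_tokens: Optional[int] = None,
) -> str:
    """Return `prompt` unchanged when under the limit, else a truncated prefix."""
    if tokenizer is not None and max_tokens is not None:
        # Token-based path: assumed to take priority when a tokenizer is given.
        ids = tokenizer.encode(prompt)
        if len(ids) <= max_tokens:
            return prompt
        return tokenizer.decode(ids[:max_tokens])
    # Character-based fallback.
    limit = PROMPT_CHAR_LIMIT if char_limit is None else char_limit
    return prompt if len(prompt) <= limit else prompt[:limit]


class WhitespaceTokenizer:
    """Toy tokenizer used only to exercise the sketch."""

    def encode(self, text: str) -> List[str]:
        return text.split()

    def decode(self, tokens: List[str]) -> str:
        return " ".join(tokens)
```

For example, `truncate_prompt_sketch("a b c d", tokenizer=WhitespaceTokenizer(), max_tokens=2)` keeps only the first two whitespace-separated tokens, while a call without a tokenizer slices the string by characters.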
- maxent_grpo.training.run_helpers.require_accelerator(context)
Return accelerate.Accelerator or raise a helpful RuntimeError.
- maxent_grpo.training.run_helpers.require_dataloader(context)
Return torch.utils.data.DataLoader with a descriptive error on failure.
- maxent_grpo.training.run_helpers.require_torch(context)
Return the torch module or raise a helpful RuntimeError.
- maxent_grpo.training.run_helpers.require_transformer_base_classes(context)
Return (PreTrainedModel, PreTrainedTokenizer) with clear failure messages.
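The require_* helpers above all return an optional dependency or raise a descriptive RuntimeError. A minimal sketch of that general pattern, using only the standard library, might look like the following. The generic `require_module` function and its error wording are hypothetical and are not part of this module; the only behaviour taken from the descriptions is "return the dependency or raise RuntimeError".

```python
import importlib
from types import ModuleType


def require_module(name: str, context: str) -> ModuleType:
    """Import `name` lazily, or raise a RuntimeError that mentions `context`."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        # Hypothetical message format; the real helpers' wording is not shown.
        raise RuntimeError(
            f"{name} is required for {context}; install it to continue."
        ) from exc
```

Deferring the import to call time lets the rest of a package be imported on machines without the heavy dependency, and the context string tells the user which feature triggered the requirement.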