maxent_grpo.training.run_helpers

Shared helper utilities for the MaxEnt-GRPO training pipeline.

This module re-exports the common runtime-dependency and prompt helpers, and keeps the lightweight tensor utilities used in scoring and loss computation. Logging helpers now live in maxent_grpo.training.runtime.logging.

Functions

`_batch_tokenize_pairs`(tokenizer, prompts, ...)	Tokenize prompt+completion pairs and return tensors + prompt lengths.
`_group_softmax`(values[, temperature, eps])	Numerically stable softmax with optional temperature and epsilon floor.
`_prepare_labels_for_ce`(input_ids, prompt_lengths)	Create labels tensor with prompt tokens masked as -100 for CE.
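The two tensor utilities summarized above can be illustrated with a minimal pure-Python sketch. It is not the module's implementation: the names `group_softmax_sketch` and `prepare_labels_sketch` are hypothetical, plain lists stand in for tensors, and the way `eps` is applied (floor each probability, then renormalize) is an assumption, since the summary only says an epsilon floor exists.

```python
import math


def group_softmax_sketch(values, temperature=1.0, eps=1e-6):
    """Softmax over one group of scores with temperature and a probability floor."""
    if not values:
        return []
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Assumed eps handling: floor each probability, then renormalize to sum to 1.
    floored = [max(p, eps) for p in probs]
    norm = sum(floored)
    return [p / norm for p in floored]


def prepare_labels_sketch(input_ids, prompt_lengths, ignore_index=-100):
    """Copy input_ids and mask the first prompt_len tokens of each row."""
    labels = []
    for row, prompt_len in zip(input_ids, prompt_lengths):
        masked = list(row)
        for i in range(min(prompt_len, len(masked))):
            masked[i] = ignore_index  # -100 is skipped by cross-entropy losses
        labels.append(masked)
    return labels
```

Masking prompt positions with -100 means only completion tokens contribute to the cross-entropy loss, because -100 is the conventional ignore index for that loss in PyTorch.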

maxent_grpo.training.run_helpers.truncate_prompt(prompt, char_limit=None, *, tokenizer=None, max_tokens=None)

Clamp prompt strings to a safe token length when possible.

Parameters:

prompt (str) – Prompt string to clamp.
char_limit (int | None) – Optional character limit fallback. When None the module-level PROMPT_CHAR_LIMIT is used.
tokenizer (Any | None) – Optional tokenizer used to enforce token limits.
max_tokens (int | None) – Optional token limit override (preferred when tokenizer is available).

Returns:

The original prompt when under the limit, otherwise a truncated prefix.

Return type:

str
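The documented behavior (prefer a token limit when a tokenizer is available, otherwise fall back to a character limit) might look roughly like the sketch below. Everything here is illustrative: `truncate_prompt_sketch`, `WhitespaceTokenizer`, and the value of `PROMPT_CHAR_LIMIT_SKETCH` are hypothetical, and the sketch assumes the tokenizer exposes `encode` and `decode` methods.

```python
PROMPT_CHAR_LIMIT_SKETCH = 2048  # hypothetical default; the real value is not stated here


class WhitespaceTokenizer:
    """Toy tokenizer (hypothetical) with encode/decode over whitespace-split words."""

    def __init__(self):
        self._vocab = []

    def encode(self, text):
        ids = []
        for word in text.split():
            if word not in self._vocab:
                self._vocab.append(word)
            ids.append(self._vocab.index(word))
        return ids

    def decode(self, ids):
        return " ".join(self._vocab[i] for i in ids)


def truncate_prompt_sketch(prompt, char_limit=None, *, tokenizer=None, max_tokens=None):
    # Prefer the token limit when both a tokenizer and a limit are supplied.
    if tokenizer is not None and max_tokens is not None:
        ids = tokenizer.encode(prompt)
        if len(ids) <= max_tokens:
            return prompt
        return tokenizer.decode(ids[:max_tokens])
    # Otherwise fall back to a character limit (assumed module default when None).
    limit = PROMPT_CHAR_LIMIT_SKETCH if char_limit is None else char_limit
    if len(prompt) <= limit:
        return prompt
    return prompt[:limit]
```

A token-based limit is usually the safer of the two, because a character count only approximates how many tokens a model will actually see.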

maxent_grpo.training.run_helpers.require_accelerator(context)

Return accelerate.Accelerator or raise a helpful RuntimeError.

Parameters: context (str)
Return type: Any

maxent_grpo.training.run_helpers.require_dataloader(context)

Return torch.utils.data.DataLoader with a descriptive error on failure.

Parameters: context (str)
Return type: Any

maxent_grpo.training.run_helpers.require_torch(context)

Return the torch module or raise a helpful RuntimeError.

Parameters: context (str)
Return type: Any

maxent_grpo.training.run_helpers.require_transformer_base_classes(context)

Return (PreTrainedModel, PreTrainedTokenizer) with clear failure messages.

Parameters: context (str)
Return type: Tuple[Any, Any]

maxent_grpo.training.run_helpers.require_deepspeed(context, module='deepspeed')

Return a DeepSpeed module import or raise a contextual RuntimeError.

Parameters:

context (str)
module (str)

Return type:

Any
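The require_* helpers share one pattern: attempt an import and turn a failure into a RuntimeError that names the calling context. A minimal sketch of that pattern follows, assuming the standard `importlib` machinery. The function name `require_module_sketch` and the error wording are hypothetical, and the real helpers may differ in detail.

```python
import importlib


def require_module_sketch(module, context):
    """Import `module` or raise a RuntimeError that mentions `context`."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        # Chaining with `from exc` keeps the original import failure visible.
        raise RuntimeError(
            f"{module} is required for {context}; install it to continue."
        ) from exc
```

For example, `require_module_sketch("json", "config loading")` returns the standard-library `json` module, while an unknown module name raises a RuntimeError whose message mentions "config loading".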

maxent_grpo.training.run_helpers.get_trl_prepare_deepspeed()

Return TRL’s prepare_deepspeed helper when available.

Return type: Any | None
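Unlike the require_* helpers, this function returns None instead of raising when the dependency is missing. That optional-lookup pattern could be sketched as below. `optional_attr_sketch` is a hypothetical name, and the sketch deliberately uses a standard-library module in its example rather than assuming where TRL exposes prepare_deepspeed.

```python
import importlib


def optional_attr_sketch(module_name, attr_name):
    """Return `module_name.attr_name` if importable and present, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # the dependency is not installed
    # A missing attribute (e.g. an older library version) also yields None.
    return getattr(module, attr_name, None)
```

Callers can then branch on the result, for example `helper = optional_attr_sketch("math", "sqrt")` followed by an `if helper is not None:` check before calling it.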