maxent_grpo.training.run_helpers

Shared helper utilities for the MaxEnt-GRPO training pipeline.

This module re-exports the common runtime-dependency and prompt helpers, and keeps the lightweight tensor utilities used in scoring and loss computation. Logging helpers now live in maxent_grpo.training.runtime.logging.

Functions

_batch_tokenize_pairs(tokenizer, prompts, ...)

Tokenize prompt+completion pairs and return tensors + prompt lengths.

_group_softmax(values[, temperature, eps])

Numerically stable softmax with optional temperature and epsilon floor.

_prepare_labels_for_ce(input_ids, prompt_lengths)

Create a labels tensor with prompt tokens masked as -100 for cross-entropy (CE) loss.
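
The two tensor utilities summarized above can be illustrated with a minimal pure-Python sketch. It is not the module's implementation: the names group_softmax and prepare_labels_for_ce, the default values, the way the epsilon floor is applied, and the use of plain lists instead of tensors are all assumptions made for illustration. The sketch also ignores padding tokens, which a real batch would likely need to mask as well.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100  # label value conventionally skipped by cross-entropy losses


def group_softmax(values: Sequence[float], temperature: float = 1.0,
                  eps: float = 1e-6) -> List[float]:
    """Stable softmax over one group with a probability floor (assumed semantics)."""
    if not values:
        return []
    temp = max(temperature, 1e-8)  # guard against division by zero
    scaled = [v / temp for v in values]
    peak = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    floored = [max(e / total, eps) for e in exps]  # apply the epsilon floor
    norm = sum(floored)  # renormalize so the weights still sum to 1
    return [p / norm for p in floored]


def prepare_labels_for_ce(input_ids: Sequence[Sequence[int]],
                          prompt_lengths: Sequence[int]) -> List[List[int]]:
    """Copy input_ids and replace each row's prompt prefix with IGNORE_INDEX."""
    labels = []
    for row, n_prompt in zip(input_ids, prompt_lengths):
        n = min(max(n_prompt, 0), len(row))  # clamp to the row length
        labels.append([IGNORE_INDEX] * n + list(row[n:]))
    return labels
```

With this sketch, only completion tokens contribute to the loss, and a group of equal scores receives equal weights regardless of their magnitude.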

maxent_grpo.training.run_helpers.truncate_prompt(prompt, char_limit=None, *, tokenizer=None, max_tokens=None)

Clamp prompt strings to a safe token length when possible.

Parameters:
  • prompt (str) – Prompt string to clamp.

  • char_limit (int | None) – Optional character limit fallback. When None the module-level PROMPT_CHAR_LIMIT is used.

  • tokenizer (Any | None) – Optional tokenizer used to enforce token limits.

  • max_tokens (int | None) – Optional token limit override (preferred when tokenizer is available).

Returns:

The original prompt when under the limit, otherwise a truncated prefix.

Return type:

str
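
The behavior described above (prefer a token limit when a tokenizer is supplied, otherwise fall back to a character limit) might look roughly like the sketch below. This is an assumption-laden illustration, not the library's code: truncate_prompt_sketch, the PROMPT_CHAR_LIMIT value of 2048, the encode/decode tokenizer interface, and the toy WhitespaceTokenizer are all hypothetical.

```python
from typing import Any, Optional

PROMPT_CHAR_LIMIT = 2048  # hypothetical default; the real module-level value is not stated


def truncate_prompt_sketch(prompt: str, char_limit: Optional[int] = None, *,
                           tokenizer: Optional[Any] = None,
                           max_tokens: Optional[int] = None) -> str:
    """Return the prompt unchanged when it fits, otherwise a truncated prefix."""
    if tokenizer is not None and max_tokens is not None:
        try:
            ids = tokenizer.encode(prompt)
            if len(ids) <= max_tokens:
                return prompt
            return tokenizer.decode(ids[:max_tokens])
        except Exception:
            pass  # tokenizer failed; fall back to the character limit below
    limit = PROMPT_CHAR_LIMIT if char_limit is None else char_limit
    return prompt if len(prompt) <= limit else prompt[:limit]


class WhitespaceTokenizer:
    """Toy tokenizer used only for this example."""

    def encode(self, text: str) -> list:
        return text.split()

    def decode(self, ids: list) -> str:
        return " ".join(ids)


short = truncate_prompt_sketch("a b c d", tokenizer=WhitespaceTokenizer(), max_tokens=2)
```

The token path is tried first because a character limit is only a rough proxy for the model's context budget.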

maxent_grpo.training.run_helpers.require_accelerator(context)

Return accelerate.Accelerator or raise a helpful RuntimeError.

Parameters:

context (str)

Return type:

Any

maxent_grpo.training.run_helpers.require_dataloader(context)

Return torch.utils.data.DataLoader with a descriptive error on failure.

Parameters:

context (str)

Return type:

Any

maxent_grpo.training.run_helpers.require_torch(context)

Return the torch module or raise a helpful RuntimeError.

Parameters:

context (str)

Return type:

Any
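
The require_* helpers share one pattern: import an optional dependency lazily and turn an import failure into a RuntimeError that names the calling context. A minimal sketch of that pattern follows. The function name require_module and the wording of the error message are hypothetical and are not taken from the module.

```python
import importlib
from types import ModuleType


def require_module(name: str, context: str) -> ModuleType:
    """Import `name` or raise a RuntimeError that says which feature needed it."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        raise RuntimeError(
            f"{name} is required for {context} but could not be imported."
        ) from exc


# Usage with a standard-library module so the example runs anywhere.
json_module = require_module("json", "config loading")
```

Chaining with `from exc` keeps the original ImportError in the traceback, so the underlying cause stays visible alongside the contextual message.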

maxent_grpo.training.run_helpers.require_transformer_base_classes(context)

Return (PreTrainedModel, PreTrainedTokenizer) with clear failure messages.

Parameters:

context (str)

Return type:

Tuple[Any, Any]

maxent_grpo.training.run_helpers.require_deepspeed(context, module='deepspeed')

Return a DeepSpeed module import or raise a contextual RuntimeError.

Parameters:
  • context (str)

  • module (str)

Return type:

Any

maxent_grpo.training.run_helpers.get_trl_prepare_deepspeed()

Return TRL’s prepare_deepspeed helper when available.

Return type:

Any | None
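
Unlike the require_* helpers, this function returns None instead of raising when the dependency is missing. One way such a lookup could be written is sketched below. The helper optional_attr is hypothetical, and the module path trl.models.utils in the wrapper is an assumption that may not match every TRL version.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Return module_name.attr, or None when the module or attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


def get_prepare_deepspeed_sketch() -> Optional[Any]:
    # Assumed import path; adjust it to the installed TRL version.
    return optional_attr("trl.models.utils", "prepare_deepspeed")


# Demonstration with the standard library, which is always importable.
sqrt = optional_attr("math", "sqrt")
```

Callers are then expected to check the result for None before use, for example by skipping DeepSpeed preparation when the helper is unavailable.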