maxent_grpo.training.runtime.prompts

Prompt-related helpers and sampling penalties.

Functions

_prompt_char_limit_from_tokens(max_prompt_len)

Return the token cap used for prompt truncation.

_prompt_suffix_from_env(env_var, default)

Resolve a prompt suffix from environment variables.

_require_prompt_column(example, prompt_column)

Raise if the configured prompt column is missing from a dataset row.

_to_prompt(example, tokenizer, ...[, ...])

Shared prompt/answer builder used across training pipelines.

_truncate_prompt(prompt[, char_limit, ...])

Clamp prompt strings to a safe token length when possible.

append_eval_prompt_suffix(prompt)

Append a short eval-only format reminder to the prompt.

append_prompt_suffix(prompt)

Append a format reminder to all prompts.

sync_trunc_state(state)

Merge external truncation state into the shared warning cache.

truncate_prompt(prompt[, char_limit, ...])

Clamp prompt strings to a safe token length when possible.

Classes

ChatTokenizer(*args, **kwargs)

Protocol for tokenizers with chat template capabilities.

GenerationPenaltyConfig([gen_top_k, ...])

Shared penalty/stop sequence overrides for completion sampling.

GenerationPenaltyPassthroughMixin()

Expose penalty overrides via legacy gen_* accessors.

class maxent_grpo.training.runtime.prompts.ChatTokenizer(*args, **kwargs)[source]

Bases: Protocol

Protocol for tokenizers with chat template capabilities.

apply_chat_template(conversation, tokenize=True, add_generation_prompt=True)[source]

Render a conversation into a model-ready prompt.

Parameters:
Return type:

str | List[int]

property eos_token_id: int | None

Expose the EOS token id used by the tokenizer.

class maxent_grpo.training.runtime.prompts.GenerationPenaltyConfig(gen_top_k=None, gen_best_of=None, gen_frequency_penalty=0.0, gen_presence_penalty=0.0, gen_stop_sequences=None)[source]

Bases: object

Shared penalty/stop sequence overrides for completion sampling.

Parameters:
  • gen_top_k (int | None)

  • gen_best_of (int | None)

  • gen_frequency_penalty (float)

  • gen_presence_penalty (float)

  • gen_stop_sequences (List[str] | None)

gen_top_k: int | None = None
gen_best_of: int | None = None
gen_frequency_penalty: float = 0.0
gen_presence_penalty: float = 0.0
gen_stop_sequences: List[str] | None = None
class maxent_grpo.training.runtime.prompts.GenerationPenaltyPassthroughMixin[source]

Bases: object

Expose penalty overrides via legacy gen_* accessors.

penalty: GenerationPenaltyConfig
property gen_top_k: int | None

Backward-compatible alias for the top-k sampling limit.

property gen_best_of: int | None

Backward-compatible alias for the best-of sampling count.

property gen_frequency_penalty: float

Backward-compatible alias for the frequency penalty strength.

property gen_presence_penalty: float

Backward-compatible alias for the presence penalty strength.

property gen_stop_sequences: List[str] | None

Backward-compatible alias for stop sequences.

maxent_grpo.training.runtime.prompts.append_prompt_suffix(prompt)[source]

Append a format reminder to all prompts.

Parameters:

prompt (str)

Return type:

str

maxent_grpo.training.runtime.prompts.append_eval_prompt_suffix(prompt)[source]

Append a short eval-only format reminder to the prompt.

Parameters:

prompt (str)

Return type:

str

maxent_grpo.training.runtime.prompts.sync_trunc_state(state)[source]

Merge external truncation state into the shared warning cache.

Parameters:

state (Dict[str, Any]) – Dictionary of state keys to merge (e.g., {"warned": True}).

Returns:

None.

Return type:

None

maxent_grpo.training.runtime.prompts.truncate_prompt(prompt, char_limit=None, *, tokenizer=None, max_tokens=None)[source]

Clamp prompt strings to a safe token length when possible.

Parameters:
  • prompt (str) – Prompt string to clamp.

  • char_limit (int | None) – Optional character limit fallback. When None the module-level PROMPT_CHAR_LIMIT is used.

  • tokenizer (Any | None) – Optional tokenizer used to enforce token limits.

  • max_tokens (int | None) – Optional token limit override (preferred when tokenizer is available).

Returns:

The original prompt when under the limit, otherwise a truncated prefix.

Return type:

str