maxent_grpo.training.runtime.prompts¶

Prompt-related helpers and sampling penalties.

Functions

`_prompt_char_limit_from_tokens`(max_prompt_len)	Return the token cap used for prompt truncation.
`_prompt_suffix_from_env`(env_var, default)	Resolve a prompt suffix from environment variables.
`_require_prompt_column`(example, prompt_column)	Raise if the configured prompt column is missing from a dataset row.
`_to_prompt`(example, tokenizer, ...[, ...])	Shared prompt/answer builder used across training pipelines.
`_truncate_prompt`(prompt[, char_limit, ...])	Clamp prompt strings to a safe token length when possible.
`append_eval_prompt_suffix`(prompt)	Append a short eval-only format reminder to the prompt.
`append_prompt_suffix`(prompt)	Append a format reminder to all prompts.
`sync_trunc_state`(state)	Merge external truncation state into the shared warning cache.
`truncate_prompt`(prompt[, char_limit, ...])	Clamp prompt strings to a safe token length when possible.

Classes

`ChatTokenizer`(args, *kwargs)	Protocol for tokenizers with chat template capabilities.
`GenerationPenaltyConfig`([gen_top_k, ...])	Shared penalty/stop sequence overrides for completion sampling.
`GenerationPenaltyPassthroughMixin`()	Expose penalty overrides via legacy `gen_*` accessors.

class maxent_grpo.training.runtime.prompts.ChatTokenizer(*args, **kwargs)[source]¶

Bases: Protocol

Protocol for tokenizers with chat template capabilities.

apply_chat_template(conversation, tokenize=True, add_generation_prompt=True)[source]¶

Render a conversation into a model-ready prompt.

Parameters:

conversation (List[Dict[str, str]])
tokenize (bool)
add_generation_prompt (bool)

Return type:

str | List[int]

property eos_token_id: int | None¶: Expose the EOS token id used by the tokenizer.

class maxent_grpo.training.runtime.prompts.GenerationPenaltyConfig(gen_top_k=None, gen_best_of=None, gen_frequency_penalty=0.0, gen_presence_penalty=0.0, gen_stop_sequences=None)[source]¶

Bases: object

Shared penalty/stop sequence overrides for completion sampling.

Parameters:

gen_top_k (int | None)
gen_best_of (int | None)
gen_frequency_penalty (float)
gen_presence_penalty (float)
gen_stop_sequences (List[str] | None)

gen_top_k: int | None = None¶

gen_best_of: int | None = None¶

gen_frequency_penalty: float = 0.0¶

gen_presence_penalty: float = 0.0¶

gen_stop_sequences: List[str] | None = None¶

class maxent_grpo.training.runtime.prompts.GenerationPenaltyPassthroughMixin[source]¶

Bases: object

Expose penalty overrides via legacy gen_* accessors.

penalty: GenerationPenaltyConfig¶

property gen_top_k: int | None¶: Backward-compatible alias for the top-k sampling limit.

property gen_best_of: int | None¶: Backward-compatible alias for the best-of sampling count.

property gen_frequency_penalty: float¶: Backward-compatible alias for the frequency penalty strength.

property gen_presence_penalty: float¶: Backward-compatible alias for the presence penalty strength.

property gen_stop_sequences: List[str] | None¶: Backward-compatible alias for stop sequences.

maxent_grpo.training.runtime.prompts.append_prompt_suffix(prompt)[source]¶

Append a format reminder to all prompts.

Parameters:: prompt (str)
Return type:: str

maxent_grpo.training.runtime.prompts.append_eval_prompt_suffix(prompt)[source]¶

Append a short eval-only format reminder to the prompt.

Parameters:: prompt (str)
Return type:: str

maxent_grpo.training.runtime.prompts.sync_trunc_state(state)[source]¶

Merge external truncation state into the shared warning cache.

Parameters:: state (Dict[str, Any]) – Dictionary of state keys to merge (e.g., {"warned": True}).
Returns:: None.
Return type:: None

maxent_grpo.training.runtime.prompts.truncate_prompt(prompt, char_limit=None, *, tokenizer=None, max_tokens=None)[source]¶

Clamp prompt strings to a safe token length when possible.

Parameters:

prompt (str) – Prompt string to clamp.
char_limit (int | None) – Optional character limit fallback. When None the module-level PROMPT_CHAR_LIMIT is used.
tokenizer (Any | None) – Optional tokenizer used to enforce token limits.
max_tokens (int | None) – Optional token limit override (preferred when tokenizer is available).

Returns:

The original prompt when under the limit, otherwise a truncated prefix.

Return type:

str