maxent_grpo.training.trainer_hooks

Trainer helper hooks used by the active TRL/HF training path.

This module intentionally contains only the helper utilities that CustomGRPOTrainer still uses. No legacy custom-loop execution code remains in the runtime path.

Functions

`_apply_weighting_overrides_from_config`(ctx)	Apply non-controller weighting toggles from active training config.
`_build_prompt_objective_entries`(prepared, ...)	Return per-prompt summaries of reward, KL, and entropy.
`_cache_meta_stats`(weighting_cfg, ...)	Cache scalar summaries used by meta-controller objectives.
`_entropy_from_probs`(probs)	Return natural-log entropy for a probability vector.
`_log_prompt_objective`(ctx, prepared, step)	Emit per-prompt objective breakdown when requested.
`_maybe_overwrite_controller_state_from_config`(ctx)	Optionally force controller scalars to match recipe values.
`_per_sequence_kl_values`(scores, ref_stats, ...)	Return per-sequence KL estimates for prompt-level diagnostics.
`_prompt_objective_logging_enabled`(ctx)	Return `True` when per-prompt objective logging is explicitly enabled.
`_prompt_preview`(text)	Return a compact prompt preview for logs.
`_to_cpu_tensor`(value)	Best-effort conversion of tensor-like inputs to 1D CPU float tensors.
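
The summaries above describe behavior but not implementation. As a rough illustration only, the sketch below shows what two of the simpler helpers might compute: a natural-log entropy over a probability vector and a compact one-line prompt preview. The names `entropy_from_probs` and `prompt_preview`, the `eps` and `max_chars` parameters, the renormalization step, and the `...` truncation marker are all assumptions made for this example. They are not taken from the module's source, and the real helpers may operate on tensors rather than plain Python floats.

```python
import math
from typing import Iterable


def entropy_from_probs(probs: Iterable[float], eps: float = 1e-12) -> float:
    """Shannon entropy in nats (natural log) of a probability vector.

    Assumption: negative entries are clamped to zero and the vector is
    renormalized, so slightly unnormalized inputs still give a valid result.
    """
    values = [max(float(p), 0.0) for p in probs]
    total = sum(values)
    if total <= 0.0:
        # Empty or all-zero input carries no distribution; report zero entropy.
        return 0.0
    entropy = 0.0
    for v in values:
        p = v / total
        if p > eps:
            # Terms with p == 0 contribute nothing (limit of p * log p is 0).
            entropy -= p * math.log(p)
    return entropy


def prompt_preview(text: str, max_chars: int = 80) -> str:
    """Collapse whitespace and truncate so a prompt fits on one log line.

    Assumption: max_chars is a hypothetical limit chosen for this sketch.
    """
    compact = " ".join(str(text).split())
    if len(compact) <= max_chars:
        return compact
    return compact[: max_chars - 3] + "..."
```

Under these assumptions, a uniform two-outcome vector yields `math.log(2)` nats, a one-hot vector yields zero, and a long multi-line prompt is reduced to a single line of at most `max_chars` characters.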