maxent_grpo.training.trainer_hooks

Trainer helper hooks used by the active TRL/HF training path.

This module intentionally contains only the helper utilities still used by CustomGRPOTrainer. Legacy custom-loop execution code is not part of the runtime path.

Functions

_apply_weighting_overrides_from_config(ctx)

Apply non-controller weighting toggles from active training config.

_build_prompt_objective_entries(prepared, ...)

Return per-prompt summaries of reward, KL, and entropy.

_cache_meta_stats(weighting_cfg, ...)

Cache scalar summaries used by meta-controller objectives.

_entropy_from_probs(probs)

Return natural-log entropy for a probability vector.
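As an illustration of the quantity described above, here is a minimal pure-Python sketch of natural-log (Shannon) entropy, H = -sum(p * ln p). The name `entropy_from_probs_sketch` and the list-of-floats input are assumptions made for this example. The module's actual helper may accept tensors and handle edge cases differently.

```python
import math
from typing import Sequence


def entropy_from_probs_sketch(probs: Sequence[float]) -> float:
    """Return -sum(p * ln p) in nats for a probability vector.

    Hypothetical stand-in, not the module's implementation.
    Zero entries are skipped, following the convention 0 * ln 0 = 0.
    """
    entropy = 0.0
    for p in probs:
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy


# A uniform distribution over two outcomes has entropy ln 2.
uniform_entropy = entropy_from_probs_sketch([0.5, 0.5])
```

The result is in nats because the natural logarithm is used. Dividing by `math.log(2)` would convert it to bits.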

_log_prompt_objective(ctx, prepared, step)

Emit per-prompt objective breakdown when requested.

_maybe_overwrite_controller_state_from_config(ctx)

Optionally force controller scalars to match recipe values.

_per_sequence_kl_values(scores, ref_stats, ...)

Return per-sequence KL estimates for prompt-level diagnostics.
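One common way to form a per-sequence KL estimate is to average the token-level log-probability gap between the policy and the reference model. The sketch below assumes that approach. The function name, the nested-list inputs, and the choice of estimator are all hypothetical, and the real helper's `scores` and `ref_stats` arguments may be structured quite differently.

```python
from typing import List, Sequence


def per_sequence_kl_sketch(
    policy_logps: Sequence[Sequence[float]],
    ref_logps: Sequence[Sequence[float]],
) -> List[float]:
    """Mean of (policy log-prob - reference log-prob) per sequence.

    Hypothetical illustration only: the estimator and the input layout
    are assumptions, not the module's documented behavior.
    """
    if len(policy_logps) != len(ref_logps):
        raise ValueError("policy and reference batches differ in size")
    values: List[float] = []
    for pol, ref in zip(policy_logps, ref_logps):
        if len(pol) != len(ref):
            raise ValueError("token counts differ within a sequence")
        if not pol:
            # Empty sequences contribute a neutral value.
            values.append(0.0)
            continue
        gap = sum(p - r for p, r in zip(pol, ref))
        values.append(gap / len(pol))
    return values


# Two sequences: the first drifts from the reference, the second matches it.
kl_values = per_sequence_kl_sketch(
    [[-1.0, -2.0], [-0.5]],
    [[-1.5, -2.5], [-0.5]],
)
```

Averaging over tokens keeps sequences of different lengths comparable, which is convenient for prompt-level diagnostics. A summed variant would weight long sequences more heavily.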

_prompt_objective_logging_enabled(ctx)

Return True when per-prompt objective logging is explicitly enabled.

_prompt_preview(text)

Return a compact prompt preview for logs.
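A compact preview for logs typically collapses whitespace and truncates long text. The sketch below shows one way this could look. The name `prompt_preview_sketch`, the `max_chars` parameter, its default of 80, and the `...` suffix are assumptions for illustration, not the module's documented behavior.

```python
def prompt_preview_sketch(text: str, max_chars: int = 80) -> str:
    """Collapse whitespace and truncate to at most max_chars characters.

    Hypothetical stand-in: the truncation length and ellipsis style
    are assumed, not taken from the module.
    """
    # str.split() with no arguments splits on any whitespace run,
    # so newlines and repeated spaces become single spaces.
    compact = " ".join(text.split())
    if len(compact) <= max_chars:
        return compact
    return compact[: max_chars - 3].rstrip() + "..."


preview = prompt_preview_sketch("  Solve\n\n 2 + 2  ")
```

Keeping previews on a single line avoids breaking line-oriented log parsers when prompts contain newlines.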

_to_cpu_tensor(value)

Best-effort conversion of tensor-like inputs to 1D CPU float tensors.