maxent_grpo.training.trainer_hooks
Trainer helper hooks used by the active TRL/HF training path.
This module intentionally contains only the helper utilities still used by
CustomGRPOTrainer. Legacy custom-loop execution code is not part of the
runtime path.
Functions
|
Apply non-controller weighting toggles from active training config. |
|
Return per-prompt summaries of reward, KL, and entropy. |
|
Cache scalar summaries used by meta-controller objectives. |
|
Return natural-log entropy for a probability vector. |
|
Emit per-prompt objective breakdown when requested. |
|
Optionally force controller scalars to match recipe values. |
|
Return per-sequence KL estimates for prompt-level diagnostics. |
|
Return |
|
Return a compact prompt preview for logs. |
|
Best-effort conversion of tensor-like inputs to 1D CPU float tensors. |
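As an illustration of two of the summaries above (natural-log entropy of a probability vector, and a compact prompt preview for logs), here is a minimal standard-library sketch. The names `entropy_nats` and `prompt_preview`, the `max_chars` parameter, and the normalization and truncation behavior are all assumptions made for this example; they are not taken from the module, whose actual signatures may differ.

```python
import math
from typing import Sequence


def entropy_nats(probs: Sequence[float]) -> float:
    """Hypothetical helper: Shannon entropy in nats of a probability vector."""
    total = float(sum(probs))
    if total <= 0.0:
        # Empty or all-zero input carries no information.
        return 0.0
    # Assumption: normalize defensively so unnormalized weights also work.
    normalized = [p / total for p in probs]
    # Zero-probability entries are skipped (p * log p tends to 0 as p -> 0).
    return -sum(p * math.log(p) for p in normalized if p > 0.0)


def prompt_preview(prompt: str, max_chars: int = 80) -> str:
    """Hypothetical helper: one-line, length-limited preview of a prompt."""
    # Collapse newlines and repeated spaces into single spaces.
    flat = " ".join(prompt.split())
    if len(flat) <= max_chars:
        return flat
    # Reserve three characters for the trailing ellipsis.
    return flat[: max_chars - 3] + "..."
```

A uniform distribution over two outcomes gives `entropy_nats([0.5, 0.5])` equal to `math.log(2)`, and a long prompt passed to `prompt_preview` comes back as a single line of at most `max_chars` characters.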