maxent_grpo.training.telemetry.trl_logging
Lightweight logging helpers to mirror MaxEnt metrics inside TRL trainers.
These utilities attach a small mixin to the GRPOTrainer so that per-step logs also include the tau/beta values and controller diagnostics used by the custom MaxEnt loop. The helpers are dependency-light and tolerate missing transformers/TRL components, so unit tests can exercise them with SimpleNamespace stubs.
Functions
|
Mirror base loss/KL logs under train-prefixed keys for consistency. |
|
Add canonical metric aliases so GRPO/MaxEnt share one key schema. |
|
Clamp and normalize TRL's negative clipped_ratio counts into a [0, 1] ratio. |
|
Sanitize in-memory _metrics before GRPOTrainer aggregates them. |
|
Inject loss sub-components captured from compute_loss into metrics. |
|
Return a copy of metrics with bare keys moved under train/ or eval/. |
|
Return a finite float or |
|
Helper to attach a prefix if not already present. |
|
Wrap a Trainer subclass to include weighting metric logging once. |
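To make the prefixing and sanitizing helpers more concrete, here is a minimal sketch of how they could behave. The names `to_finite_float`, `with_prefix`, and `prefix_metrics` are hypothetical and are not taken from the module. The fallback value of the finite-float helper is truncated in the summary above, so `None` is an assumption.

```python
import math
from typing import Any, Dict, Optional


def to_finite_float(value: Any) -> Optional[float]:
    """Return value as a finite float; None (assumed fallback) otherwise."""
    try:
        number = float(value)
    except (TypeError, ValueError, OverflowError):
        return None
    return number if math.isfinite(number) else None


def with_prefix(key: str, prefix: str) -> str:
    """Attach prefix to key unless the key already starts with it."""
    return key if key.startswith(prefix) else prefix + key


def prefix_metrics(metrics: Dict[str, Any], mode: str = "train") -> Dict[str, Any]:
    """Return a copy of metrics with bare keys moved under train/ or eval/."""
    prefix = f"{mode}/"
    out: Dict[str, Any] = {}
    for key, value in metrics.items():
        # Keys that already carry a train/ or eval/ prefix are kept unchanged.
        if key.startswith(("train/", "eval/")):
            out[key] = value
        else:
            out[with_prefix(key, prefix)] = value
    return out
```

Because `prefix_metrics` builds a new dictionary, the caller's original metrics are left untouched, which matches the "return a copy" wording in the summary.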
Classes
|
Normalize/log metrics even if a trainer bypasses the log override. |
|
Mixin that injects weighting metrics into Trainer.log. |
|
Helper that derives tau/beta metrics from a trainer + its args. |
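The mixin-plus-wrapper pattern described above can be sketched without transformers or TRL installed. Everything below is illustrative: `StubTrainer`, `WeightingLogMixin`, `wrap_trainer_once`, and the `train/tau` and `train/beta` keys are assumed names, not the module's actual API.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


class StubTrainer:
    """Stand-in for a Trainer; it only records what reaches log()."""

    def __init__(self, args: Any) -> None:
        self.args = args
        self.logged: List[Dict[str, float]] = []

    def log(self, logs: Dict[str, float], *args: Any, **kwargs: Any) -> None:
        self.logged.append(dict(logs))


class WeightingLogMixin:
    """Hypothetical mixin: merge tau/beta from self.args into each log call."""

    def log(self, logs: Dict[str, float], *args: Any, **kwargs: Any) -> None:
        merged = dict(logs)
        for name in ("tau", "beta"):
            value = getattr(self.args, name, None)
            if value is not None:
                # setdefault keeps any value the trainer already logged.
                merged.setdefault(f"train/{name}", float(value))
        return super().log(merged, *args, **kwargs)


def wrap_trainer_once(trainer_cls: type) -> type:
    """Return a subclass with the mixin applied, without wrapping twice."""
    if issubclass(trainer_cls, WeightingLogMixin):
        return trainer_cls
    name = f"Logged{trainer_cls.__name__}"
    return type(name, (WeightingLogMixin, trainer_cls), {})


Wrapped = wrap_trainer_once(StubTrainer)
trainer = Wrapped(SimpleNamespace(tau=0.5, beta=0.04))
trainer.log({"loss": 1.25})
```

Placing the mixin first in the bases means its `log` runs before the trainer's own, and the `issubclass` guard makes repeated wrapping a no-op.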