maxent_grpo.training.telemetry.trl_logging

Lightweight logging helpers to mirror MaxEnt metrics inside TRL trainers.

These utilities attach a small mixin to the GRPOTrainer so per-step logs also include the tau/beta and controller diagnostics used by the custom MaxEnt loop. The helpers are dependency-light and tolerate missing transformers/TRL pieces so unit tests can exercise them with SimpleNamespace stubs.

Functions

`_augment_loss_metrics`(metrics)	Mirror base loss/KL logs under train-prefixed keys for consistency.
`_canonicalize_rollout_metric_keys`(metrics)	Add canonical metric aliases so GRPO/MaxEnt share one key schema.
`_fix_clipped_ratio`(metrics, args)	Clamp and normalize TRL's negative clipped_ratio counts into a [0, 1] ratio.
`_fix_clipped_ratio_metrics`(trainer)	Sanitize in-memory _metrics before GRPOTrainer aggregates them.
`_merge_loss_components_from_trainer`(metrics, ...)	Inject loss sub-components captured from compute_loss into metrics.
`_normalize_prefixes`(metrics[, is_eval])	Return a copy of metrics with bare keys moved under train/ or eval/.
`_numeric_or_none`(value)	Return a finite float or `None` when conversion fails.
`_with_prefix`(prefix, key)	Helper to attach a prefix if not already present.
`ensure_weighting_logging`(trainer_cls)	Wrap a Trainer subclass to include weighting metric logging once.
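
The one-line summaries for `_numeric_or_none`, `_with_prefix`, and `_normalize_prefixes` suggest behaviour along the following lines. This is a minimal sketch written only from those summaries, not the module's actual code: the function names (without the leading underscore), the handling of keys that already carry a `train/` or `eval/` prefix, and the treatment of overflow are all assumptions.

```python
import math
from typing import Any, Dict, Optional


def numeric_or_none(value: Any) -> Optional[float]:
    """Return a finite float, or None when conversion fails or yields NaN/inf."""
    try:
        number = float(value)
    except (TypeError, ValueError, OverflowError):
        return None
    return number if math.isfinite(number) else None


def with_prefix(prefix: str, key: str) -> str:
    """Attach `prefix/` to `key` unless the key already starts with it."""
    marker = prefix.rstrip("/") + "/"
    return key if key.startswith(marker) else marker + key


def normalize_prefixes(metrics: Dict[str, Any], is_eval: bool = False) -> Dict[str, Any]:
    """Return a copy of `metrics` with bare keys moved under train/ or eval/."""
    prefix = "eval" if is_eval else "train"
    normalized: Dict[str, Any] = {}
    for key, value in metrics.items():
        # Assumption: keys that already carry either prefix are kept as-is.
        if key.startswith(("train/", "eval/")):
            normalized[key] = value
        else:
            normalized[with_prefix(prefix, key)] = value
    return normalized
```

Returning a new dictionary rather than mutating the input matches the "return a copy" wording in the summary and keeps the caller's log payload untouched.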

Classes

`_WeightingLogCallback`()	Normalize/log metrics even if a trainer bypasses the log override.
`_WeightingLoggingMixin`()	Mixin that injects weighting metrics into Trainer.log.
`_WeightingMetricHelper`(args)	Helper that derives tau/beta metrics from a trainer + its args.

maxent_grpo.training.telemetry.trl_logging.ensure_weighting_logging(trainer_cls)

Wrap a Trainer subclass to include weighting metric logging once.

Parameters: trainer_cls (type) – Trainer class (or callable) to wrap.
Returns: Wrapped trainer class emitting normalized weighting metrics.
Return type: type
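
The "wrap once" contract can be illustrated with a small, dependency-free sketch. It is not the library's implementation. The mixin name, the `_weighting_logging_wrapped` marker attribute, the `train/weighting/...` metric keys, and the stub trainer are all hypothetical stand-ins, and the stub mirrors the SimpleNamespace-style testing mentioned in the module description.

```python
from types import SimpleNamespace


class WeightingLoggingSketch:
    """Hypothetical mixin: add tau/beta from self.args to each log payload."""

    def log(self, logs, *args, **kwargs):
        merged = dict(logs)  # copy so the caller's dict is not mutated
        for name in ("tau", "beta"):
            value = getattr(getattr(self, "args", None), name, None)
            if isinstance(value, (int, float)):
                # Assumed key layout; the real module may use different names.
                merged.setdefault(f"train/weighting/{name}", float(value))
        return super().log(merged, *args, **kwargs)


def ensure_logging_sketch(trainer_cls):
    """Wrap trainer_cls once; an already wrapped class is returned unchanged."""
    if getattr(trainer_cls, "_weighting_logging_wrapped", False):
        return trainer_cls
    return type(
        trainer_cls.__name__,
        (WeightingLoggingSketch, trainer_cls),
        {"_weighting_logging_wrapped": True},
    )


class StubTrainer:
    """Stand-in for GRPOTrainer that only records what it is asked to log."""

    def __init__(self, args):
        self.args = args
        self.logged = []

    def log(self, logs, *args, **kwargs):
        self.logged.append(logs)


Wrapped = ensure_logging_sketch(StubTrainer)
trainer = Wrapped(SimpleNamespace(tau=0.5, beta=0.04))
trainer.log({"loss": 1.0})
```

Because the marker attribute is inherited by the wrapped class, calling `ensure_logging_sketch(Wrapped)` again returns the same class instead of stacking a second mixin layer.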