maxent_grpo.training.telemetry.trl_logging

Lightweight logging helpers to mirror MaxEnt metrics inside TRL trainers.

These utilities attach a small mixin to the GRPOTrainer so that per-step logs also include the tau/beta values and controller diagnostics used by the custom MaxEnt loop. The helpers are dependency-light and tolerate missing transformers/TRL components, so unit tests can exercise them with SimpleNamespace stubs.

Functions

_augment_loss_metrics(metrics)

Mirror base loss/KL logs under train-prefixed keys for consistency.

_canonicalize_rollout_metric_keys(metrics)

Add canonical metric aliases so GRPO/MaxEnt share one key schema.

_fix_clipped_ratio(metrics, args)

Clamp and normalize TRL's negative clipped_ratio counts into a [0, 1] ratio.

_fix_clipped_ratio_metrics(trainer)

Sanitize in-memory _metrics before GRPOTrainer aggregates them.

_merge_loss_components_from_trainer(metrics, ...)

Inject loss sub-components captured from compute_loss into metrics.

_normalize_prefixes(metrics[, is_eval])

Return a copy of metrics with bare keys moved under train/ or eval/.

_numeric_or_none(value)

Return a finite float or None when conversion fails.

_with_prefix(prefix, key)

Helper to attach a prefix if not already present.

ensure_weighting_logging(trainer_cls)

Wrap a Trainer subclass to include weighting metric logging once.
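
The smaller helpers summarized above can be illustrated with a short sketch. This is inferred only from the one-line summaries, not taken from the module's source: the function bodies, the choice to leave keys already under train/ or eval/ untouched, and the treatment of NaN/inf as "not finite" are all assumptions. The names drop the leading underscore to mark them as stand-ins.

```python
import math
from typing import Any, Dict, Optional


def numeric_or_none(value: Any) -> Optional[float]:
    """Return a finite float, or None when conversion fails (assumed behavior)."""
    try:
        result = float(value)
    except (TypeError, ValueError, OverflowError):
        return None
    # Assumption: NaN and +/-inf count as failures as well.
    return result if math.isfinite(result) else None


def with_prefix(prefix: str, key: str) -> str:
    """Attach prefix to key unless the key already starts with it."""
    return key if key.startswith(prefix) else f"{prefix}{key}"


def normalize_prefixes(metrics: Dict[str, Any], is_eval: bool = False) -> Dict[str, Any]:
    """Return a copy of metrics with bare keys moved under train/ or eval/."""
    prefix = "eval/" if is_eval else "train/"
    normalized: Dict[str, Any] = {}
    for key, value in metrics.items():
        # Assumption: keys already in either namespace are kept unchanged.
        if key.startswith(("train/", "eval/")):
            normalized[key] = value
        else:
            normalized[with_prefix(prefix, key)] = value
    return normalized
```

Returning a new dictionary rather than mutating the input matches the "return a copy" wording in the summary, and it keeps the caller's metrics safe to reuse.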

Classes

_WeightingLogCallback()

Normalize/log metrics even if a trainer bypasses the log override.

_WeightingLoggingMixin()

Mixin that injects weighting metrics into Trainer.log.

_WeightingMetricHelper(args)

Helper that derives tau/beta metrics from a trainer + its args.

maxent_grpo.training.telemetry.trl_logging.ensure_weighting_logging(trainer_cls)

Wrap a Trainer subclass to include weighting metric logging once.

Parameters:

trainer_cls (type) – Trainer class (or callable) to wrap.

Returns:

Wrapped trainer class emitting normalized weighting metrics.

Return type:

type
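
A minimal sketch of the "wrap once" pattern described here, exercised with a SimpleNamespace stub as the module overview suggests. It is not the module's implementation: the mixin body, the metric key names (`train/weighting/tau`, `train/weighting/beta`), the assumption that tau/beta live on `trainer.args`, and the `...Sketch`/`_sketch` names are hypothetical. Only the class case is handled; the "or callable" case is left out.

```python
from types import SimpleNamespace
from typing import Any, Dict


class WeightingLoggingMixinSketch:
    """Hypothetical stand-in for the module's logging mixin."""

    def log(self, logs: Dict[str, Any], *args: Any, **kwargs: Any) -> Any:
        merged = dict(logs)
        # Assumption: tau/beta are attributes of trainer.args; key names are invented.
        args_obj = getattr(self, "args", None)
        for name in ("tau", "beta"):
            value = getattr(args_obj, name, None)
            if value is not None:
                merged.setdefault(f"train/weighting/{name}", float(value))
        return super().log(merged, *args, **kwargs)


def ensure_weighting_logging_sketch(trainer_cls: type) -> type:
    """Return trainer_cls with the mixin applied, wrapping at most once."""
    if issubclass(trainer_cls, WeightingLoggingMixinSketch):
        return trainer_cls  # already wrapped, so return it unchanged
    return type(
        f"{trainer_cls.__name__}WithWeightingLogs",
        (WeightingLoggingMixinSketch, trainer_cls),
        {},
    )


class StubTrainer:
    """Tiny stand-in for a Trainer; records whatever is logged."""

    def __init__(self) -> None:
        self.args = SimpleNamespace(tau=0.5, beta=0.04)
        self.logged = []

    def log(self, logs: Dict[str, Any], *args: Any, **kwargs: Any) -> None:
        self.logged.append(logs)


Wrapped = ensure_weighting_logging_sketch(StubTrainer)
trainer = Wrapped()
trainer.log({"loss": 1.0})
```

Placing the mixin first in the bases lets its `log` run before the trainer's own `log` through `super()`, and the `issubclass` check makes a second call to the wrapper a no-op.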