maxent_grpo.training.types.logging

Logging protocols and dataclasses shared across the training stack.

Classes

LogStepArtifacts(loss_outputs, diagnostics, ...)

Helper container for optimizer/loss diagnostics per step.

LoggingConfigView(weighting, clipping, schedule)

Pointers to configs referenced while logging.

LoggingHandles(metric_writer, ...[, ...])

Callbacks for logging and checkpointing.

MetricState(*args, **kwargs)

Minimal interface required for metric accumulation.

MetricWriter(*args, **kwargs)

Protocol describing a metric writer used by the training loop.

RewardComponentStats(mean, std)

Mean/std summary for an individual reward component.

RewardLoggingView(reward_mean, reward_std, ...)

Aggregated reward/advantage statistics for logging.

TokenUsageStats(avg_completion_tokens, ...)

Aggregate completion/input token statistics.

TrainingMetricsPayload(reward_stats, ...[, ...])

Container for scalar values used by the training logger.

TrainingScalarStats(ref_logp_mean, tokens, ...)

Scalar values that vary every logging step.

_MetricStepLogger(writer, step)

Small helper that binds a metric writer to a fixed step.

_NoopMetricLogger()

Logger used when metric logging is disabled.

class maxent_grpo.training.types.logging.LoggingConfigView(weighting, clipping, schedule)

Bases: object

Pointers to configs referenced while logging.

Parameters:
weighting: WeightingSettings
clipping: ClipSettings
schedule: OptimizationSchedule

class maxent_grpo.training.types.logging.LoggingHandles(metric_writer, save_checkpoint, save_strategy, save_steps, wandb_run, checkpoint_state_ref=None)

Bases: object

Callbacks for logging and checkpointing.

Parameters:
metric_writer: MetricWriter
save_checkpoint: Callable[[str], None]
save_strategy: str
save_steps: int
wandb_run: Any | None
checkpoint_state_ref: Dict[str, Any] | None = None

log_metrics(metrics, step)

Send metrics to the configured writer.

Parameters:
  • metrics (dict[str, Any]) – Scalar payload to log.

  • step (int) – Current global training step.

Return type:

None

flush_metrics()

Flush the writer when it exposes a flush method.

Return type:

None
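The "when it exposes a flush method" behaviour can be illustrated with a `getattr` guard. This is a minimal sketch under assumptions, not the library's implementation: `flush_if_supported`, `BufferedWriter`, and `PlainWriter` are hypothetical names, and the helper returns a bool purely for illustration (the documented method returns None).

```python
from typing import Any, Dict, List


def flush_if_supported(writer: Any) -> bool:
    """Call writer.flush() if it exists; report whether a flush happened."""
    flush = getattr(writer, "flush", None)
    if callable(flush):
        flush()
        return True
    return False


class BufferedWriter:
    """Hypothetical writer with an explicit flush step."""

    def __init__(self) -> None:
        self.pending: List[Dict[str, Any]] = []
        self.flush_calls = 0

    def log(self, metrics: Dict[str, Any], step: int) -> None:
        self.pending.append({"step": step, **metrics})

    def flush(self) -> None:
        self.flush_calls += 1
        self.pending.clear()


class PlainWriter:
    """Hypothetical writer that does not define flush()."""

    def log(self, metrics: Dict[str, Any], step: int) -> None:
        pass


buffered = BufferedWriter()
buffered.log({"train/loss": 0.5}, step=1)
flushed_buffered = flush_if_supported(buffered)
flushed_plain = flush_if_supported(PlainWriter())
```

The guard lets writers without buffering skip `flush` entirely instead of implementing an empty method.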

step_logger(step, *, enabled=True)

Yield a helper that logs metrics for a specific training step.

Parameters:
  • step (int) – Current training step being logged.

  • enabled (bool) – Disable logging when False (e.g., eval only).

Yields:

A helper whose log method records metrics at the provided step.

Return type:

Iterator[_MetricStepLogger | _NoopMetricLogger]
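The "Yields" section and the Iterator return type suggest a generator-based context manager, although the page does not say so explicitly. The sketch below shows how such a step-bound helper might behave under that assumption. `StepLogger`, `NoopLogger`, `ListWriter`, and the free function `step_logger` are hypothetical stand-ins, not the library's classes.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Tuple, Union


class StepLogger:
    """Stand-in for a logger bound to one writer and one fixed step."""

    def __init__(self, writer: Any, step: int) -> None:
        self._writer = writer
        self._step = step

    def log(self, metrics: Dict[str, Any]) -> None:
        self._writer.log(metrics, self._step)


class NoopLogger:
    """Stand-in used when logging is disabled; log() does nothing."""

    def log(self, metrics: Dict[str, Any]) -> None:
        return None


class ListWriter:
    """Hypothetical writer that stores (step, metrics) pairs in a list."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Dict[str, Any]]] = []

    def log(self, metrics: Dict[str, Any], step: int) -> None:
        self.records.append((step, dict(metrics)))


@contextmanager
def step_logger(
    writer: Any, step: int, *, enabled: bool = True
) -> Iterator[Union[StepLogger, NoopLogger]]:
    """Yield a step-bound logger, or a no-op logger when disabled."""
    yield StepLogger(writer, step) if enabled else NoopLogger()


writer = ListWriter()
with step_logger(writer, step=10) as logger:
    logger.log({"train/loss": 0.25})
with step_logger(writer, step=11, enabled=False) as logger:
    logger.log({"train/loss": 0.20})  # dropped: logging is disabled
```

Callers write `logger.log(...)` the same way in both branches, so the enabled check lives in one place.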

class maxent_grpo.training.types.logging.LogStepArtifacts(loss_outputs, diagnostics, grad_norm_scalar, epoch_progress)

Bases: object

Helper container for optimizer/loss diagnostics per step.

Parameters:
loss_outputs: LossOutputs
diagnostics: BatchDiagnostics
grad_norm_scalar: float | None
epoch_progress: float

as_dict()

Return a dict view useful for debugging/log statements.

Return type:

Dict[str, Any]

class maxent_grpo.training.types.logging.MetricState(*args, **kwargs)

Bases: Protocol

Minimal interface required for metric accumulation.

global_step: int
num_input_tokens_seen: float
metric_sums: Dict[str, float]
metric_counts: Dict[str, int]
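One plausible use of `metric_sums` and `metric_counts` is running-mean accumulation between log steps. The sketch below assumes that usage; `SimpleMetricState`, `accumulate`, and `running_means` are hypothetical names that only mirror the four documented attributes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SimpleMetricState:
    """Hypothetical object exposing the four documented attributes."""

    global_step: int = 0
    num_input_tokens_seen: float = 0.0
    metric_sums: Dict[str, float] = field(default_factory=dict)
    metric_counts: Dict[str, int] = field(default_factory=dict)


def accumulate(state: SimpleMetricState, metrics: Dict[str, float]) -> None:
    """Add one batch of scalar metrics to the running sums and counts."""
    for key, value in metrics.items():
        state.metric_sums[key] = state.metric_sums.get(key, 0.0) + float(value)
        state.metric_counts[key] = state.metric_counts.get(key, 0) + 1


def running_means(state: SimpleMetricState) -> Dict[str, float]:
    """Average each metric over the number of times it was recorded."""
    return {
        key: state.metric_sums[key] / state.metric_counts[key]
        for key in state.metric_sums
    }


state = SimpleMetricState()
accumulate(state, {"loss": 1.0})
accumulate(state, {"loss": 3.0, "kl": 0.5})
means = running_means(state)
```

Because MetricState is a Protocol, any object with these attributes would satisfy it structurally; no inheritance is required.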

class maxent_grpo.training.types.logging.MetricWriter(*args, **kwargs)

Bases: Protocol

Protocol describing a metric writer used by the training loop.

log(metrics, step)

Record metrics for a training step.

Parameters:
  • metrics – Metrics to record.

  • step – Training step the metrics belong to.

Return type:

None

flush()

Flush buffered metrics to their storage backend.

Return type:

None
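Since MetricWriter is a Protocol, any class with matching `log` and `flush` methods can serve as a writer. The sketch below uses a local stand-in protocol, `MetricWriterLike`, so that it runs without the package installed. The argument types are assumptions borrowed from `LoggingHandles.log_metrics`, and `InMemoryWriter` is a hypothetical implementation.

```python
from typing import Any, Dict, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class MetricWriterLike(Protocol):
    """Local stand-in mirroring the two documented MetricWriter methods."""

    def log(self, metrics: Dict[str, Any], step: int) -> None: ...

    def flush(self) -> None: ...


class InMemoryWriter:
    """Hypothetical writer that buffers records until flush() is called."""

    def __init__(self) -> None:
        self.buffer: List[Tuple[int, Dict[str, Any]]] = []
        self.flushed: List[Tuple[int, Dict[str, Any]]] = []

    def log(self, metrics: Dict[str, Any], step: int) -> None:
        self.buffer.append((step, dict(metrics)))

    def flush(self) -> None:
        self.flushed.extend(self.buffer)
        self.buffer.clear()


writer = InMemoryWriter()
writer.log({"train/loss": 0.5}, step=1)
writer.flush()

# Structural check: InMemoryWriter never subclasses the protocol.
conforms = isinstance(writer, MetricWriterLike)
```

An in-memory writer like this is mainly useful in unit tests, where the recorded pairs can be asserted on directly.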

class maxent_grpo.training.types.logging.RewardComponentStats(mean, std)

Bases: object

Mean/std summary for an individual reward component.

Parameters:
mean: float
std: float
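As an illustration of what a mean/std summary holds, the sketch below computes one from a list of per-sample rewards. `ComponentStats` and `summarize` are hypothetical, and the choice of population standard deviation is an assumption; the page does not say which estimator the library uses.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ComponentStats:
    """Hypothetical mirror of the documented (mean, std) pair."""

    mean: float
    std: float


def summarize(values: Sequence[float]) -> ComponentStats:
    """Summarize one reward component; empty input yields zeros."""
    if not values:
        return ComponentStats(mean=0.0, std=0.0)
    return ComponentStats(
        mean=statistics.fmean(values),
        # Assumption: population std (divides by N, not N - 1).
        std=statistics.pstdev(values),
    )


stats = summarize([1.0, 2.0, 3.0, 4.0])
empty = summarize([])
```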

class maxent_grpo.training.types.logging.RewardLoggingView(reward_mean, reward_std, frac_zero_std, advantage_mean, advantage_std, advantage_count, per_reward, q_entropy_mean, q_entropy_std, q_entropy_min, q_entropy_max, semantic_entropy_mean=0.0, semantic_entropy_std=0.0, semantic_entropy_min=0.0, semantic_entropy_max=0.0, advantage_scale_mean=1.0, advantage_scale_min=1.0, advantage_scale_max=1.0, seed_alpha_effective=0.0, seed_max_possible_entropy=0.0, reward_quantiles=<factory>, per_reward_quantiles=<factory>)

Bases: object

Aggregated reward/advantage statistics for logging.

Parameters:
reward_mean: float
reward_std: float
frac_zero_std: float
advantage_mean: float
advantage_std: float
advantage_count: int
per_reward: Dict[str, RewardComponentStats]
q_entropy_mean: float
q_entropy_std: float
q_entropy_min: float
q_entropy_max: float
semantic_entropy_mean: float = 0.0
semantic_entropy_std: float = 0.0
semantic_entropy_min: float = 0.0
semantic_entropy_max: float = 0.0
advantage_scale_mean: float = 1.0
advantage_scale_min: float = 1.0
advantage_scale_max: float = 1.0
seed_alpha_effective: float = 0.0
seed_max_possible_entropy: float = 0.0
reward_quantiles: Dict[str, float]
per_reward_quantiles: Dict[str, Dict[str, float]]
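The `<factory>` defaults in the signature indicate dataclass fields built with a default factory, so each instance gets its own dict. The sketch below shows that behaviour on a reduced, hypothetical `RewardViewSketch`; using `dict` as the factory is an assumption, since the page does not show which factory is used.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RewardViewSketch:
    """Hypothetical, reduced mirror of a few RewardLoggingView fields."""

    reward_mean: float
    reward_std: float
    # Assumption: the factory is `dict`, giving a fresh dict per instance.
    reward_quantiles: Dict[str, float] = field(default_factory=dict)
    per_reward_quantiles: Dict[str, Dict[str, float]] = field(
        default_factory=dict
    )


first = RewardViewSketch(reward_mean=0.4, reward_std=0.1)
second = RewardViewSketch(reward_mean=0.6, reward_std=0.2)

# Mutating one instance's dict leaves the other untouched.
first.reward_quantiles["p50"] = 0.4
```

A plain `= {}` default is rejected by `dataclasses` for exactly this reason: a single dict would be shared across all instances.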

class maxent_grpo.training.types.logging.TokenUsageStats(avg_completion_tokens, num_completion_tokens, num_input_tokens)

Bases: object

Aggregate completion/input token statistics.

Parameters:
  • avg_completion_tokens (float)

  • num_completion_tokens (float)

  • num_input_tokens (float)

avg_completion_tokens: float
num_completion_tokens: float
num_input_tokens: float

class maxent_grpo.training.types.logging.TrainingMetricsPayload(reward_stats, weight_stats, loss_outputs, diagnostics, length_stats, config, scalars, diversity_metrics=None)

Bases: object

Container for scalar values used by the training logger.

Parameters:
reward_stats: RewardLoggingView
weight_stats: WeightLoggingView
loss_outputs: LossOutputs
diagnostics: BatchDiagnostics
length_stats: LengthStats
config: LoggingConfigView
scalars: TrainingScalarStats
diversity_metrics: Dict[str, float] | None = None

class maxent_grpo.training.types.logging.TrainingScalarStats(ref_logp_mean, tokens, current_lr, grad_norm_scalar, epoch_progress, vllm_latency_ms, policy_entropy=None, entropy_bonus_coef=None, entropy_bonus_reward_std=None)

Bases: object

Scalar values that vary every logging step.

Parameters:
ref_logp_mean: float
tokens: TokenUsageStats
current_lr: float
grad_norm_scalar: float | None
epoch_progress: float
vllm_latency_ms: float | None
policy_entropy: float | None = None
entropy_bonus_coef: float | None = None
entropy_bonus_reward_std: float | None = None

property avg_completion_tokens: float

Return the average completion token length.

Returns:

Running average of completion token counts.

Return type:

float

property num_completion_tokens: float

Return the total completion token count processed.

Returns:

Total completion token count accumulated.

Return type:

float

property num_input_tokens: float

Return the total input token count processed.

Returns:

Total input token count accumulated.

Return type:

float
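The three properties share their names with the fields of TokenUsageStats, which suggests they forward to the nested `tokens` object. The page does not confirm this, so the sketch below treats it as an assumption. `TokenUsage` and `ScalarStatsSketch` are hypothetical, reduced mirrors of the documented classes.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Hypothetical mirror of the three TokenUsageStats fields."""

    avg_completion_tokens: float
    num_completion_tokens: float
    num_input_tokens: float


@dataclass
class ScalarStatsSketch:
    """Hypothetical, reduced mirror of TrainingScalarStats."""

    ref_logp_mean: float
    tokens: TokenUsage
    current_lr: float

    # Assumption: each property simply forwards to the nested stats.
    @property
    def avg_completion_tokens(self) -> float:
        return self.tokens.avg_completion_tokens

    @property
    def num_completion_tokens(self) -> float:
        return self.tokens.num_completion_tokens

    @property
    def num_input_tokens(self) -> float:
        return self.tokens.num_input_tokens


stats = ScalarStatsSketch(
    ref_logp_mean=-1.2,
    tokens=TokenUsage(
        avg_completion_tokens=128.0,
        num_completion_tokens=4096.0,
        num_input_tokens=2048.0,
    ),
    current_lr=1e-5,
)
```

Forwarding properties of this kind let logging code read `stats.num_input_tokens` without reaching into `stats.tokens`.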