maxent_grpo.training.types.logging
Logging protocols and dataclasses shared across the training stack.
Classes
| LogStepArtifacts | Helper container for optimizer/loss diagnostics per step. |
| LoggingConfigView | Pointers to configs referenced while logging. |
| LoggingHandles | Callbacks for logging and checkpointing. |
| MetricState | Minimal interface required for metric accumulation. |
| MetricWriter | Protocol describing a metric writer used by the training loop. |
| RewardComponentStats | Mean/std summary for an individual reward component. |
| RewardLoggingView | Aggregated reward/advantage statistics for logging. |
| TokenUsageStats | Aggregate completion/input token statistics. |
| TrainingMetricsPayload | Container for scalar values used by the training logger. |
| TrainingScalarStats | Scalar values that vary every logging step. |
| | Small helper that binds a metric writer to a fixed step. |
| | Logger used when metric logging is disabled. |
- class maxent_grpo.training.types.logging.LoggingConfigView(weighting, clipping, schedule)
Bases: object
Pointers to configs referenced while logging.
- Parameters:
weighting (WeightingSettings)
clipping (ClipSettings)
schedule (OptimizationSchedule)
- weighting: WeightingSettings
- clipping: ClipSettings
- schedule: OptimizationSchedule
- class maxent_grpo.training.types.logging.LoggingHandles(metric_writer, save_checkpoint, save_strategy, save_steps, wandb_run, checkpoint_state_ref=None)
Bases: object
Callbacks for logging and checkpointing.
- Parameters:
- metric_writer: MetricWriter
- class maxent_grpo.training.types.logging.LogStepArtifacts(loss_outputs, diagnostics, grad_norm_scalar, epoch_progress)
Bases: object
Helper container for optimizer/loss diagnostics per step.
- Parameters:
loss_outputs (LossOutputs)
diagnostics (BatchDiagnostics)
grad_norm_scalar (float | None)
epoch_progress (float)
- loss_outputs: LossOutputs
- diagnostics: BatchDiagnostics
- class maxent_grpo.training.types.logging.MetricState(*args, **kwargs)
Bases: Protocol
Minimal interface required for metric accumulation.
- class maxent_grpo.training.types.logging.MetricWriter(*args, **kwargs)
Bases: Protocol
Protocol describing a metric writer used by the training loop.
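Because MetricWriter is a Protocol, an object can satisfy it structurally, without inheriting from it. The sketch below illustrates only that pattern. This page does not list the protocol's members, so the `log` method and its signature are assumptions, and `SketchMetricWriter`, `InMemoryWriter`, and `emit` are hypothetical names that are not part of maxent_grpo.

```python
from typing import Dict, List, Protocol, Tuple, runtime_checkable


@runtime_checkable
class SketchMetricWriter(Protocol):
    """Hypothetical stand-in for MetricWriter; the real members may differ."""

    def log(self, metrics: Dict[str, float], step: int) -> None:
        ...


class InMemoryWriter:
    """Hypothetical writer that matches the protocol without subclassing it."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Dict[str, float]]] = []

    def log(self, metrics: Dict[str, float], step: int) -> None:
        # Copy the dict so later mutation by the caller does not alter the record.
        self.records.append((step, dict(metrics)))


def emit(writer: SketchMetricWriter, step: int) -> None:
    # The training loop only needs the protocol, not a concrete class.
    writer.log({"train/loss": 0.5}, step)


writer = InMemoryWriter()
emit(writer, step=10)
```

Typing the training loop against a protocol like this lets a file-backed, in-memory, or disabled writer be swapped in without changing the caller.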
- class maxent_grpo.training.types.logging.RewardComponentStats(mean, std)
Bases: object
Mean/std summary for an individual reward component.
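As a rough illustration of how a per-component summary like this might be built, the sketch below uses a local stand-in dataclass with the same two fields, `mean` and `std`. The field types, the use of population standard deviation, the `summarize` helper, and the reward names `accuracy` and `format` are all assumptions made for the example, not details taken from maxent_grpo.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Dict, List


@dataclass
class RewardComponentStatsSketch:
    """Hypothetical stand-in mirroring RewardComponentStats(mean, std)."""

    mean: float
    std: float


def summarize(values: List[float]) -> RewardComponentStatsSketch:
    # Population std is an assumption; the real code may use the sample std.
    return RewardComponentStatsSketch(mean=fmean(values), std=pstdev(values))


# One summary per named reward component, keyed the way a
# Dict[str, RewardComponentStats] mapping would be.
per_reward: Dict[str, RewardComponentStatsSketch] = {
    "accuracy": summarize([1.0, 0.0, 1.0, 0.0]),
    "format": summarize([1.0, 1.0, 1.0, 1.0]),
}
```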
- class maxent_grpo.training.types.logging.RewardLoggingView(reward_mean, reward_std, frac_zero_std, advantage_mean, advantage_std, advantage_count, per_reward, q_entropy_mean, q_entropy_std, q_entropy_min, q_entropy_max, semantic_entropy_mean=0.0, semantic_entropy_std=0.0, semantic_entropy_min=0.0, semantic_entropy_max=0.0, advantage_scale_mean=1.0, advantage_scale_min=1.0, advantage_scale_max=1.0, seed_alpha_effective=0.0, seed_max_possible_entropy=0.0, reward_quantiles=<factory>, per_reward_quantiles=<factory>)
Bases: object
Aggregated reward/advantage statistics for logging.
- Parameters:
reward_mean (float)
reward_std (float)
frac_zero_std (float)
advantage_mean (float)
advantage_std (float)
advantage_count (int)
per_reward (Dict[str, RewardComponentStats])
q_entropy_mean (float)
q_entropy_std (float)
q_entropy_min (float)
q_entropy_max (float)
semantic_entropy_mean (float)
semantic_entropy_std (float)
semantic_entropy_min (float)
semantic_entropy_max (float)
advantage_scale_mean (float)
advantage_scale_min (float)
advantage_scale_max (float)
seed_alpha_effective (float)
seed_max_possible_entropy (float)
- per_reward: Dict[str, RewardComponentStats]
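The `<factory>` defaults in the signature above are how Sphinx renders dataclass fields declared with `default_factory`, which gives each instance its own fresh container. The sketch below shows that behaviour on a much smaller stand-in. `RewardViewSketch` is hypothetical, only a few of the real fields are reproduced, and the `Dict[str, float]` type for `reward_quantiles` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RewardViewSketch:
    """Hypothetical, reduced stand-in for RewardLoggingView."""

    reward_mean: float
    reward_std: float
    semantic_entropy_mean: float = 0.0
    advantage_scale_mean: float = 1.0
    # Assumed type; default_factory creates a new dict for every instance.
    reward_quantiles: Dict[str, float] = field(default_factory=dict)


a = RewardViewSketch(reward_mean=0.4, reward_std=0.1)
b = RewardViewSketch(reward_mean=0.6, reward_std=0.2)

# Writing to one instance's dict leaves the other instance untouched.
a.reward_quantiles["p50"] = 0.4
```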
- class maxent_grpo.training.types.logging.TokenUsageStats(avg_completion_tokens, num_completion_tokens, num_input_tokens)
Bases: object
Aggregate completion/input token statistics.
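To make the three fields concrete, here is a minimal sketch that fills a stand-in container from per-sequence token counts. The field types, the `token_usage` helper, and the choice to average over a single batch are assumptions for illustration; the real values may be accumulated differently across steps.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TokenUsageStatsSketch:
    """Hypothetical stand-in mirroring TokenUsageStats' three fields."""

    avg_completion_tokens: float
    num_completion_tokens: float
    num_input_tokens: float


def token_usage(
    completion_lengths: Sequence[int], input_lengths: Sequence[int]
) -> TokenUsageStatsSketch:
    total_completion = float(sum(completion_lengths))
    # Guard against an empty batch to avoid dividing by zero.
    avg = total_completion / len(completion_lengths) if completion_lengths else 0.0
    return TokenUsageStatsSketch(
        avg_completion_tokens=avg,
        num_completion_tokens=total_completion,
        num_input_tokens=float(sum(input_lengths)),
    )


stats = token_usage([12, 20, 16], [30, 30, 30])
```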
- class maxent_grpo.training.types.logging.TrainingMetricsPayload(reward_stats, weight_stats, loss_outputs, diagnostics, length_stats, config, scalars, diversity_metrics=None)
Bases: object
Container for scalar values used by the training logger.
- Parameters:
reward_stats (RewardLoggingView)
weight_stats (WeightLoggingView)
loss_outputs (LossOutputs)
diagnostics (BatchDiagnostics)
length_stats (LengthStats)
config (LoggingConfigView)
scalars (TrainingScalarStats)
- reward_stats: RewardLoggingView
- weight_stats: WeightLoggingView
- loss_outputs: LossOutputs
- diagnostics: BatchDiagnostics
- length_stats: LengthStats
- config: LoggingConfigView
- scalars: TrainingScalarStats
- class maxent_grpo.training.types.logging.TrainingScalarStats(ref_logp_mean, tokens, current_lr, grad_norm_scalar, epoch_progress, vllm_latency_ms, policy_entropy=None, entropy_bonus_coef=None, entropy_bonus_reward_std=None)
Bases: object
Scalar values that vary every logging step.
- Parameters:
- tokens: TokenUsageStats
- property avg_completion_tokens: float
Return the average completion token length.
- Returns:
Running average of completion token counts.
- Return type: