maxent_grpo.training.weighting.types
Weighting-related dataclasses shared across the MaxEnt training loop.
Classes
| Class | Description |
| --- | --- |
| ControllerMetaSettings | Meta-controller knobs governing tau/beta adaptation. |
| ControllerStateSnapshot | Serializable controller state describing tau/beta parameters. |
| KlControllerSettings | Controller settings for KL regularization. |
| QDistributionSettings | Softmax temperature and smoothing for weighting. |
| TauSchedule | Hyperparameters controlling tau adaptation. |
| TorchControllerState | Torch-backed parameters for tau/beta with sync helpers. |
| WeightLoggingView | Aggregated entropy statistics for logging. |
| WeightNormalizationSettings | Length-normalization flag and denominator scaling. |
| WeightStats | Weights per completion and entropy diagnostics. |
| WeightingConfigLike | Protocol for objects that carry controller weighting scalars. |
| WeightingSettings | Sequence weighting hyperparameters with convenience accessors. |
- class maxent_grpo.training.weighting.types.TorchControllerState(torch_mod, tau_init, beta_init, *, requires_grad=False)
Bases: object
Torch-backed parameters for tau/beta with sync helpers.
- class maxent_grpo.training.weighting.types.WeightingConfigLike(*args, **kwargs)
Bases: Protocol
Protocol for objects that carry controller weighting scalars.
beta and tau are required. Optional attributes such as denom or train_grpo_objective may be present and are accessed via getattr.
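The access pattern described above can be sketched with a small structural type. This is a minimal illustration, not the library's code: `WeightingConfigLikeSketch`, `describe`, and the fallback defaults are hypothetical names and values chosen for the example.

```python
from types import SimpleNamespace
from typing import Protocol, runtime_checkable


@runtime_checkable
class WeightingConfigLikeSketch(Protocol):
    """Hypothetical stand-in: only beta and tau are required."""

    beta: float
    tau: float


def describe(cfg: WeightingConfigLikeSketch) -> dict:
    """Read required scalars directly and optional ones via getattr."""
    return {
        "beta": cfg.beta,
        "tau": cfg.tau,
        # Optional attributes: the defaults here are illustrative assumptions.
        "denom": getattr(cfg, "denom", 1.0),
        "train_grpo_objective": getattr(cfg, "train_grpo_objective", False),
    }


# Any object exposing beta and tau satisfies the protocol structurally.
minimal = SimpleNamespace(beta=0.04, tau=0.5)
full = SimpleNamespace(beta=0.04, tau=0.5, denom=2.0, train_grpo_objective=True)

print(describe(minimal))
print(describe(full))
```

Because the protocol is structural, callers never need to subclass anything; a plain namespace or a full settings object both work as long as `beta` and `tau` exist.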
- class maxent_grpo.training.weighting.types.ControllerMetaSettings(enabled=False, method='analytic', learning_rate=0.0, tau_learning_rate=0.0, beta_learning_rate=0.0, beta_grad_clip=0.0, update_interval=1, objective='potential', analytic_steps=1, optimizer='sgd', truncation_steps=1, use_hessian=False, last_tau_grad=0.0, last_beta_grad=0.0)
Bases: object
Meta-controller knobs governing tau/beta adaptation.
- class maxent_grpo.training.weighting.types.ControllerStateSnapshot(beta, tau, tau_log, tau_entropy_ema, meta=<factory>)
Bases: object
Serializable controller state describing tau/beta parameters.
- classmethod from_weighting(weighting_cfg)
Build a controller snapshot from the active weighting settings.
- Parameters:
weighting_cfg (WeightingConfigLike)
- Return type:
ControllerStateSnapshot
- classmethod from_dict(payload)
Instantiate a snapshot from a serialized payload.
- Parameters:
payload
- Return type:
ControllerStateSnapshot
- apply_to_weighting(weighting_cfg)
Apply the snapshot contents to a weighting configuration.
- Parameters:
weighting_cfg (WeightingConfigLike)
- Return type:
None
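A snapshot of this kind is typically used to save and restore controller state across checkpoints. The round trip can be sketched as follows. `SnapshotSketch` is a hypothetical stand-in that reuses the field and method names from the signatures above; how the real class fills `tau_log` and `tau_entropy_ema`, and how it serializes itself, are assumptions here.

```python
from dataclasses import asdict, dataclass, field
from types import SimpleNamespace
from typing import Any, Dict


@dataclass
class SnapshotSketch:
    """Hypothetical stand-in for ControllerStateSnapshot."""

    beta: float
    tau: float
    tau_log: float
    tau_entropy_ema: float
    meta: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_weighting(cls, cfg: Any) -> "SnapshotSketch":
        # Assumption: optional running statistics live on the config under
        # the same names and default to 0.0 when absent.
        return cls(
            beta=float(cfg.beta),
            tau=float(cfg.tau),
            tau_log=float(getattr(cfg, "tau_log", 0.0)),
            tau_entropy_ema=float(getattr(cfg, "tau_entropy_ema", 0.0)),
        )

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "SnapshotSketch":
        return cls(
            beta=float(payload["beta"]),
            tau=float(payload["tau"]),
            tau_log=float(payload.get("tau_log", 0.0)),
            tau_entropy_ema=float(payload.get("tau_entropy_ema", 0.0)),
            meta=dict(payload.get("meta", {})),
        )

    def apply_to_weighting(self, cfg: Any) -> None:
        # Write the stored scalars back onto the live configuration.
        cfg.beta = self.beta
        cfg.tau = self.tau


# Save from one config, serialize, then restore onto a fresh config.
source_cfg = SimpleNamespace(beta=0.1, tau=0.7)
snapshot = SnapshotSketch.from_weighting(source_cfg)
payload = asdict(snapshot)  # plain dict, suitable for JSON or a checkpoint

restored = SnapshotSketch.from_dict(payload)
target_cfg = SimpleNamespace(beta=0.0, tau=0.0)
restored.apply_to_weighting(target_cfg)
```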
- class maxent_grpo.training.weighting.types.KlControllerSettings(target, horizon, step_size)
Bases: object
Controller settings for KL regularization.
- class maxent_grpo.training.weighting.types.QDistributionSettings(temperature, epsilon)
Bases: object
Softmax temperature and smoothing for weighting.
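To make the two knobs concrete, here is a minimal sketch of a temperature-scaled softmax with epsilon smoothing. The function name `smoothed_softmax` is hypothetical, and the smoothing rule (mixing with a uniform distribution) is one common choice assumed for illustration; the library may apply epsilon differently.

```python
import math
from typing import List, Sequence


def smoothed_softmax(
    scores: Sequence[float], temperature: float, epsilon: float
) -> List[float]:
    """Temperature-scaled softmax mixed with a uniform distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    n = len(probs)
    # Assumption: epsilon interpolates toward uniform, so every entry
    # receives at least epsilon / n probability mass.
    return [(1.0 - epsilon) * p + epsilon / n for p in probs]


scores = [1.0, 2.0, 3.0]
sharp = smoothed_softmax(scores, temperature=1.0, epsilon=0.0)
flat = smoothed_softmax(scores, temperature=10.0, epsilon=0.0)
uniform = smoothed_softmax(scores, temperature=1.0, epsilon=1.0)
```

A higher temperature flattens the distribution, while epsilon puts a floor under every weight so that no completion is assigned exactly zero mass.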
- class maxent_grpo.training.weighting.types.TauSchedule(target_entropy, learning_rate, minimum_value, maximum_value, warmup_steps, target_entropy_start=None, target_entropy_final=None, target_entropy_horizon=0)
Bases: object
Hyperparameters controlling tau adaptation.
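One way these hyperparameters could fit together is sketched below. The function `update_tau` is hypothetical, and the update rule is an assumption: tau is treated as a temperature, nudged in log space toward the target entropy, left unchanged during warmup, and clamped to the configured bounds. The library's actual rule may differ.

```python
import math
from typing import Optional


def update_tau(
    tau: float,
    measured_entropy: float,
    step: int,
    *,
    target_entropy: Optional[float],
    learning_rate: float,
    minimum_value: float,
    maximum_value: float,
    warmup_steps: int,
) -> float:
    """Return the next tau given the currently measured weight entropy."""
    # A target of None disables adaptation; skipping updates during warmup
    # is an assumption made for this sketch.
    if target_entropy is None or step < warmup_steps:
        return tau
    # Entropy below target raises tau (flatter weights); above lowers it.
    log_tau = math.log(tau) + learning_rate * (target_entropy - measured_entropy)
    # Clamp to the configured bounds.
    return min(max(math.exp(log_tau), minimum_value), maximum_value)


schedule = dict(
    target_entropy=1.0,
    learning_rate=0.1,
    minimum_value=0.01,
    maximum_value=10.0,
    warmup_steps=5,
)
during_warmup = update_tau(0.5, 0.5, step=3, **schedule)
after_warmup = update_tau(0.5, 0.5, step=10, **schedule)
```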
- class maxent_grpo.training.weighting.types.WeightLoggingView(entropy=0.0, entropy_norm=0.0, entropy_min=0.0, entropy_max=0.0, advantage_entropy_mean=0.0, advantage_entropy_std=0.0)
Bases: object
Aggregated entropy statistics for logging.
- class maxent_grpo.training.weighting.types.WeightNormalizationSettings(denom, len_norm_ref)
Bases: object
Length-normalization flag and denominator scaling.
- class maxent_grpo.training.weighting.types.WeightStats(weights_grouped, flat_weights, weight_entropy, weight_entropy_min, weight_entropy_max, advantage_entropy)
Bases: object
Weights per completion and entropy diagnostics.
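The relationship between the grouped weights and the entropy diagnostics can be sketched as follows. `summarize_weights` is a hypothetical helper, and the aggregation (mean, min, and max of per-group Shannon entropy) is an assumption about how the fields are derived, not the library's confirmed behavior.

```python
import math
from typing import Dict, List, Sequence


def shannon_entropy(weights: Sequence[float]) -> float:
    """Entropy in nats; zero-weight entries contribute nothing."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def summarize_weights(weights_grouped: List[List[float]]) -> Dict[str, object]:
    """Flatten per-prompt weight groups and aggregate their entropies."""
    flat = [w for group in weights_grouped for w in group]
    entropies = [shannon_entropy(group) for group in weights_grouped]
    return {
        "weights_grouped": weights_grouped,
        "flat_weights": flat,
        # Assumption: the headline entropy is the mean over groups.
        "weight_entropy": sum(entropies) / len(entropies),
        "weight_entropy_min": min(entropies),
        "weight_entropy_max": max(entropies),
    }


# Two prompts with two completions each: one uniform, one fully peaked.
stats = summarize_weights([[0.5, 0.5], [1.0, 0.0]])
```

A uniform group reaches the maximum entropy of log(n), while a group that puts all weight on one completion has zero entropy, so the min and max fields bracket how peaked the weighting is across prompts.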
- class maxent_grpo.training.weighting.types.WeightingSettings(tau, beta, normalization, q_distribution, tau_schedule, kl_controller, train_grpo_objective, scale_rewards=True, controller_meta=<factory>, controller_state=None, allow_empty_weight_fallback=False)
Bases: object
Sequence weighting hyperparameters with convenience accessors.
- Parameters:
tau (float)
beta (float)
normalization (WeightNormalizationSettings)
q_distribution (QDistributionSettings)
tau_schedule (TauSchedule)
kl_controller (KlControllerSettings)
train_grpo_objective (bool)
scale_rewards (bool)
controller_meta (ControllerMetaSettings)
controller_state (TorchControllerState | None)
allow_empty_weight_fallback (bool)
- normalization: WeightNormalizationSettings
- q_distribution: QDistributionSettings
- tau_schedule: TauSchedule
- kl_controller: KlControllerSettings
- controller_meta: ControllerMetaSettings
- controller_state: TorchControllerState | None = None
- property denom: float
Return the denominator used for weight normalization.
- Returns:
Normalization denominator applied to weights.
- Return type:
float
- property len_norm_ref: bool
Return whether reference log-probs are length-normalized.
- Returns:
True when reference stats are length-normalized.
- Return type:
bool
- property q_temperature: float
Return the q-distribution temperature.
- Returns:
Temperature applied to the q-distribution softmax.
- Return type:
float
- property q_epsilon: float
Return the epsilon smoothing factor.
- Returns:
Epsilon smoothing applied to the q-distribution.
- Return type:
float
- property tau_target_entropy: float | None
Return the target weight entropy.
- Returns:
Desired entropy target (None to disable adaptation).
- Return type:
float | None
- property tau_lr: float
Return the learning rate for tau adaptation.
- Returns:
Scalar learning rate for tau updates.
- Return type:
float
- property tau_min: float
Return the minimum tau value.
- Returns:
Lower bound applied to tau.
- Return type:
float
- property tau_max: float
Return the maximum tau value.
- Returns:
Upper bound applied to tau.
- Return type:
float
- property tau_warmup_steps: int
Return the tau warmup horizon.
- Returns:
Number of steps used to warm up tau updates.
- Return type:
int
- property kl_target: float
Return the KL target.
- Returns:
Desired KL divergence target.
- Return type:
float
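The convenience accessors read as thin delegations to the nested settings objects. The sketch below shows that pattern with hypothetical stand-in dataclasses; the class names ending in `Sketch` are invented for the example, and the mapping from each property to a nested field is inferred from the constructor signatures rather than confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class NormalizationSketch:
    denom: float
    len_norm_ref: bool


@dataclass
class QDistributionSketch:
    temperature: float
    epsilon: float


@dataclass
class TauScheduleSketch:
    minimum_value: float
    maximum_value: float


@dataclass
class WeightingSettingsSketch:
    """Hypothetical stand-in showing property delegation only."""

    tau: float
    beta: float
    normalization: NormalizationSketch
    q_distribution: QDistributionSketch
    tau_schedule: TauScheduleSketch

    @property
    def denom(self) -> float:
        # Flat accessor over the nested normalization settings.
        return self.normalization.denom

    @property
    def q_temperature(self) -> float:
        return self.q_distribution.temperature

    @property
    def tau_min(self) -> float:
        return self.tau_schedule.minimum_value

    @property
    def tau_max(self) -> float:
        return self.tau_schedule.maximum_value


settings = WeightingSettingsSketch(
    tau=0.5,
    beta=0.04,
    normalization=NormalizationSketch(denom=2.0, len_norm_ref=True),
    q_distribution=QDistributionSketch(temperature=1.5, epsilon=1e-6),
    tau_schedule=TauScheduleSketch(minimum_value=0.01, maximum_value=10.0),
)
```

Flat accessors like these let training code read `settings.denom` or `settings.tau_min` without knowing how the hyperparameters are grouped internally.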