maxent_grpo.training.weighting.types¶

Weighting-related dataclasses shared across the MaxEnt training loop.

Classes

`ControllerMetaSettings`([enabled, method, ...])	Meta-controller knobs governing tau/beta adaptation.
`ControllerStateSnapshot`(beta, tau, tau_log, ...)	Serializable controller state describing tau/beta parameters.
`KlControllerSettings`(target, horizon, step_size)	Controller settings for KL regularization.
`QDistributionSettings`(temperature, epsilon)	Softmax temperature and smoothing for weighting.
`TauSchedule`(target_entropy, learning_rate, ...)	Hyperparameters controlling tau adaptation.
`TorchControllerState`(torch_mod, tau_init, ...)	Torch-backed parameters for tau/beta with sync helpers.
`WeightLoggingView`([entropy, entropy_norm, ...])	Aggregated entropy statistics for logging.
`WeightNormalizationSettings`(denom, len_norm_ref)	Length-normalization flag and denominator scaling.
`WeightStats`(weights_grouped, flat_weights, ...)	Weights per completion and entropy diagnostics.
`WeightingConfigLike`(*args, **kwargs)	Protocol for objects that carry controller weighting scalars.
`WeightingSettings`(tau, beta, normalization, ...)	Sequence weighting hyperparameters with convenience accessors.

class maxent_grpo.training.weighting.types.TorchControllerState(torch_mod, tau_init, beta_init, *, requires_grad=False)[source]¶

Bases: object

Torch-backed parameters for tau/beta with sync helpers.

Parameters:

torch_mod (Any)
tau_init (float)
beta_init (float)
requires_grad (bool)

enable_grad()[source]¶

Return type: None

disable_grad()[source]¶

Return type: None

sync_from_scalars(tau, beta)[source]¶

Parameters:

tau (float)
beta (float)

Return type:

None

tau_tensor(detach=False)[source]¶

Parameters: detach (bool)
Return type: Any

beta_tensor(detach=False)[source]¶

Parameters: detach (bool)
Return type: Any

parameters()[source]¶

Return type: List[Any]

zero_grad()[source]¶

Return type: None

class maxent_grpo.training.weighting.types.WeightingConfigLike(*args, **kwargs)[source]¶

Bases: Protocol

Protocol for objects that carry controller weighting scalars.

beta and tau are required. Optional attributes such as denom or train_grpo_objective may be present and are accessed via getattr.

beta: float¶

tau: float¶
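The getattr-based access to optional attributes can be illustrated with a minimal sketch. The classes `ConfigLike`, `MinimalConfig`, and `FullConfig`, the helper `read_denom`, and the fallback value 1.0 are all hypothetical names and values chosen for illustration; they are not part of this module.

```python
from dataclasses import dataclass
from typing import Protocol


class ConfigLike(Protocol):
    """Hypothetical stand-in mirroring the two required attributes."""

    beta: float
    tau: float


@dataclass
class MinimalConfig:
    # Carries only the required scalars; no optional attributes.
    beta: float
    tau: float


@dataclass
class FullConfig:
    beta: float
    tau: float
    denom: float = 2.0  # optional attribute, illustrative value


def read_denom(cfg: ConfigLike, default: float = 1.0) -> float:
    # Optional attributes are looked up with getattr, so objects that
    # lack them still work. The default of 1.0 is an assumption.
    return float(getattr(cfg, "denom", default))


minimal_denom = read_denom(MinimalConfig(beta=0.04, tau=0.5))
full_denom = read_denom(FullConfig(beta=0.04, tau=0.5))
```

Because the protocol is structural, neither dataclass needs to inherit from it; any object exposing `beta` and `tau` qualifies.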

class maxent_grpo.training.weighting.types.ControllerMetaSettings(enabled=False, method='analytic', learning_rate=0.0, tau_learning_rate=0.0, beta_learning_rate=0.0, beta_grad_clip=0.0, update_interval=1, objective='potential', analytic_steps=1, optimizer='sgd', truncation_steps=1, use_hessian=False, last_tau_grad=0.0, last_beta_grad=0.0)[source]¶

Bases: object

Meta-controller knobs governing tau/beta adaptation.

Parameters:

enabled (bool)
method (str)
learning_rate (float)
tau_learning_rate (float)
beta_learning_rate (float)
beta_grad_clip (float)
update_interval (int)
objective (str)
analytic_steps (int)
optimizer (str)
truncation_steps (int)
use_hessian (bool)
last_tau_grad (float)
last_beta_grad (float)

enabled: bool = False¶

method: str = 'analytic'¶

learning_rate: float = 0.0¶

tau_learning_rate: float = 0.0¶

beta_learning_rate: float = 0.0¶

beta_grad_clip: float = 0.0¶

update_interval: int = 1¶

objective: str = 'potential'¶

analytic_steps: int = 1¶

optimizer: str = 'sgd'¶

truncation_steps: int = 1¶

use_hessian: bool = False¶

last_tau_grad: float = 0.0¶

last_beta_grad: float = 0.0¶

to_state()[source]¶

Return a serializable snapshot of the meta-controller settings.

Return type: Dict[str, Any]

apply_state(payload)[source]¶

Update the meta-controller settings from a serialized payload.

Parameters: payload (Mapping[str, Any])
Return type: None
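A save/restore round trip through `to_state` and `apply_state` might look like the sketch below. `MetaSettingsSketch` is a hypothetical stand-in with only a subset of the documented fields, and the behavior shown (ignoring unknown keys, coercing values to the existing field type) is an assumption rather than the documented behavior of `ControllerMetaSettings`.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Mapping


@dataclass
class MetaSettingsSketch:
    """Hypothetical stand-in with a subset of the documented fields."""

    enabled: bool = False
    method: str = "analytic"
    learning_rate: float = 0.0
    update_interval: int = 1

    def to_state(self) -> Dict[str, Any]:
        # asdict yields plain Python values, which serialize cleanly here.
        return asdict(self)

    def apply_state(self, payload: Mapping[str, Any]) -> None:
        # Assumption: unknown keys are ignored and known keys are coerced
        # to the type of the current field value.
        for f in fields(self):
            if f.name in payload:
                current = getattr(self, f.name)
                setattr(self, f.name, type(current)(payload[f.name]))


saved = MetaSettingsSketch(enabled=True, learning_rate=0.01).to_state()
restored = MetaSettingsSketch()
restored.apply_state({**saved, "unknown_key": 123})
```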

class maxent_grpo.training.weighting.types.ControllerStateSnapshot(beta, tau, tau_log, tau_entropy_ema, meta=<factory>)[source]¶

Bases: object

Serializable controller state describing tau/beta parameters.

Parameters:

beta (float)
tau (float)
tau_log (float)
tau_entropy_ema (float)
meta (Dict[str, Any])

beta: float¶

tau: float¶

tau_log: float¶

tau_entropy_ema: float¶

meta: Dict[str, Any]¶

STATE_VERSION: ClassVar[int] = 1¶

to_dict()[source]¶

Serialize the snapshot to a JSON-friendly mapping.

Return type: Dict[str, Any]

classmethod from_weighting(weighting_cfg)[source]¶

Build a controller snapshot from the active weighting settings.

Parameters: weighting_cfg (WeightingConfigLike)
Return type: ControllerStateSnapshot

classmethod from_dict(payload)[source]¶

Instantiate a snapshot from a serialized payload.

Parameters: payload (Mapping[str, Any])
Return type: ControllerStateSnapshot

apply_to_weighting(weighting_cfg)[source]¶

Apply the snapshot contents to a weighting configuration.

Parameters: weighting_cfg (WeightingConfigLike)
Return type: None
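The serialization methods suggest a checkpoint round trip along the following lines. `SnapshotSketch` is a hypothetical stand-in; the `"version"` key, the tolerance for a missing `meta` entry, and the choice of `log(tau)` as the `tau_log` value are assumptions made for illustration only.

```python
import json
import math
from dataclasses import dataclass, field
from typing import Any, ClassVar, Dict, Mapping


@dataclass
class SnapshotSketch:
    """Hypothetical stand-in for a serializable controller snapshot."""

    beta: float
    tau: float
    tau_log: float
    tau_entropy_ema: float
    meta: Dict[str, Any] = field(default_factory=dict)

    STATE_VERSION: ClassVar[int] = 1

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: the schema version is stored under a "version" key.
        return {
            "version": self.STATE_VERSION,
            "beta": self.beta,
            "tau": self.tau,
            "tau_log": self.tau_log,
            "tau_entropy_ema": self.tau_entropy_ema,
            "meta": dict(self.meta),
        }

    @classmethod
    def from_dict(cls, payload: Mapping[str, Any]) -> "SnapshotSketch":
        # Assumption: a missing "meta" entry falls back to an empty dict.
        return cls(
            beta=float(payload["beta"]),
            tau=float(payload["tau"]),
            tau_log=float(payload["tau_log"]),
            tau_entropy_ema=float(payload["tau_entropy_ema"]),
            meta=dict(payload.get("meta", {})),
        )


snapshot = SnapshotSketch(
    beta=0.04,
    tau=0.5,
    tau_log=math.log(0.5),  # illustrative value, assumed to be log(tau)
    tau_entropy_ema=0.9,
    meta={"enabled": True},
)
encoded = json.dumps(snapshot.to_dict())
decoded = SnapshotSketch.from_dict(json.loads(encoded))
```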

class maxent_grpo.training.weighting.types.KlControllerSettings(target, horizon, step_size)[source]¶

Bases: object

Controller settings for KL regularization.

Parameters:

target (float)
horizon (int)
step_size (float)

target: float¶

horizon: int¶

step_size: float¶
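The source does not state how these three settings are combined. As one possibility, the sketch below applies a common proportional scheme for adapting a KL coefficient, with `step_size` assumed to bound the relative error per update. The function `update_beta` and its `n_steps` parameter are hypothetical.

```python
def update_beta(
    beta: float,
    observed_kl: float,
    target: float,
    horizon: int,
    step_size: float,
    n_steps: int = 1,
) -> float:
    """Hypothetical proportional update for a KL coefficient."""
    # Relative deviation of the observed KL from the target.
    error = observed_kl / target - 1.0
    # Assumption: step_size bounds how much of the error is acted on.
    clipped = max(-step_size, min(step_size, error))
    # A longer horizon spreads the correction over more steps.
    return beta * (1.0 + clipped * n_steps / horizon)


raised = update_beta(0.04, observed_kl=0.2, target=0.1, horizon=100, step_size=0.2)
lowered = update_beta(0.04, observed_kl=0.05, target=0.1, horizon=100, step_size=0.2)
unchanged = update_beta(0.04, observed_kl=0.1, target=0.1, horizon=100, step_size=0.2)
```

Under this scheme, beta grows when the observed KL exceeds the target and shrinks when it falls below it.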

class maxent_grpo.training.weighting.types.QDistributionSettings(temperature, epsilon)[source]¶

Bases: object

Softmax temperature and smoothing for weighting.

Parameters:

temperature (float)
epsilon (float)

temperature: float¶

epsilon: float¶
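To make the two settings concrete, the sketch below computes a temperature-scaled softmax and then mixes it with a uniform distribution. The function `q_distribution` is hypothetical, and the uniform-mixing form of the smoothing is an assumption; the package may apply `epsilon` differently.

```python
import math
from typing import List


def q_distribution(
    rewards: List[float], temperature: float, epsilon: float
) -> List[float]:
    """Hypothetical softmax over rewards with uniform smoothing."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [r / temperature for r in rewards]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    n = len(rewards)
    # Assumption: smoothing mixes the softmax with a uniform distribution,
    # so every entry is at least epsilon / n.
    return [(1.0 - epsilon) * e / total + epsilon / n for e in exps]


q = q_distribution([1.0, 2.0, 3.0], temperature=0.5, epsilon=0.03)
```

A lower temperature concentrates mass on the highest-reward completion, while a larger epsilon keeps every completion's weight away from zero.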

class maxent_grpo.training.weighting.types.TauSchedule(target_entropy, learning_rate, minimum_value, maximum_value, warmup_steps, target_entropy_start=None, target_entropy_final=None, target_entropy_horizon=0)[source]¶

Bases: object

Hyperparameters controlling tau adaptation.

Parameters:

target_entropy (float | None)
learning_rate (float)
minimum_value (float)
maximum_value (float)
warmup_steps (int)
target_entropy_start (float | None)
target_entropy_final (float | None)
target_entropy_horizon (int)

target_entropy: float | None¶

learning_rate: float¶

minimum_value: float¶

maximum_value: float¶

warmup_steps: int¶

target_entropy_start: float | None = None¶

target_entropy_final: float | None = None¶

target_entropy_horizon: int = 0¶
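How these hyperparameters interact is not spelled out here, so the following is only a sketch under stated assumptions: the target entropy is linearly interpolated from `target_entropy_start` to `target_entropy_final` over `target_entropy_horizon` steps, tau is updated in log space, no update happens during warmup, and the result is clamped to the configured bounds. Both functions are hypothetical.

```python
import math
from typing import Optional


def scheduled_target(
    step: int,
    start: Optional[float],
    final: Optional[float],
    horizon: int,
    fallback: Optional[float],
) -> Optional[float]:
    # Assumption: without a full start/final/horizon triple, the static
    # target_entropy value is used instead.
    if start is None or final is None or horizon <= 0:
        return fallback
    frac = min(max(step / horizon, 0.0), 1.0)
    return start + frac * (final - start)


def update_tau(
    tau: float,
    measured_entropy: float,
    target_entropy: Optional[float],
    learning_rate: float,
    minimum_value: float,
    maximum_value: float,
    step: int,
    warmup_steps: int,
) -> float:
    # A target of None disables adaptation; warmup skipping is assumed.
    if target_entropy is None or step < warmup_steps:
        return tau
    # Assumption: raise tau when entropy is below target, in log space.
    log_tau = math.log(tau) + learning_rate * (target_entropy - measured_entropy)
    return min(max(math.exp(log_tau), minimum_value), maximum_value)


midpoint = scheduled_target(50, start=1.0, final=0.5, horizon=100, fallback=None)
warm = update_tau(0.5, 0.2, 1.0, 0.1, 0.01, 1.0, step=2, warmup_steps=5)
raised = update_tau(0.5, 0.2, 1.0, 0.1, 0.01, 1.0, step=10, warmup_steps=5)
capped = update_tau(0.99, 0.2, 1.0, 10.0, 0.01, 1.0, step=10, warmup_steps=5)
```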

class maxent_grpo.training.weighting.types.WeightLoggingView(entropy=0.0, entropy_norm=0.0, entropy_min=0.0, entropy_max=0.0, advantage_entropy_mean=0.0, advantage_entropy_std=0.0)[source]¶

Bases: object

Aggregated entropy statistics for logging.

Parameters:

entropy (float)
entropy_norm (float)
entropy_min (float)
entropy_max (float)
advantage_entropy_mean (float)
advantage_entropy_std (float)

entropy: float = 0.0¶

entropy_norm: float = 0.0¶

entropy_min: float = 0.0¶

entropy_max: float = 0.0¶

advantage_entropy_mean: float = 0.0¶

advantage_entropy_std: float = 0.0¶

class maxent_grpo.training.weighting.types.WeightNormalizationSettings(denom, len_norm_ref)[source]¶

Bases: object

Length-normalization flag and denominator scaling.

Parameters:

denom (float)
len_norm_ref (bool)

denom: float¶

len_norm_ref: bool¶

class maxent_grpo.training.weighting.types.WeightStats(weights_grouped, flat_weights, weight_entropy, weight_entropy_min, weight_entropy_max, advantage_entropy)[source]¶

Bases: object

Weights per completion and entropy diagnostics.

Parameters:

weights_grouped (List[List[float]])
flat_weights (List[float])
weight_entropy (float)
weight_entropy_min (float)
weight_entropy_max (float)
advantage_entropy (List[float])

weights_grouped: List[List[float]]¶

flat_weights: List[float]¶

weight_entropy: float¶

weight_entropy_min: float¶

weight_entropy_max: float¶

advantage_entropy: List[float]¶
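The relationship between the grouped weights and the entropy diagnostics can be sketched as follows. The variable names match the documented fields, but the aggregation shown (Shannon entropy per group, then mean, minimum, and maximum across groups) is an assumption; `shannon_entropy` is a hypothetical helper, and `advantage_entropy` is left out because its definition is not given here.

```python
import math
from typing import List


def shannon_entropy(weights: List[float]) -> float:
    # Zero-weight entries contribute nothing, by the usual convention.
    return sum(-w * math.log(w) for w in weights if w > 0.0)


# One inner list per prompt group, one weight per completion.
weights_grouped = [[0.5, 0.5], [1.0, 0.0]]

# Flattened view in group order.
flat_weights = [w for group in weights_grouped for w in group]

# Assumption: the scalar diagnostics aggregate per-group entropies.
entropies = [shannon_entropy(group) for group in weights_grouped]
weight_entropy = sum(entropies) / len(entropies)
weight_entropy_min = min(entropies)
weight_entropy_max = max(entropies)
```

In this example the uniform group attains the maximum entropy for two completions, log 2, while the one-hot group has zero entropy.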

class maxent_grpo.training.weighting.types.WeightingSettings(tau, beta, normalization, q_distribution, tau_schedule, kl_controller, train_grpo_objective, scale_rewards=True, controller_meta=<factory>, controller_state=None, allow_empty_weight_fallback=False)[source]¶

Bases: object

Sequence weighting hyperparameters with convenience accessors.

Parameters:

tau (float)
beta (float)
normalization (WeightNormalizationSettings)
q_distribution (QDistributionSettings)
tau_schedule (TauSchedule)
kl_controller (KlControllerSettings)
train_grpo_objective (bool)
scale_rewards (bool)
controller_meta (ControllerMetaSettings)
controller_state (TorchControllerState | None)
allow_empty_weight_fallback (bool)

tau: float¶

beta: float¶

normalization: WeightNormalizationSettings¶

q_distribution: QDistributionSettings¶

tau_schedule: TauSchedule¶

kl_controller: KlControllerSettings¶

train_grpo_objective: bool¶

scale_rewards: bool = True¶

controller_meta: ControllerMetaSettings¶

controller_state: TorchControllerState | None = None¶

allow_empty_weight_fallback: bool = False¶

property denom: float¶

Return the denominator used for weight normalization.

Returns: Normalization denominator applied to weights.
Return type: float

property len_norm_ref: bool¶

Return whether reference log-probs are length-normalized.

Returns: True when reference stats are length-normalized.
Return type: bool

property q_temperature: float¶

Return the q-distribution temperature.

Returns: Temperature applied to the q-distribution softmax.
Return type: float

property q_epsilon: float¶

Return the epsilon smoothing factor.

Returns: Epsilon smoothing applied to the q-distribution.
Return type: float

property tau_target_entropy: float | None¶

Return the target weight entropy.

Returns: Desired entropy target (None to disable adaptation).
Return type: float | None

property tau_lr: float¶

Return the learning rate for tau adaptation.

Returns: Scalar learning rate for tau updates.
Return type: float

property tau_min: float¶

Return the minimum tau value.

Returns: Lower bound applied to tau.
Return type: float

property tau_max: float¶

Return the maximum tau value.

Returns: Upper bound applied to tau.
Return type: float

property tau_warmup_steps: int¶

Return the tau warmup horizon.

Returns: Number of steps used to warm up tau updates.
Return type: int

property kl_target: float¶

Return the KL target.

Returns: Desired KL divergence target.
Return type: float

property kl_horizon: int¶

Return the KL controller horizon.

Returns: Number of steps used for the KL controller horizon.
Return type: int

property kl_ctl_step_size: float¶

Return the KL controller step size.

Returns: Step size multiplier used by the KL controller.
Return type: float
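The convenience accessors above appear to flatten the nested settings objects into top-level properties. A minimal sketch of that pattern follows; `NormalizationSketch`, `QSketch`, and `SettingsSketch` are hypothetical stand-ins, and the mapping from each property to a nested field is inferred from the names rather than confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class NormalizationSketch:
    denom: float
    len_norm_ref: bool


@dataclass
class QSketch:
    temperature: float
    epsilon: float


@dataclass
class SettingsSketch:
    """Hypothetical stand-in showing property delegation."""

    tau: float
    beta: float
    normalization: NormalizationSketch
    q_distribution: QSketch

    @property
    def denom(self) -> float:
        # Assumed to forward to the nested normalization settings.
        return self.normalization.denom

    @property
    def q_temperature(self) -> float:
        # Assumed to forward to the nested q-distribution settings.
        return self.q_distribution.temperature


settings = SettingsSketch(
    tau=0.5,
    beta=0.04,
    normalization=NormalizationSketch(denom=1.0, len_norm_ref=True),
    q_distribution=QSketch(temperature=0.7, epsilon=1e-6),
)
flat_denom = settings.denom
flat_temperature = settings.q_temperature
```

With read-only properties like these, callers can treat the settings object as a flat configuration while the nested dataclasses remain the single source of truth.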