maxent_grpo.training.controller_objective
Meta-controller objectives for tau/beta adaptation.
Functions
build_controller_objective | Return the configured controller objective for the current run.
Classes
AnalyticControllerObjective | Closed-form gradients based on entropy/KL targets.
ControllerGradients | Gradient bundle returned by controller objectives.
ControllerMetaContext | Inputs made available to controller objectives.
ControllerObjective | Base class for controller objectives.
TruncatedBackpropControllerObjective | Truncated meta-gradient objective relying on a user-supplied callback.
- class maxent_grpo.training.controller_objective.AnalyticControllerObjective
Bases: ControllerObjective
Closed-form gradients based on entropy/KL targets.
- name = 'analytic'
- compute(meta_ctx)
- Parameters:
meta_ctx (ControllerMetaContext)
- Return type:
ControllerGradients | None
- class maxent_grpo.training.controller_objective.ControllerGradients(tau_grad=None, beta_grad=None)
Bases: object
Gradient bundle returned by controller objectives.
- class maxent_grpo.training.controller_objective.ControllerMetaContext(weighting, weight_stats, loss_outputs, global_step, lr_scale=1.0, prepared_batch=None, kl_value=None, backprop_fn=None)
Bases: object
Inputs made available to controller objectives.
- Parameters:
weighting (WeightingSettings)
weight_stats (Any)
loss_outputs (Any)
global_step (int)
lr_scale (float)
prepared_batch (Any)
kl_value (float | None)
backprop_fn (Callable[[int], ControllerGradients | None] | None)
- weighting: WeightingSettings
- weight_stats: Any
- loss_outputs: Any
- prepared_batch: Any = None
- backprop_fn: Callable[[int], ControllerGradients | None] | None = None
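A minimal sketch of how the context might be populated, using stand-in dataclasses that only mirror the documented field names and defaults. Whether the real classes are dataclasses, the element types of `tau_grad` and `beta_grad`, and the dictionary used in place of `WeightingSettings` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ControllerGradients:
    """Stand-in mirroring the documented (tau_grad=None, beta_grad=None) signature."""

    tau_grad: Optional[float] = None  # float is an assumption; the real type is not documented
    beta_grad: Optional[float] = None


@dataclass
class ControllerMetaContext:
    """Stand-in carrying the documented fields and defaults."""

    weighting: Any
    weight_stats: Any
    loss_outputs: Any
    global_step: int
    lr_scale: float = 1.0
    prepared_batch: Any = None
    kl_value: Optional[float] = None
    backprop_fn: Optional[Callable[[int], Optional[ControllerGradients]]] = None


# Hypothetical values; a real run would pass a WeightingSettings instance
# and the statistics produced by the training loop.
ctx = ControllerMetaContext(
    weighting={"tau": 0.5, "beta": 0.04},
    weight_stats=None,
    loss_outputs=None,
    global_step=100,
    kl_value=0.02,
)
```

Only the four leading arguments are required; the remaining fields fall back to the defaults shown in the signature.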
- class maxent_grpo.training.controller_objective.ControllerObjective
Bases: object
Base class for controller objectives.
- name = 'base'
- compute(meta_ctx)
- Parameters:
meta_ctx (ControllerMetaContext)
- Return type:
ControllerGradients | None
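The `name` attribute and `compute(meta_ctx)` method suggest that custom objectives are written as subclasses. The sketch below illustrates that pattern with stand-in classes; `KLGapObjective`, its `kl_target` parameter, and the sign convention of the gradient are hypothetical, and the stand-in base returning `None` is an assumption about undocumented default behavior.

```python
from types import SimpleNamespace
from typing import Any, Optional


class ControllerGradients:
    """Stand-in for the documented gradient bundle."""

    def __init__(self, tau_grad: Optional[float] = None,
                 beta_grad: Optional[float] = None) -> None:
        self.tau_grad = tau_grad
        self.beta_grad = beta_grad


class ControllerObjective:
    """Stand-in base class; the real default of compute() is not documented."""

    name = "base"

    def compute(self, meta_ctx: Any) -> Optional[ControllerGradients]:
        return None


class KLGapObjective(ControllerObjective):
    """Hypothetical objective: beta gradient is measured KL minus a target KL."""

    name = "kl_gap"

    def __init__(self, kl_target: float) -> None:
        self.kl_target = kl_target

    def compute(self, meta_ctx: Any) -> Optional[ControllerGradients]:
        if meta_ctx.kl_value is None:
            return None  # no KL measurement, so no update is proposed
        return ControllerGradients(beta_grad=meta_ctx.kl_value - self.kl_target)


# SimpleNamespace stands in for a ControllerMetaContext with only kl_value set.
ctx = SimpleNamespace(kl_value=0.05)
grads = KLGapObjective(kl_target=0.02).compute(ctx)
```

Returning `None` when an input is missing matches the documented `ControllerGradients | None` return type and leaves the decision to skip an update with the caller.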
- class maxent_grpo.training.controller_objective.TruncatedBackpropControllerObjective(steps=1)
Bases: ControllerObjective
Truncated meta-gradient objective relying on a user-supplied callback.
- Parameters:
steps (int)
- name = 'truncated_backprop'
- compute(meta_ctx)
- Parameters:
meta_ctx (ControllerMetaContext)
- Return type:
ControllerGradients | None
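The user-supplied callback presumably corresponds to `ControllerMetaContext.backprop_fn`, typed as `Callable[[int], ControllerGradients | None]`. The sketch below shows one way such a callback could be wired in. It assumes that the integer argument is the `steps` value and that a missing callback yields `None`; neither is confirmed by this page. `TruncatedBackpropSketch` and `toy_backprop` are hypothetical stand-ins, not the library's code.

```python
from types import SimpleNamespace
from typing import Any, Optional


class ControllerGradients:
    """Stand-in for the documented gradient bundle."""

    def __init__(self, tau_grad: Optional[float] = None,
                 beta_grad: Optional[float] = None) -> None:
        self.tau_grad = tau_grad
        self.beta_grad = beta_grad


class TruncatedBackpropSketch:
    """Hypothetical stand-in that delegates to the context's callback."""

    name = "truncated_backprop"

    def __init__(self, steps: int = 1) -> None:
        self.steps = steps

    def compute(self, meta_ctx: Any) -> Optional[ControllerGradients]:
        fn = meta_ctx.backprop_fn
        if fn is None:
            return None  # assumed behavior when no callback is supplied
        return fn(self.steps)  # assumed: the int argument is the step count


def toy_backprop(steps: int) -> ControllerGradients:
    # Toy callback: a real one would unroll `steps` inner updates and
    # differentiate through them; here each step adds a fixed amount.
    return ControllerGradients(tau_grad=0.25 * steps, beta_grad=-0.5 * steps)


ctx = SimpleNamespace(backprop_fn=toy_backprop)
grads = TruncatedBackpropSketch(steps=2).compute(ctx)
```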
- maxent_grpo.training.controller_objective.build_controller_objective(cfg, weighting)
Return the configured controller objective for the current run.
- Parameters:
cfg (GRPOConfig) – Training configuration (retained for compatibility; not used).
weighting (WeightingSettings) – Weighting settings containing controller meta config.
- Returns:
Controller objective instance, or None when disabled.
- Return type:
ControllerObjective | None
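Because the factory may return `None` and `compute` may also return `None`, a call site has two optional layers to handle. The helper below is a sketch of that handling; `controller_step` is a hypothetical name, and the objective and context are `SimpleNamespace` stand-ins so the block runs without the library.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def controller_step(objective: Optional[Any],
                    meta_ctx: Any) -> Tuple[Optional[float], Optional[float]]:
    """Return (tau_grad, beta_grad), or (None, None) when nothing should be updated."""
    if objective is None:
        return (None, None)  # controller disabled: the factory returned None
    grads = objective.compute(meta_ctx)
    if grads is None:
        return (None, None)  # objective declined to propose an update
    return (grads.tau_grad, grads.beta_grad)


# Stand-in objective whose compute() returns a fixed gradient bundle.
fake_objective = SimpleNamespace(
    compute=lambda meta_ctx: SimpleNamespace(tau_grad=0.1, beta_grad=None)
)

disabled = controller_step(None, meta_ctx=None)
enabled = controller_step(fake_objective, meta_ctx=None)
```

In a real run, `objective` would come from `build_controller_objective(cfg, weighting)` and `meta_ctx` would be a `ControllerMetaContext`.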