maxent_grpo.training.controller_objective

Meta-controller objectives for tau/beta adaptation.

Functions

build_controller_objective(cfg, weighting)

Return the configured controller objective for the current run.

Classes

AnalyticControllerObjective()

Closed-form gradients based on entropy/KL targets.

ControllerGradients([tau_grad, beta_grad])

Gradient bundle returned by controller objectives.

ControllerMetaContext(weighting, ...[, ...])

Inputs made available to controller objectives.

ControllerObjective()

Base class for controller objectives.

TruncatedBackpropControllerObjective([steps])

Truncated meta-gradient objective relying on a user-supplied callback.

class maxent_grpo.training.controller_objective.AnalyticControllerObjective[source]

Bases: ControllerObjective

Closed-form gradients based on entropy/KL targets.

name = 'analytic'
compute(meta_ctx)[source]
Parameters:

meta_ctx (ControllerMetaContext)

Return type:

ControllerGradients | None
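
The page does not spell out the closed-form expressions, so the following is only a minimal sketch of what "gradients based on entropy/KL targets" could look like. The function name, the `target_entropy` and `target_kl` arguments, and the sign convention (measured value minus target) are all assumptions for illustration, not the library's actual formula.

```python
from typing import Optional, Tuple


def analytic_grads_sketch(
    entropy: Optional[float],
    kl: Optional[float],
    target_entropy: float,
    target_kl: float,
) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical closed-form (tau_grad, beta_grad) pair.

    Assumption: each gradient is the signed gap between a measured
    statistic and its target; a missing statistic yields no gradient.
    """
    tau_grad = None if entropy is None else entropy - target_entropy
    beta_grad = None if kl is None else kl - target_kl
    return tau_grad, beta_grad


# Entropy is above target, KL is unavailable for this step.
grads = analytic_grads_sketch(1.5, None, target_entropy=1.0, target_kl=0.125)
```

Returning `None` for a missing statistic mirrors the documented `float | None` gradient fields, so a caller can skip the corresponding update.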

class maxent_grpo.training.controller_objective.ControllerGradients(tau_grad=None, beta_grad=None)[source]

Bases: object

Gradient bundle returned by controller objectives.

Parameters:
  • tau_grad (float | None)

  • beta_grad (float | None)

tau_grad: float | None = None
beta_grad: float | None = None
has_updates()[source]
Return type:

bool
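
As a rough illustration of the bundle's shape, the stand-in dataclass below mirrors the two documented fields. The class name `GradientBundleSketch` is hypothetical, and the rule used for `has_updates()` (true when at least one gradient is set) is an assumption rather than the confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GradientBundleSketch:
    """Hypothetical stand-in mirroring the documented fields."""

    tau_grad: Optional[float] = None
    beta_grad: Optional[float] = None

    def has_updates(self) -> bool:
        # Assumption: an update exists when at least one gradient is set.
        return self.tau_grad is not None or self.beta_grad is not None


empty = GradientBundleSketch()
tau_only = GradientBundleSketch(tau_grad=0.05)
```

Under this assumption, `empty.has_updates()` is false and `tau_only.has_updates()` is true, which lets a training loop skip the controller step when nothing was computed.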

class maxent_grpo.training.controller_objective.ControllerMetaContext(weighting, weight_stats, loss_outputs, global_step, lr_scale=1.0, prepared_batch=None, kl_value=None, backprop_fn=None)[source]

Bases: object

Inputs made available to controller objectives.

Parameters:
  • weighting (WeightingSettings)

  • weight_stats (Any)

  • loss_outputs (Any)

  • global_step (int)

  • lr_scale (float, default 1.0)

  • prepared_batch (Any, default None)

  • kl_value (float | None, default None)

  • backprop_fn (Callable[[int], ControllerGradients | None] | None, default None)

entropy_value()[source]

Return the batch entropy used for tau updates (handles logging views).

Return type:

float | None

kl_metric()[source]

Return the KL metric supplied by the loss, or fall back to the cached value.

Return type:

float | None
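
The fallback described for `kl_metric()` can be sketched as follows. This is a simplified stand-in, not the real class: `MetaContextSketch` is a hypothetical name, and reading the KL from a `kl` attribute on `loss_outputs` is an assumption, since the page types `loss_outputs` only as `Any`.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class MetaContextSketch:
    """Hypothetical stand-in with only the fields the lookup needs."""

    loss_outputs: Any
    kl_value: Optional[float] = None

    def kl_metric(self) -> Optional[float]:
        # Assumption: the loss exposes its KL as a `kl` attribute.
        kl = getattr(self.loss_outputs, "kl", None)
        if kl is not None:
            return float(kl)
        # Otherwise fall back to the cached value (which may be None).
        return self.kl_value


from_loss = MetaContextSketch(SimpleNamespace(kl=0.2), kl_value=0.5)
from_cache = MetaContextSketch(SimpleNamespace(), kl_value=0.5)
```

Here `from_loss.kl_metric()` prefers the value carried by the loss outputs, while `from_cache.kl_metric()` returns the cached `kl_value`.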

class maxent_grpo.training.controller_objective.ControllerObjective[source]

Bases: object

Base class for controller objectives.

name = 'base'
compute(meta_ctx)[source]
Parameters:

meta_ctx (ControllerMetaContext)

Return type:

ControllerGradients | None
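
A custom objective would presumably subclass the base and override `compute`. The sketch below uses a stand-in base class so that it runs on its own. `ObjectiveBaseSketch`, `ConstantTauObjective`, the tuple return value, and the base returning `None` are all illustrative assumptions.

```python
from typing import Any, Optional, Tuple

# Stand-in for the documented gradient bundle: (tau_grad, beta_grad).
Grads = Tuple[Optional[float], Optional[float]]


class ObjectiveBaseSketch:
    """Hypothetical stand-in for the base class."""

    name = "base"

    def compute(self, meta_ctx: Any) -> Optional[Grads]:
        # Assumption: the base objective proposes no update.
        return None


class ConstantTauObjective(ObjectiveBaseSketch):
    """Toy objective that always proposes the same tau gradient."""

    name = "constant_tau"

    def __init__(self, tau_grad: float) -> None:
        self.tau_grad = tau_grad

    def compute(self, meta_ctx: Any) -> Optional[Grads]:
        # The context is ignored here; a real objective would read it.
        return (self.tau_grad, None)


objective = ConstantTauObjective(0.01)
result = objective.compute(meta_ctx=None)
```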

class maxent_grpo.training.controller_objective.TruncatedBackpropControllerObjective(steps=1)[source]

Bases: ControllerObjective

Truncated meta-gradient objective relying on a user-supplied callback.

Parameters:

steps (int)

name = 'truncated_backprop'
compute(meta_ctx)[source]
Parameters:

meta_ctx (ControllerMetaContext)

Return type:

ControllerGradients | None
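
Given the documented `backprop_fn: Callable[[int], ControllerGradients | None] | None` field on the context, one plausible reading is that `compute` hands the configured step count to the callback and returns nothing when no callback is supplied. The function below is a hedged sketch of that delegation. Its name and the tuple used in place of `ControllerGradients` are assumptions.

```python
from typing import Callable, Optional, Tuple

# Stand-in for the documented gradient bundle: (tau_grad, beta_grad).
Grads = Tuple[Optional[float], Optional[float]]


def truncated_compute_sketch(
    backprop_fn: Optional[Callable[[int], Optional[Grads]]],
    steps: int = 1,
) -> Optional[Grads]:
    """Hypothetical delegation to a user-supplied meta-gradient callback."""
    if backprop_fn is None:
        # Assumption: without a callback there is nothing to compute.
        return None
    return backprop_fn(steps)


def toy_callback(num_steps: int) -> Grads:
    # Toy callback: reports the unroll length as the tau gradient.
    return (float(num_steps), None)


grads = truncated_compute_sketch(toy_callback, steps=3)
```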

maxent_grpo.training.controller_objective.build_controller_objective(cfg, weighting)[source]

Return the configured controller objective for the current run.

Parameters:
  • cfg (GRPOConfig) – Training configuration (retained for compatibility; not used).

  • weighting (WeightingSettings) – Weighting settings containing controller meta config.

Returns:

Controller objective instance or None when disabled.

Return type:

ControllerObjective | None
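
A factory of this kind is often a lookup keyed on each class's `name` attribute. The sketch below shows that pattern with stand-in classes. The `'analytic'` and `'truncated_backprop'` names come from this page, while `build_sketch`, the registry, the empty-name check, and the error raised on an unknown name are assumptions about how such a builder might behave.

```python
from typing import Dict, Optional, Type


class BaseSketch:
    name = "base"


class AnalyticSketch(BaseSketch):
    name = "analytic"


class TruncatedSketch(BaseSketch):
    name = "truncated_backprop"


# Hypothetical registry keyed on the documented `name` attributes.
REGISTRY: Dict[str, Type[BaseSketch]] = {
    cls.name: cls for cls in (AnalyticSketch, TruncatedSketch)
}


def build_sketch(objective_name: Optional[str]) -> Optional[BaseSketch]:
    """Return an objective instance, or None when no name is configured."""
    if not objective_name:
        # Assumption: an empty or missing name means the feature is disabled.
        return None
    try:
        return REGISTRY[objective_name]()
    except KeyError:
        raise ValueError(f"unknown controller objective: {objective_name!r}")


objective = build_sketch("analytic")
```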