maxent_grpo.grpo

Baseline GRPO training entrypoint.

Provides a thin wrapper around the training pipeline. When explicit arguments are not provided, it either parses TRL arguments from the CLI or delegates to the Hydra-based CLI. Exposed for `python -m maxent_grpo.grpo` and for programmatic invocation from orchestration code.

Functions

`_resolve_cli_attr`(attr_name)	Best-effort import helper for optional CLI attributes.
`cli`()	Invoke the baseline entrypoint (CLI style).
`main`([script_args, training_args, model_args])	Run the baseline GRPO trainer or delegate to Hydra.
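
The "best-effort import" pattern behind a helper such as `_resolve_cli_attr` can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name `resolve_optional_attr` and its explicit `module_name` parameter are hypothetical, and the real helper takes only `attr_name`.

```python
import importlib
from typing import Any, Optional


def resolve_optional_attr(module_name: str, attr_name: str) -> Optional[Any]:
    """Return ``module_name.attr_name``, or None if it cannot be resolved.

    Hypothetical stand-in for a best-effort import helper; the real
    ``_resolve_cli_attr`` may look in a different place or behave differently.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The optional dependency is not installed: report absence, do not fail.
        return None
    # A missing attribute is also treated as "not available".
    return getattr(module, attr_name, None)


# An installed module resolves to the real attribute.
dumps = resolve_optional_attr("json", "dumps")
# A module that does not exist yields None instead of raising.
missing = resolve_optional_attr("no_such_module_for_sketch", "anything")
```

Returning `None` instead of raising lets the caller decide whether a missing optional component is an error.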

maxent_grpo.grpo.cli()

Invoke the baseline entrypoint (CLI style).

Returns: None. Side effects include running training or delegating to Hydra.
Return type: None

maxent_grpo.grpo.main(script_args=None, training_args=None, model_args=None)

Run the baseline GRPO trainer or delegate to Hydra.

Parameters:

script_args (Optional[GRPOScriptArguments]) – Dataset/reward script arguments parsed via TRL or provided directly.
training_args (Optional[GRPOConfig]) – GRPO training configuration produced by TRL.
model_args (Optional[ModelConfig]) – Model configuration passed to TRL/transformers trainers.

Returns:

Training result from maxent_grpo.training.baseline.run_baseline_training(), or the Hydra CLI invocation result when no args are supplied.

Raises:

RuntimeError – If no CLI parser or Hydra entrypoint is available.
Exception – Propagates parser or training pipeline exceptions.

Return type:

Any
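
The dispatch behavior documented for `main` can be sketched roughly as below. This is an illustration under stated assumptions, not the module's code: `dispatch_sketch` and the injected callables `run_training`, `parse_cli`, and `hydra_entry` are hypothetical names, and the precedence (Hydra first, then the CLI parser) is a guess that the documentation does not confirm.

```python
from typing import Any, Callable, Optional, Tuple


def dispatch_sketch(
    script_args: Optional[Any] = None,
    training_args: Optional[Any] = None,
    model_args: Optional[Any] = None,
    *,
    run_training: Callable[[Any, Any, Any], Any],
    parse_cli: Optional[Callable[[], Tuple[Any, Any, Any]]] = None,
    hydra_entry: Optional[Callable[[], Any]] = None,
) -> Any:
    """Run training with explicit args, else fall back to an entrypoint."""
    explicit = (script_args, training_args, model_args)
    if all(arg is not None for arg in explicit):
        # Programmatic invocation: all three argument objects were supplied.
        return run_training(*explicit)
    if hydra_entry is not None:
        # Assumed precedence: delegate to the Hydra-style CLI when available.
        return hydra_entry()
    if parse_cli is not None:
        # Otherwise parse the three argument objects from the command line.
        return run_training(*parse_cli())
    # Neither fallback exists, mirroring the documented RuntimeError.
    raise RuntimeError("No CLI parser or Hydra entrypoint is available.")


def fake_training(script: Any, training: Any, model: Any) -> Tuple[Any, ...]:
    """Stand-in for the real training pipeline; it only echoes its inputs."""
    return ("trained", script, training, model)


explicit_result = dispatch_sketch("s", "t", "m", run_training=fake_training)
hydra_result = dispatch_sketch(run_training=fake_training, hydra_entry=lambda: "hydra")
parsed_result = dispatch_sketch(
    run_training=fake_training, parse_cli=lambda: ("a", "b", "c")
)
```

Injecting the callables keeps the sketch runnable without TRL or Hydra installed. In the real module these would presumably be resolved at import time.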