maxent_grpo.grpo

Baseline GRPO training entrypoint.

Provides a thin wrapper around the training pipeline that either parses TRL arguments from the command line or delegates to the Hydra-based CLI when explicit arguments are not provided. It is exposed both for python -m maxent_grpo.grpo and for programmatic invocation inside orchestration code.

Functions

_resolve_cli_attr(attr_name)

Best-effort import helper for optional CLI attributes.

cli()

Invoke the baseline entrypoint (CLI style).

main([script_args, training_args, model_args])

Run the baseline GRPO trainer or delegate to Hydra.
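A best-effort import helper such as _resolve_cli_attr usually attempts an import and returns None instead of raising when an optional dependency is missing. The sketch below assumes that behavior. resolve_optional_attr and its module_name parameter are hypothetical: the documented helper takes only attr_name, and which module it searches is not described here.

```python
import importlib
from typing import Any, Optional


def resolve_optional_attr(module_name: str, attr_name: str) -> Optional[Any]:
    """Hypothetical stand-in for _resolve_cli_attr (not the library's code).

    Returns module_name.attr_name if it can be imported, otherwise None.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency is missing: degrade gracefully instead of failing.
        return None
    # getattr with a default returns None rather than raising AttributeError.
    return getattr(module, attr_name, None)
```

Callers can then test the result for None and fall back to another entrypoint, which is what makes the lookup "best-effort".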

maxent_grpo.grpo.cli()

Invoke the baseline entrypoint (CLI style).

Returns:

None. Side effects include running training or delegating to Hydra.

Return type:

None
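One plausible shape for the relationship between cli() and main() is sketched below. It assumes that cli() simply calls main() with no explicit arguments and discards the result, and that running the module (for example with python -m, or from a __main__.py) invokes cli() under a __name__ guard. The main() shown here is a hypothetical stub used only to make the sketch runnable.

```python
from typing import Any


def main(script_args: Any = None, training_args: Any = None, model_args: Any = None) -> Any:
    """Hypothetical stub for maxent_grpo.grpo.main, for illustration only."""
    if script_args is None and training_args is None and model_args is None:
        return "hydra-result"  # stands in for delegating to the Hydra CLI
    return "training-result"  # stands in for the baseline training result


def cli() -> None:
    """CLI-style wrapper: call main() without explicit configs, drop the result."""
    main()


if __name__ == "__main__":
    # Entry point when the module is executed as a script.
    cli()
```

Discarding the return value keeps the CLI contract simple: the process succeeds unless an exception escapes from main().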

maxent_grpo.grpo.main(script_args=None, training_args=None, model_args=None)

Run the baseline GRPO trainer or delegate to Hydra.

Parameters:
  • script_args (Optional[GRPOScriptArguments]) – Dataset/reward script arguments parsed via TRL or provided directly.

  • training_args (Optional[GRPOConfig]) – GRPO training configuration produced by TRL.

  • model_args (Optional[ModelConfig]) – Model configuration passed to TRL/transformers trainers.

Returns:

Training result from maxent_grpo.training.baseline.run_baseline_training(), or the Hydra CLI invocation result when no args are supplied.

Raises:
  • RuntimeError – If no CLI parser or Hydra entrypoint is available.

  • Exception – Propagates parser or training pipeline exceptions.

Return type:

Any
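The dispatch described above can be sketched as follows. This is a minimal, hypothetical illustration rather than the module's implementation: dispatch, run_training, and hydra_entry are stand-ins introduced here (the real main() resolves its training function and Hydra entrypoint internally, and its TRL command-line parsing is not shown). It also assumes that a call counts as explicit only when all three configurations are supplied.

```python
from typing import Any, Callable, Optional


def dispatch(
    script_args: Any = None,
    training_args: Any = None,
    model_args: Any = None,
    *,
    run_training: Callable[[Any, Any, Any], Any],
    hydra_entry: Optional[Callable[[], Any]] = None,
) -> Any:
    """Hypothetical sketch of main()'s dispatch; not the library's code."""
    explicit = (script_args, training_args, model_args)
    # Assumption: treat the call as explicit only when all three configs are given.
    if all(arg is not None for arg in explicit):
        # Stand-in for run_baseline_training(); its exceptions propagate unchanged.
        return run_training(*explicit)
    # Otherwise delegate to the Hydra CLI, failing loudly if it is unavailable.
    if hydra_entry is None:
        raise RuntimeError("No CLI parser or Hydra entrypoint is available.")
    return hydra_entry()
```

Passing the training function and Hydra entrypoint as parameters keeps the sketch testable without TRL or Hydra installed; the documented main() takes only the three configuration arguments.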