maxent_grpo.grpo
Baseline GRPO training entrypoint.
Provides a thin wrapper around the training pipeline that either parses TRL
arguments from the CLI or delegates to the Hydra-based CLI when explicit args
are not provided. Exposed for python -m maxent_grpo.grpo and for
programmatic invocation inside orchestration code.
Functions
|  | Best-effort import helper for optional CLI attributes. |
| cli() | Invoke the baseline entrypoint (CLI style). |
| main(script_args=None, training_args=None, model_args=None) | Run the baseline GRPO trainer or delegate to Hydra. |
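This page does not give the import helper's name. A best-effort import of an optional attribute can be sketched with `importlib`. The name `optional_attr` and its return-`None`-on-failure contract are hypothetical and are not the module's actual helper.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Hypothetical helper: return module.attr, or None if either is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    # A missing attribute is treated the same as a missing module.
    return getattr(module, attr, None)
```

Callers can branch on `None`. For example, they can raise `RuntimeError` when neither a CLI parser nor a Hydra entrypoint resolves, which is the failure mode `main()` documents.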
- maxent_grpo.grpo.cli()
Invoke the baseline entrypoint (CLI style).
- Returns:
None. Side effects include running training or delegating to Hydra.
- Return type:
None
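The sketch below shows one way `cli()` could relate to `main()`. It assumes two things this page does not confirm. First, that `cli()` calls `main()` with no explicit arguments and discards the result. Second, that the module uses the standard `if __name__ == "__main__":` guard so that `python -m maxent_grpo.grpo` reaches it. The `main` defined here is a hypothetical stub that only records its arguments.

```python
from typing import Any, List, Tuple

# Records each call to the stub so its arguments can be inspected.
calls: List[Tuple[Any, Any, Any]] = []


def main(script_args: Any = None, training_args: Any = None, model_args: Any = None) -> Any:
    """Hypothetical stand-in for maxent_grpo.grpo.main; it only records its arguments."""
    calls.append((script_args, training_args, model_args))
    return "training-result"


def cli() -> None:
    """CLI-style entrypoint: call main() without explicit args and discard its result."""
    main()


if __name__ == "__main__":
    # Standard idiom that lets `python -m <module>` run the module's entrypoint.
    cli()
```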
- maxent_grpo.grpo.main(script_args=None, training_args=None, model_args=None)
Run the baseline GRPO trainer or delegate to Hydra.
- Parameters:
script_args (Optional[GRPOScriptArguments]) – Dataset/reward script arguments parsed via TRL or provided directly.
training_args (Optional[GRPOConfig]) – GRPO training configuration produced by TRL.
model_args (Optional[ModelConfig]) – Model configuration passed to TRL/transformers trainers.
- Returns:
Training result from maxent_grpo.training.baseline.run_baseline_training(), or the Hydra CLI invocation result when no args are supplied.
- Raises:
RuntimeError – If no CLI parser or Hydra entrypoint is available.
Exception – Propagates parser or training pipeline exceptions.
- Return type:
Any
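The dispatch described for `main()` can be sketched with its dependencies passed in as callables, so it runs without TRL or Hydra installed. The names `baseline_main`, `run_training`, `parse_cli`, and `hydra_entry` are hypothetical. The fallback order is also an assumption: Hydra is tried first when the three configs are not all supplied, then a CLI parser. The real `main()` may order or combine these differently.

```python
from typing import Any, Callable, Optional, Tuple

# (script_args, training_args, model_args) as returned by a CLI parser.
ParsedArgs = Tuple[Any, Any, Any]


def baseline_main(
    script_args: Any = None,
    training_args: Any = None,
    model_args: Any = None,
    *,
    run_training: Callable[[Any, Any, Any], Any],
    parse_cli: Optional[Callable[[], ParsedArgs]] = None,
    hydra_entry: Optional[Callable[[], Any]] = None,
) -> Any:
    """Hypothetical sketch of the baseline entrypoint's dispatch logic."""
    explicit = (script_args, training_args, model_args)
    # All three configs supplied programmatically: train directly.
    if all(arg is not None for arg in explicit):
        return run_training(*explicit)
    # Otherwise defer to Hydra if an entrypoint is available (assumed order).
    if hydra_entry is not None:
        return hydra_entry()
    # Fall back to parsing the three configs from the command line.
    # Exceptions from the parser or the training call propagate unchanged.
    if parse_cli is not None:
        return run_training(*parse_cli())
    raise RuntimeError("No CLI parser or Hydra entrypoint is available.")
```

Injecting the callables keeps each branch testable in isolation. The real entrypoint presumably resolves them through its best-effort import helper instead.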