maxent\_grpo.training
=====================

.. automodule:: maxent_grpo.training

.. rubric:: Modules

.. autosummary::
   :toctree:
   :recursive:

   baseline
   cli
   controller_objective
   controller_optimizer
   data
   eval
   generation
   metrics
   optim
   patches
   pipeline
   rewards
   rollout
   run_helpers
   runtime
   scoring
   scoring_batching
   scoring_common
   scoring_logprob
   scoring_reference
   seed_paper_eval_callback
   state
   telemetry
   trainer_hooks
   trl_trainer
   types
   weighting
   zero_utils