Overview
The canonical training path in this repository is now the upstream OAT README-flash stack plus a local listwise maxent-explorer overlay.
Active baseline launcher:
ops/run_oat_zero_exact_1p5b_upstream.sh
ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm
Active explorer launcher:
ops/run_oat_zero_explorer_1p5b_upstream.sh
ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm
Retired TRL/Hydra orchestration and older noncanonical launchers are archived
under archive/trl/.
Canonical Runtime
The working runtime is the repo-local paper310 environment:
python==3.10.20
torch==2.6.0
transformers==4.51.3
vllm==0.8.4
oat-llm==0.1.3.post1
deepspeed==0.16.8
flash-attn==2.7.4.post1 via the launch-time overlay
Validate it before training:
python tools/audit_oat_setup.py
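For a quick manual spot check of the pins, something like the following could be used. This is only a sketch and not the contents of tools/audit_oat_setup.py; the names PINNED and find_mismatches are hypothetical, and it assumes the packages are installed under the distribution names listed above. It does not check the Python version or how flash-attn is provided by the launch-time overlay.

```python
from importlib import metadata

# Pins copied from the runtime list above (hypothetical helper, not the repo's audit tool).
PINNED = {
    "torch": "2.6.0",
    "transformers": "4.51.3",
    "vllm": "0.8.4",
    "oat-llm": "0.1.3.post1",
    "deepspeed": "0.16.8",
    "flash-attn": "2.7.4.post1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Installed wheels can carry a local build suffix such as "+cu124";
            # compare only the public part of the version string.
            found = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed pin matches the active environment. The repository's own audit script remains the authoritative check.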
Quickstart
Launch the canonical baseline:
sbatch ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm
Launch the listwise maxent-explorer variant on the same stack:
sbatch ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm
For local shell launches instead of Slurm:
bash ops/run_oat_zero_exact_1p5b_upstream.sh
bash ops/run_oat_zero_explorer_1p5b_upstream.sh
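If the choice between Slurm and a local shell needs to be scripted, a small wrapper along these lines might help. It is a sketch under assumptions: LAUNCHERS and build_launch_command are hypothetical names that do not exist in the repository, the paths are copied from the commands above, and the command is expected to run from the repository root.

```python
import shutil

# Launcher paths copied from the Quickstart commands (hypothetical helper).
LAUNCHERS = {
    "baseline": {
        "slurm": "ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm",
        "shell": "ops/run_oat_zero_exact_1p5b_upstream.sh",
    },
    "explorer": {
        "slurm": "ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm",
        "shell": "ops/run_oat_zero_explorer_1p5b_upstream.sh",
    },
}


def build_launch_command(variant, use_slurm=None):
    """Return the argv list for launching `variant` ("baseline" or "explorer")."""
    if variant not in LAUNCHERS:
        raise ValueError(f"unknown variant: {variant!r}")
    if use_slurm is None:
        # Assumption: prefer Slurm whenever sbatch is available on PATH.
        use_slurm = shutil.which("sbatch") is not None
    if use_slurm:
        return ["sbatch", LAUNCHERS[variant]["slurm"]]
    return ["bash", LAUNCHERS[variant]["shell"]]
```

The returned list can be passed to subprocess.run(..., check=True). Passing use_slurm explicitly skips the PATH lookup.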
What Is Archived
archive/trl/ops/ keeps retired orchestration wrappers and experiment launchers.
archive/trl/ops/slurm/ keeps retired Slurm entrypoints.
src/maxent_grpo/ remains in the repo for reference and historical work, but it is not the canonical training front door anymore.
Quick Links
OAT Upstream DR.GRPO - exact working stack and explorer overlay.
Training Guide - canonical launch flow plus archive notes.
Runtime - pinned runtime and validation checks.