Overview

The canonical training path in this repository is now the upstream OAT README-flash stack plus a local listwise maxent-explorer overlay.

Active baseline launchers (shell script and Slurm entrypoint):

ops/run_oat_zero_exact_1p5b_upstream.sh
ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm

Active explorer launchers (shell script and Slurm entrypoint):

ops/run_oat_zero_explorer_1p5b_upstream.sh
ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm
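A quick way to confirm that a checkout contains all four active launchers listed above is a small existence check. This is only a sketch: the `LAUNCHERS` array and the `missing_launchers` function are hypothetical helpers, not part of the repository, and the only facts taken from this page are the four paths themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): report which of the active
# launchers named in this overview are absent from a checkout.
set -euo pipefail

LAUNCHERS=(
  ops/run_oat_zero_exact_1p5b_upstream.sh
  ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm
  ops/run_oat_zero_explorer_1p5b_upstream.sh
  ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm
)

# Print every launcher path that does not exist under the given repo root
# (defaults to the current directory). Prints nothing when all are present.
missing_launchers() {
  local root="${1:-.}" path
  for path in "${LAUNCHERS[@]}"; do
    [ -f "$root/$path" ] || echo "$path"
  done
}
```

Calling `missing_launchers` from the repository root should print nothing on a healthy checkout; any path it prints is a launcher that still needs to be restored.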

Retired TRL/Hydra orchestration and older noncanonical launchers are archived under archive/trl/.

Canonical Runtime

The working runtime is the repo-local paper310 environment.

Validate it before training:

python tools/audit_oat_setup.py

Submit the baseline job through Slurm:

sbatch ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm

Submit the explorer job through Slurm:

sbatch ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm

Or invoke the launcher scripts directly:

bash ops/run_oat_zero_exact_1p5b_upstream.sh
bash ops/run_oat_zero_explorer_1p5b_upstream.sh
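The validate-then-train order can be enforced with a small guard that submits a job only when the audit succeeds. This is a minimal sketch under one stated assumption: that `tools/audit_oat_setup.py` exits with a non-zero status when the setup is invalid, which this page does not confirm. The `audit_then_submit` function and the `AUDIT_CMD` / `SUBMIT_CMD` override variables are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of the repo): run the setup audit first and
# submit the Slurm job only if the audit succeeds.
# ASSUMPTION: tools/audit_oat_setup.py exits non-zero on a failed audit.
set -euo pipefail

# AUDIT_CMD and SUBMIT_CMD are illustrative overrides that allow a dry run,
# e.g. SUBMIT_CMD=echo prints the script path instead of submitting it.
audit_then_submit() {
  local slurm_script="$1"
  if ${AUDIT_CMD:-python tools/audit_oat_setup.py}; then
    ${SUBMIT_CMD:-sbatch} "$slurm_script"
  else
    echo "audit failed; not submitting $slurm_script" >&2
    return 1
  fi
}
```

For example, `audit_then_submit ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm` would run the audit and then hand the baseline script to `sbatch`. Keeping the two steps in one function avoids queueing a job against an environment that has not been validated.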

archive/trl/ops/ keeps retired orchestration wrappers and experiment launchers.
archive/trl/ops/slurm/ keeps retired Slurm entrypoints.
src/maxent_grpo/ remains in the repo for reference and historical work, but it is no longer the canonical training entry point.