Runtime Setup
This page summarizes the canonical runtime for the active OAT training path.
Canonical OAT Runtime
The working environment is the repo-local paper310 runtime used by the
README-flash OAT launchers:
- python==3.10.20
- torch==2.6.0
- transformers==4.51.3
- vllm==0.8.4
- oat-llm==0.1.3.post1
- deepspeed==0.16.8
- flash-attn==2.7.4.post1 via the launch-time overlay
The canonical interpreter is:
var/seed_paper_eval/paper310/bin/python
Validate the runtime with:
python tools/audit_oat_setup.py
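The contents of tools/audit_oat_setup.py are not shown on this page, so the following is only a minimal sketch of what a pin check of this kind could look like, assuming it compares installed package metadata against the versions listed above. The names EXPECTED_PINS and find_version_drift are hypothetical and are not taken from the repository.

```python
from importlib import metadata

# Pins copied from the list above (hypothetical constant name).
# The Python interpreter version itself is not checked here.
EXPECTED_PINS = {
    "torch": "2.6.0",
    "transformers": "4.51.3",
    "vllm": "0.8.4",
    "oat-llm": "0.1.3.post1",
    "deepspeed": "0.16.8",
    "flash-attn": "2.7.4.post1",
}


def find_version_drift(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed at all.
    """
    drift = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local version labels such as "+cu124" when comparing.
        base = found.split("+", 1)[0] if found else None
        if base != wanted:
            drift[name] = (wanted, found)
    return drift


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_version_drift(EXPECTED_PINS).items()):
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Run with the canonical interpreter, a check like this prints nothing when every pin matches. Because flash-attn lives in the launch-time overlay, it would be reported as missing unless the overlay is already on the import path.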
Why This Matters
The upstream OAT training path proved sensitive to version drift. This repository now keeps one canonical runtime for the active launchers so the baseline DR.GRPO path and the listwise explorer overlay share the same working stack.
Active Launchers
- ops/run_oat_zero_exact_1p5b_upstream.sh
- ops/run_oat_zero_explorer_1p5b_upstream.sh
- ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm
- ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm
Environment Notes
- Flash attention is installed at launch time into a local overlay, rather than being assumed to exist globally.
- The OAT launchers route caches and temporary files into var/ or node-local scratch instead of relying on ambient home-directory state.
- The explorer path reuses the same runtime and only switches the learner objective to maxent_listwise.
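The first two notes describe an environment-redirection pattern. The sketch below illustrates one way it could be expressed, assuming the overlay is exposed through PYTHONPATH and caches are steered with HF_HOME, XDG_CACHE_HOME, and TMPDIR. The actual launchers are shell scripts whose variables are not shown here, so the function name build_launch_env, the subdirectory names, and the choice of variables are all assumptions.

```python
import os
from pathlib import Path


def build_launch_env(scratch_root, overlay_dir, base_env=None):
    """Return a copy of the environment with caches, temp files, and the
    flash-attn overlay redirected away from home-directory state."""
    env = dict(os.environ if base_env is None else base_env)
    scratch = Path(scratch_root)

    # Cache and temp locations (which variables to set is an assumption).
    env["HF_HOME"] = str(scratch / "hf_cache")
    env["XDG_CACHE_HOME"] = str(scratch / "xdg_cache")
    env["TMPDIR"] = str(scratch / "tmp")

    # Put the overlay first so its flash-attn wins over any global install.
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = os.pathsep.join(
        p for p in (str(overlay_dir), existing) if p
    )
    return env
```

A dictionary built this way could be passed as the env argument of subprocess.run when starting a training process, which keeps the redirection scoped to that process instead of mutating the caller's environment.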
Archived Runtime Surface
Older TRL/Hydra launchers and other retired training wrappers are kept under
archive/trl/. They are no longer part of the active runtime contract.