Overview
========

The canonical training path in this repository is now the upstream OAT README-flash stack plus a local listwise maxent-explorer overlay.

Active baseline launcher:

- ``ops/run_oat_zero_exact_1p5b_upstream.sh``
- ``ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm``

Active explorer launcher:

- ``ops/run_oat_zero_explorer_1p5b_upstream.sh``
- ``ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm``

Retired TRL/Hydra orchestration and older noncanonical launchers are archived under ``archive/trl/``.

Canonical Runtime
=================

The working runtime is the repo-local ``paper310`` environment:

- ``python==3.10.20``
- ``torch==2.6.0``
- ``transformers==4.51.3``
- ``vllm==0.8.4``
- ``oat-llm==0.1.3.post1``
- ``deepspeed==0.16.8``
- ``flash-attn==2.7.4.post1`` via the launch-time overlay

Validate it before training:

.. code-block:: bash

   python tools/audit_oat_setup.py

Quickstart
==========

1. Launch the canonical baseline:

   .. code-block:: bash

      sbatch ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_node302.slurm

2. Launch the listwise maxent-explorer variant on the same stack:

   .. code-block:: bash

      sbatch ops/slurm/train_understand_r1_zero_qwen2p5_math_1p5b_r1_readme_flash_explorer_node302.slurm

3. For local shell launches instead of Slurm:

   .. code-block:: bash

      bash ops/run_oat_zero_exact_1p5b_upstream.sh
      bash ops/run_oat_zero_explorer_1p5b_upstream.sh

What Is Archived
================

- ``archive/trl/ops/`` keeps retired orchestration wrappers and experiment launchers.
- ``archive/trl/ops/slurm/`` keeps retired Slurm entrypoints.
- ``src/maxent_grpo/`` remains in the repo for reference and historical work, but it is not the canonical training front door anymore.

Quick Links
===========

- `OAT Upstream DR.GRPO `_ - exact working stack and explorer overlay.
- `Training Guide `_ - canonical launch flow plus archive notes.
- `Runtime `_ - pinned runtime and validation checks.
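
As a rough illustration of what checking the pinned runtime by hand could look like, the sketch below compares the versions listed under Canonical Runtime with what ``pip show`` reports in the active environment. It is not the contents of ``tools/audit_oat_setup.py``, which remains the supported check. The names ``compare_pin`` and ``check_pins`` are hypothetical, and the sketch assumes ``python -m pip`` resolves to the ``paper310`` environment. Because ``flash-attn`` is provided via the launch-time overlay, it may be reported as missing when the overlay is not active.

```shell
#!/usr/bin/env bash
# Hypothetical spot-check of the pinned runtime; an illustration only,
# not a replacement for tools/audit_oat_setup.py.

# Pins copied from the "Canonical Runtime" list (the Python version itself is not checked here).
PINS="torch==2.6.0 transformers==4.51.3 vllm==0.8.4 oat-llm==0.1.3.post1 deepspeed==0.16.8 flash-attn==2.7.4.post1"

# compare_pin NAME WANT GOT
# Prints one status line; returns 0 on an exact match and 1 otherwise.
compare_pin() {
  local name="$1" want="$2" got="$3"
  if [ "$want" = "$got" ]; then
    echo "ok $name==$got"
  else
    echo "MISMATCH $name: want $want, got ${got:-<not installed>}"
    return 1
  fi
}

# check_pins
# Looks up each pinned package with `pip show` and compares versions.
# Returns 1 if any package is missing or differs from its pin.
check_pins() {
  local status=0 pin name want got
  for pin in $PINS; do
    name="${pin%%==*}"   # text before "=="
    want="${pin##*==}"   # text after "=="
    got="$(python -m pip show "$name" 2>/dev/null | awk '/^Version:/ {print $2}')"
    compare_pin "$name" "$want" "$got" || status=1
  done
  return "$status"
}
```

After sourcing the snippet inside the activated environment, running ``check_pins`` prints one line per package and returns a non-zero status if any pin is not met.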