Method Identity

This repo now treats method selection as two explicit axes instead of one overloaded knob:

  • Algorithm family: controlled by training.objective plus training.seed_grpo_enabled.

  • Loss backend: controlled by training.grpo_loss_type.

That separation matters because objective: grpo alone does not tell you whether a run is plain GRPO, BNPO-style GRPO, Dr.GRPO, or SEED-GRPO on top of Dr.GRPO. The seed_grpo_enabled flag and grpo_loss_type settle that.
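
As a rough illustration of the two-axis idea, the sketch below resolves a family and a backend from the three config fields and builds a label like those in the preset table. The names resolve_method, MethodIdentity, FAMILY_LABELS, and BACKEND_LABELS are hypothetical and are not taken from src/maxent_grpo/methods.py. The "grpo" and "bnpo" backend values and the rule that seed_grpo_enabled requires objective: grpo are assumptions; only dr_grpo appears in the presets documented here.

```python
from dataclasses import dataclass

# Assumed backend values and display names; only "dr_grpo" is confirmed
# by the presets in this document.
BACKEND_LABELS = {
    "grpo": "GRPO",
    "bnpo": "BNPO-style GRPO",
    "dr_grpo": "Dr.GRPO",
}

# Family display names as used in the preset table.
FAMILY_LABELS = {
    "grpo": "Baseline GRPO",
    "seed_grpo": "SEED-GRPO",
    "maxent_entropy": "Entropy MaxEnt",
    "maxent_listwise": "Listwise MaxEnt",
}

OBJECTIVES = ("grpo", "maxent_entropy", "maxent_listwise")


@dataclass(frozen=True)
class MethodIdentity:
    """Hypothetical container for the two axes."""

    family: str
    backend: str

    @property
    def label(self) -> str:
        backend_label = BACKEND_LABELS[self.backend]
        # The baseline family is named after its loss backend alone.
        if self.family == "grpo":
            return backend_label
        return f"{FAMILY_LABELS[self.family]} ({backend_label} loss)"


def resolve_method(objective: str, seed_grpo_enabled: bool, grpo_loss_type: str) -> MethodIdentity:
    """Resolve family from objective + seed flag, backend from the loss type."""
    if objective not in OBJECTIVES:
        raise ValueError(f"unknown objective: {objective!r}")
    if grpo_loss_type not in BACKEND_LABELS:
        raise ValueError(f"unknown grpo_loss_type: {grpo_loss_type!r}")
    if seed_grpo_enabled:
        # Assumption of this sketch: SEED-GRPO only layers on objective "grpo".
        if objective != "grpo":
            raise ValueError("seed_grpo_enabled requires objective 'grpo' in this sketch")
        family = "seed_grpo"
    else:
        family = objective
    return MethodIdentity(family=family, backend=grpo_loss_type)


if __name__ == "__main__":
    print(resolve_method("grpo", False, "dr_grpo").label)
    print(resolve_method("grpo", True, "dr_grpo").label)
```

With the same objective: grpo, flipping the seed flag or the loss type changes the resolved label, which is the ambiguity the two axes are meant to remove.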

Current 1.5B Math Presets

| Preset | Family | objective | seed_grpo_enabled | grpo_loss_type | Canonical label |
| --- | --- | --- | --- | --- | --- |
| configs/recipes/hydra/grpo_custom_math.yaml | Baseline GRPO | grpo | false | dr_grpo | Dr.GRPO |
| configs/recipes/hydra/maxent_entropy_math.yaml | Entropy MaxEnt | maxent_entropy | false | dr_grpo | Entropy MaxEnt (Dr.GRPO loss) |
| configs/recipes/hydra/maxent_listwise_math.yaml | Listwise MaxEnt | maxent_listwise | false | dr_grpo | Listwise MaxEnt (Dr.GRPO loss) |
| configs/recipes/hydra/seed_grpo_math.yaml | SEED-GRPO | grpo | true | dr_grpo | SEED-GRPO (Dr.GRPO loss) |

The filename grpo_custom_math.yaml is historical. In the current 1.5B math setup, it is the baseline Dr.GRPO preset because it pins grpo_loss_type: dr_grpo.
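
To make the point about filenames concrete, the sketch below writes the four presets as plain data and checks that they share one loss backend while differing only on the family axis. The PRESETS dictionary simply mirrors the values listed in this document. It is not loaded from the YAML files, and family_key is a hypothetical helper.

```python
# Values copied from the preset list in this document, not read from disk.
PRESETS = {
    "configs/recipes/hydra/grpo_custom_math.yaml": {
        "objective": "grpo", "seed_grpo_enabled": False, "grpo_loss_type": "dr_grpo",
    },
    "configs/recipes/hydra/maxent_entropy_math.yaml": {
        "objective": "maxent_entropy", "seed_grpo_enabled": False, "grpo_loss_type": "dr_grpo",
    },
    "configs/recipes/hydra/maxent_listwise_math.yaml": {
        "objective": "maxent_listwise", "seed_grpo_enabled": False, "grpo_loss_type": "dr_grpo",
    },
    "configs/recipes/hydra/seed_grpo_math.yaml": {
        "objective": "grpo", "seed_grpo_enabled": True, "grpo_loss_type": "dr_grpo",
    },
}


def family_key(cfg: dict) -> tuple:
    """Hypothetical helper: the family axis is the (objective, seed flag) pair."""
    return (cfg["objective"], cfg["seed_grpo_enabled"])


# Every current preset pins the same loss backend.
backends = {cfg["grpo_loss_type"] for cfg in PRESETS.values()}

# The presets are distinguished only by the family axis.
families = {family_key(cfg) for cfg in PRESETS.values()}

if __name__ == "__main__":
    print("backends:", sorted(backends))
    print("distinct families:", len(families))
```

Note that grpo_custom_math.yaml and seed_grpo_math.yaml share objective: grpo, so the objective field alone cannot tell them apart.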

Source of Truth

  • src/maxent_grpo/objectives.py: normalizes the top-level objective family.

  • src/maxent_grpo/methods.py: resolves the final method identity from family + backend.

  • src/maxent_grpo/config/grpo.py: validates and normalizes grpo_loss_type and the family-selection flags.

  • src/maxent_grpo/training/trl_trainer.py: logs the resolved method at trainer startup.

  • src/maxent_grpo/training/runtime/logging.py: writes run/method_name, run/method_family, run/method_backend, and run/method_slug into run metadata/W&B config.
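
For orientation, here is a minimal sketch of what the four run metadata keys might look like once a method has been resolved. The key names come from the list above. The function method_metadata and the slug format (lowercase, non-alphanumeric runs collapsed to hyphens) are assumptions for illustration and may not match what logging.py actually writes.

```python
import re


def method_metadata(name: str, family: str, backend: str) -> dict:
    """Hypothetical helper that builds the run/method_* entries.

    The slug rule here is an assumption, not the repo's actual format.
    """
    # Lowercase, then collapse every run of non-alphanumerics to one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return {
        "run/method_name": name,
        "run/method_family": family,
        "run/method_backend": backend,
        "run/method_slug": slug,
    }


if __name__ == "__main__":
    meta = method_metadata("SEED-GRPO (Dr.GRPO loss)", "seed_grpo", "dr_grpo")
    for key, value in meta.items():
        print(f"{key}: {value}")
```

Keeping family and backend as separate keys lets runs be filtered on either axis without parsing the display name.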

Family-Specific Code

  • Baseline GRPO / Dr.GRPO backend: src/maxent_grpo/training/trl_trainer.py

  • SEED-GRPO advantage scaling: src/maxent_grpo/training/rewards.py

  • Entropy MaxEnt objective: src/maxent_grpo/training/trl_trainer.py

  • Listwise MaxEnt objective + tau/q/beta weighting: src/maxent_grpo/training/trl_trainer.py and src/maxent_grpo/training/weighting/logic.py