maxent_grpo.training.eval
Validation helpers for the MaxEnt-GRPO training loop.
Functions
- Best-effort conversion to a 1D float tensor.
- Return shard metadata describing which rows this rank evaluates.
- Return aggregated reward scores for completions.
- Compute auxiliary seed-eval metrics from multi-sample generations.
- Return DeepSpeed ZeRO stage from Accelerate plugin state when present.
- Gather mean reward statistics across all ranks.
- Yield prompt/answer lists for evaluation rows.
- Log the evaluation plan when running on the main rank.
- Pad or truncate a 1D tensor to
- Synchronize vLLM weights across all ranks before rank-0-only eval.
- Generate completions for the shard rows and log periodic progress.
- Normalize optional seed-eval settings from
- Return whether seed-eval metrics are enabled from a config value.
- Return format issue counts for a batch of completions.
- run_validation_step: Generate single completions on the eval set and log mean reward.
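The summaries "Return aggregated reward scores for completions" and "Gather mean reward statistics across all ranks" suggest a two-stage reduction. The following is a minimal sketch, assuming each completion's reward is a weighted sum of per-function scores and that each rank contributes a (reward_sum, count) pair. The names `aggregate_rewards`, `global_mean`, and `exact_match` are hypothetical, and the cross-rank step is simulated with a plain list where a real run would all-reduce a tensor (for example with `torch.distributed.all_reduce`).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical reward-function signature: (completions, answers) -> one score each.
RewardFn = Callable[[List[str], List[str]], List[float]]


def aggregate_rewards(
    completions: List[str],
    answers: List[str],
    reward_fns: Sequence[RewardFn],
    weights: Sequence[float],
) -> List[float]:
    """Return a weighted sum of reward-function scores for each completion."""
    if len(reward_fns) != len(weights):
        raise ValueError("need exactly one weight per reward function")
    totals = [0.0] * len(completions)
    for fn, weight in zip(reward_fns, weights):
        for i, score in enumerate(fn(completions, answers)):
            totals[i] += weight * float(score)
    return totals


def global_mean(per_rank: Sequence[Tuple[float, int]]) -> float:
    """Pool (reward_sum, count) pairs from every rank into one mean."""
    total = sum(reward_sum for reward_sum, _ in per_rank)
    count = sum(n for _, n in per_rank)
    return total / count if count else 0.0


def exact_match(completions: List[str], answers: List[str]) -> List[float]:
    # Toy reward: 1.0 when the stripped completion equals the reference answer.
    return [float(c.strip() == a) for c, a in zip(completions, answers)]


# Two ranks with uneven shards (2 rows vs. 1 row).
rank0 = aggregate_rewards(["2", "5"], ["2", "4"], [exact_match], [1.0])
rank1 = aggregate_rewards(["6"], ["6"], [exact_match], [1.0])
pooled = global_mean([(sum(rank0), len(rank0)), (sum(rank1), len(rank1))])
# pooled == 2/3
```

Pooling sums and counts matters when shards are uneven: here the pooled mean is 2/3, while averaging the two per-rank means would give 0.75.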
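Under ZeRO stage 3, DeepSpeed partitions model parameters across ranks, so any step that needs the full weights (such as pushing them into a vLLM engine) requires every rank to participate; this is presumably why the stage is checked before eval. A defensive lookup sketch follows, assuming the stage is read from `accelerator.state.deepspeed_plugin.zero_stage` and that a missing plugin means DeepSpeed is not in use. `zero_stage_from_accelerator` is a hypothetical name, and `SimpleNamespace` objects stand in for a real `Accelerator`.

```python
from types import SimpleNamespace
from typing import Any, Optional


def zero_stage_from_accelerator(accelerator: Any) -> Optional[int]:
    """Return the ZeRO stage if a DeepSpeed plugin is attached, else None."""
    # getattr with a default tolerates a missing attribute at every level,
    # including when an intermediate value is None.
    state = getattr(accelerator, "state", None)
    plugin = getattr(state, "deepspeed_plugin", None)
    stage = getattr(plugin, "zero_stage", None)
    if stage is None:
        return None
    try:
        return int(stage)
    except (TypeError, ValueError):
        return None


# Stand-ins for an Accelerator object; no Accelerate install is needed here.
with_zero3 = SimpleNamespace(
    state=SimpleNamespace(deepspeed_plugin=SimpleNamespace(zero_stage=3))
)
without_deepspeed = SimpleNamespace(state=SimpleNamespace(deepspeed_plugin=None))

print(zero_stage_from_accelerator(with_zero3))         # 3
print(zero_stage_from_accelerator(without_deepspeed))  # None
```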
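A sketch of the batching behind "Yield prompt/answer lists for evaluation rows", assuming each row is a mapping with `prompt` and `answer` keys and that batches are fixed-size slices of the shard. The function name `iter_eval_batches` and the key names are hypothetical.

```python
from typing import Iterator, List, Mapping, Sequence, Tuple


def iter_eval_batches(
    rows: Sequence[Mapping[str, str]],
    batch_size: int,
    prompt_key: str = "prompt",
    answer_key: str = "answer",
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (prompts, answers) lists covering at most batch_size rows each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        chunk = rows[start : start + batch_size]
        prompts = [str(row[prompt_key]) for row in chunk]
        answers = [str(row[answer_key]) for row in chunk]
        yield prompts, answers


rows = [
    {"prompt": "1+1?", "answer": "2"},
    {"prompt": "2+2?", "answer": "4"},
    {"prompt": "3+3?", "answer": "6"},
]
for prompts, answers in iter_eval_batches(rows, batch_size=2):
    print(prompts, answers)
# ['1+1?', '2+2?'] ['2', '4']
# ['3+3?'] ['6']
```

A generator keeps only one batch of prompt and answer strings in memory at a time, and the final batch is simply shorter when the shard size is not a multiple of `batch_size`.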
Classes
- Metadata describing the evaluation shard for the current rank.
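One way to compute the per-rank shard that this class describes, assuming contiguous row ranges in which the first `total_rows % world_size` ranks each take one extra row. `EvalShard` and `compute_eval_shard` are hypothetical names, not the module's actual API; a strided split (`rows[rank::world_size]`) would be an equally valid alternative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalShard:
    """Hypothetical shard metadata: rows [start, end) belong to this rank."""

    rank: int
    world_size: int
    start: int  # first row index (inclusive)
    end: int  # last row index (exclusive)

    @property
    def num_rows(self) -> int:
        return self.end - self.start


def compute_eval_shard(total_rows: int, rank: int, world_size: int) -> EvalShard:
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    base, remainder = divmod(total_rows, world_size)
    # Ranks below `remainder` get one extra row, so shard sizes differ by at most 1.
    start = rank * base + min(rank, remainder)
    size = base + (1 if rank < remainder else 0)
    return EvalShard(rank, world_size, start, start + size)


shards = [compute_eval_shard(10, r, 4) for r in range(4)]
print([(s.start, s.end) for s in shards])  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```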
- maxent_grpo.training.eval.run_validation_step(step, ctx)
Generate single completions on the eval set and log mean reward.
- Parameters:
step (int) – Training step identifier passed to logging hooks.
ctx (ValidationContext) – Validation context providing evaluation rows and handles.
- Returns:
None. Logs metrics through the provided handles.
- Return type:
None
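A single-process sketch of the flow this docstring describes: generate one completion per eval prompt, score it, and log the mean reward from the main rank. `ToyValidationContext` and its fields (`generate`, `score`, `log`, `is_main_process`, `batch_size`) are hypothetical stand-ins for `ValidationContext`, as is the `eval/mean_reward` metric key. A multi-rank run would also need to pool reward sums and counts across ranks before logging.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Sequence


@dataclass
class ToyValidationContext:
    """Hypothetical stand-in for ValidationContext; every field is an assumption."""

    rows: Sequence[Mapping[str, str]]
    generate: Callable[[List[str]], List[str]]
    score: Callable[[List[str], List[str]], List[float]]
    log: Callable[[Dict[str, float], int], None]
    is_main_process: bool = True
    batch_size: int = 8


def run_validation_step_sketch(step: int, ctx: ToyValidationContext) -> None:
    total, count = 0.0, 0
    for start in range(0, len(ctx.rows), ctx.batch_size):
        chunk = ctx.rows[start : start + ctx.batch_size]
        prompts = [row["prompt"] for row in chunk]
        answers = [row["answer"] for row in chunk]
        completions = ctx.generate(prompts)  # one completion per prompt
        rewards = ctx.score(completions, answers)
        total += sum(rewards)
        count += len(rewards)
    # Returns None; the only output is the logged metric on the main rank.
    if ctx.is_main_process and count:
        ctx.log({"eval/mean_reward": total / count}, step)


def toy_generate(prompts: List[str]) -> List[str]:
    return [{"1+1=": "2", "2+2=": "4"}[p] for p in prompts]


def toy_score(completions: List[str], answers: List[str]) -> List[float]:
    return [float(c == a) for c, a in zip(completions, answers)]


logged = []
ctx = ToyValidationContext(
    rows=[{"prompt": "1+1=", "answer": "2"}, {"prompt": "2+2=", "answer": "5"}],
    generate=toy_generate,
    score=toy_score,
    log=lambda metrics, step: logged.append((step, metrics)),
)
run_validation_step_sketch(100, ctx)
print(logged)  # [(100, {'eval/mean_reward': 0.5})]
```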