maxent_grpo.training.eval

Validation helpers for the MaxEnt-GRPO training loop.

Functions

`_as_tensor_1d`(value[, device])	Best-effort conversion to a 1D float tensor.
`_build_eval_shard`(evaluation_rows, accelerator)	Return shard metadata describing which rows this rank evaluates.
`_compute_eval_kl_tensor`(cur_logp_sum, ...)
`_compute_eval_rewards`(completions, answers, ...)	Return aggregated reward scores for completions.
`_compute_seed_eval_metrics`(ctx)	Compute auxiliary seed-eval metrics from multi-sample generations.
`_deepspeed_zero_stage`(accelerator)	Return DeepSpeed ZeRO stage from Accelerate plugin state when present.
`_env_flag`(name[, default])
`_eval_logprobs_enabled`()
`_eval_rank_tag`()
`_flatten_eval_meta`(grouped_meta, expected_len)
`_gather_eval_stats`(accelerator, eval_scores)	Gather mean reward statistics across all ranks.
`_init_eval_score_stats`()
`_iter_eval_batches`(evaluation_rows, batch_size)	Yield prompt/answer lists for evaluation rows.
`_log_eval_start`(step, shard, batch_size)	Log the evaluation plan when running on the main rank.
`_match_tensor_length`(tensor, target_len[, ...])	Pad or truncate a 1D tensor to `target_len`.
`_maybe_presync_vllm_for_eval`(ctx, ...)	Synchronize vLLM weights across all ranks before rank-0-only eval.
`_progress_log_enabled`()
`_run_eval_batches`(shard, batch_size, ctx, step)	Generate completions for the shard rows and log periodic progress.
`_score_eval_batch`(ctx, prompts, completions, ...)
`_seed_eval_config`(evaluation_cfg)	Normalize optional seed-eval settings from `EvaluationSettings`.
`_seed_eval_enabled`(raw)	Return whether seed-eval metrics are enabled from a config value.
`_tally_format_issues`(completions)	Return format issue counts for a batch of completions.
`_update_eval_score_stats`(target, update)
`_warn_eval_logprobs_unavailable`(reason)
`run_validation_step`(step, ctx)	Generate single completions on the eval set and log mean reward.
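The summaries for `_build_eval_shard` and `_iter_eval_batches` describe a two-stage pattern: pick the rows this rank is responsible for, then walk them in fixed-size batches of prompts and answers. The sketch below illustrates that pattern only. The function names `shard_rows` and `iter_batches`, the strided assignment of rows to ranks, and the `"prompt"` / `"answer"` row keys are all assumptions made for illustration, not the module's actual implementation.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Row = Dict[str, str]  # hypothetical row layout: {"prompt": ..., "answer": ...}


def shard_rows(rows: Sequence[Row], rank: int, world_size: int) -> List[Row]:
    """Give each rank every world_size-th row, starting at its own index.

    Strided sharding is an assumption; the real helper may split differently.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(rows[rank::world_size])


def iter_batches(
    rows: Sequence[Row], batch_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (prompts, answers) lists, batch_size rows at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # The final chunk may be shorter than batch_size.
        yield [r["prompt"] for r in chunk], [r["answer"] for r in chunk]


# Usage: five rows split across two ranks, evaluated in batches of two.
rows = [{"prompt": f"p{i}", "answer": f"a{i}"} for i in range(5)]
rank0_rows = shard_rows(rows, rank=0, world_size=2)
rank0_batches = list(iter_batches(rank0_rows, batch_size=2))
```

With a strided split, shard sizes differ by at most one row, which keeps per-rank evaluation time roughly balanced.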

Classes

`_EvalShardInfo`(rows, total_rows, ...)	Metadata describing the evaluation shard for the current rank.

maxent_grpo.training.eval.run_validation_step(step, ctx)

Generate one completion per prompt on the eval set and log the mean reward.

Parameters:

step (int) – Training step identifier passed to logging hooks.
ctx (ValidationContext) – Validation context providing evaluation rows and handles.

Returns:

None. Logs metrics through the provided handles.

Return type:

None
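
When evaluation is sharded across ranks, a global mean reward cannot in general be recovered by averaging per-rank means, because shards may hold different numbers of rows. One common approach, sketched here, is for each rank to keep a running sum and count and to merge those before dividing. The `ScoreStats` class, `merge_stats`, and `mean_reward` are hypothetical names used for illustration; the sketch does not reproduce `_init_eval_score_stats`, `_update_eval_score_stats`, or `_gather_eval_stats`, and it replaces the cross-rank gather with a plain Python list.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ScoreStats:
    """Running reward total and sample count for one rank (hypothetical)."""

    total: float = 0.0
    count: int = 0

    def update(self, scores: Iterable[float]) -> None:
        # Accumulate a batch of per-completion reward scores.
        for score in scores:
            self.total += float(score)
            self.count += 1


def merge_stats(per_rank: Iterable[ScoreStats]) -> ScoreStats:
    """Combine per-rank stats; stands in for a real cross-rank gather."""
    merged = ScoreStats()
    for stats in per_rank:
        merged.total += stats.total
        merged.count += stats.count
    return merged


def mean_reward(stats: ScoreStats) -> float:
    """Return the mean reward, or 0.0 when no samples were scored."""
    return stats.total / stats.count if stats.count else 0.0


# Usage: rank 0 scored three completions, rank 1 scored one.
rank0, rank1 = ScoreStats(), ScoreStats()
rank0.update([1.0, 1.0, 1.0])
rank1.update([0.0])
global_mean = mean_reward(merge_stats([rank0, rank1]))
```

In this example the sum-and-count merge gives 0.75, whereas averaging the two per-rank means (1.0 and 0.0) would give 0.5.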