maxent_grpo.training.eval

Validation helpers for the MaxEnt-GRPO training loop.

Functions

_as_tensor_1d(value[, device])

Best-effort conversion to a 1D float tensor.

_build_eval_shard(evaluation_rows, accelerator)

Return shard metadata describing which rows this rank evaluates.

_compute_eval_kl_tensor(cur_logp_sum, ...)

_compute_eval_rewards(completions, answers, ...)

Return aggregated reward scores for completions.

_compute_seed_eval_metrics(ctx)

Compute auxiliary seed-eval metrics from multi-sample generations.

_deepspeed_zero_stage(accelerator)

Return the DeepSpeed ZeRO stage from the Accelerate plugin state, when present.

_env_flag(name[, default])

_eval_logprobs_enabled()

_eval_rank_tag()

_flatten_eval_meta(grouped_meta, expected_len)

_gather_eval_stats(accelerator, eval_scores)

Gather mean reward statistics across all ranks.

_init_eval_score_stats()

_iter_eval_batches(evaluation_rows, batch_size)

Yield prompt/answer lists for evaluation rows.

_log_eval_start(step, shard, batch_size)

Log the evaluation plan when running on the main rank.

_match_tensor_length(tensor, target_len[, ...])

Pad or truncate a 1D tensor to target_len.

_maybe_presync_vllm_for_eval(ctx, ...)

Synchronize vLLM weights across all ranks before rank-0-only eval.

_progress_log_enabled()

_run_eval_batches(shard, batch_size, ctx, step)

Generate completions for the shard rows and log periodic progress.

_score_eval_batch(ctx, prompts, completions, ...)

_seed_eval_config(evaluation_cfg)

Normalize optional seed-eval settings from EvaluationSettings.

_seed_eval_enabled(raw)

Return whether seed-eval metrics are enabled from a config value.

_tally_format_issues(completions)

Return format issue counts for a batch of completions.

_update_eval_score_stats(target, update)

_warn_eval_logprobs_unavailable(reason)

run_validation_step(step, ctx)

Generate a single completion per evaluation prompt and log the mean reward.
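The summary for `_compute_eval_rewards` says only that scores are "aggregated". The sketch below assumes one plausible aggregation: a weighted sum of several per-completion reward functions. `aggregate_rewards_sketch`, `reward_fns`, and `weights` are hypothetical names. The real signature's remaining arguments are elided as `...` and are not reproduced here.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical reward-function shape: (completion, reference answer) -> score.
RewardFn = Callable[[str, str], float]


def aggregate_rewards_sketch(
    completions: Sequence[str],
    answers: Sequence[str],
    reward_fns: Sequence[RewardFn],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Score each (completion, answer) pair as a weighted sum over reward functions."""
    if len(completions) != len(answers):
        raise ValueError("completions and answers must align")
    if weights is None:
        # Assumption: equal weighting when no weights are given.
        weights = [1.0] * len(reward_fns)
    if len(weights) != len(reward_fns):
        raise ValueError("expected one weight per reward function")
    return [
        sum(w * fn(c, a) for fn, w in zip(reward_fns, weights))
        for c, a in zip(completions, answers)
    ]
```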
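`_deepspeed_zero_stage` reads the ZeRO stage from Accelerate's plugin state. The following is a minimal sketch of such a lookup. It assumes the attribute path `accelerator.state.deepspeed_plugin.zero_stage` exposed by recent Accelerate releases. The function name is hypothetical, and this is not the module's implementation. Chained `getattr` calls with defaults make it return `None` when DeepSpeed is not configured.

```python
from types import SimpleNamespace
from typing import Any, Optional


def deepspeed_zero_stage_sketch(accelerator: Any) -> Optional[int]:
    """Return the ZeRO stage if a DeepSpeed plugin is attached, else None."""
    # Assumed attribute path: accelerator.state.deepspeed_plugin.zero_stage.
    state = getattr(accelerator, "state", None)
    plugin = getattr(state, "deepspeed_plugin", None)
    stage = getattr(plugin, "zero_stage", None)
    try:
        return int(stage) if stage is not None else None
    except (TypeError, ValueError):
        # Non-numeric values are treated as "unknown".
        return None


# Usage with stand-in objects instead of a real Accelerator.
fake = SimpleNamespace(state=SimpleNamespace(deepspeed_plugin=SimpleNamespace(zero_stage=3)))
print(deepspeed_zero_stage_sketch(fake))  # 3
print(deepspeed_zero_stage_sketch(SimpleNamespace(state=SimpleNamespace())))  # None
```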
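When `_gather_eval_stats` combines mean rewards across ranks, shards can hold different numbers of rows, so a plain average of per-rank means would be biased. One way to get the exact global mean is to combine per-rank `(reward_sum, count)` pairs. This sketch takes those pairs as a list. In a real multi-GPU run they would come from a collective such as Accelerate's `gather`. The name `combine_rank_stats` is hypothetical.

```python
from typing import Iterable, Tuple


def combine_rank_stats(per_rank: Iterable[Tuple[float, int]]) -> Tuple[float, int]:
    """Combine per-rank (reward_sum, count) pairs into a global (mean, count)."""
    total_sum = 0.0
    total_count = 0
    for reward_sum, count in per_rank:
        total_sum += reward_sum
        total_count += count
    # Guard against an empty eval set rather than dividing by zero.
    mean = total_sum / total_count if total_count else 0.0
    return mean, total_count
```

For example, ranks reporting `(3.0, 4)` and `(1.0, 2)` give a global mean of 4/6 ≈ 0.667. Averaging the per-rank means (0.75 and 0.5) would give 0.625 instead.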
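`_match_tensor_length` pads or truncates a 1D tensor to `target_len`. The sketch below shows the same logic on a plain Python list standing in for the tensor. `pad_value` is a hypothetical parameter, since the real signature's remaining arguments are elided. With PyTorch, the equivalent would be slicing followed by `torch.nn.functional.pad(t, (0, n))` to right-pad by `n` elements.

```python
from typing import List, Sequence


def match_length_sketch(values: Sequence[float], target_len: int, pad_value: float = 0.0) -> List[float]:
    """Truncate to target_len, or right-pad with pad_value up to target_len."""
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    out = list(values[:target_len])  # truncation is a no-op if already short enough
    out.extend([pad_value] * (target_len - len(out)))  # pad only the missing tail
    return out
```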
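The summary for `_tally_format_issues` does not define what counts as a format issue. The sketch below assumes a hypothetical rule: each completion should contain exactly one non-empty `<answer>...</answer>` block. It counts violations per category with `collections.Counter`. The tag name and category labels are illustrative only.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical format rule: exactly one non-empty <answer>...</answer> block.
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def tally_format_issues_sketch(completions: Iterable[str]) -> Dict[str, int]:
    """Count completions that violate the assumed answer-tag format."""
    counts: Counter = Counter()
    for text in completions:
        matches = _ANSWER_RE.findall(text)
        if not matches:
            counts["missing_answer_tag"] += 1
        elif len(matches) > 1:
            counts["multiple_answer_tags"] += 1
        elif not matches[0].strip():
            counts["empty_answer"] += 1
    return dict(counts)
```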

Classes

_EvalShardInfo(rows, total_rows, ...)

Metadata describing the evaluation shard for the current rank.
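`_EvalShardInfo` records at least `rows` and `total_rows` for the current rank. Below is a minimal sketch of how such a shard could be built. It assumes a strided split, `rows[rank::world_size]`, which guarantees that every row is evaluated exactly once. The module may split differently, for example contiguously. `ShardSketch` and its `rank` and `world_size` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class ShardSketch:
    """Hypothetical stand-in for _EvalShardInfo."""
    rows: List[Any]
    total_rows: int
    rank: int
    world_size: int


def build_shard_sketch(rows: Sequence[Any], rank: int, world_size: int) -> ShardSketch:
    """Give each rank a strided slice so the shards partition the eval set."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return ShardSketch(
        rows=list(rows[rank::world_size]),
        total_rows=len(rows),
        rank=rank,
        world_size=world_size,
    )
```

With 10 rows and 3 ranks, rank 0 gets rows `[0, 3, 6, 9]`, rank 1 gets `[1, 4, 7]`, and rank 2 gets `[2, 5, 8]`.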

maxent_grpo.training.eval.run_validation_step(step, ctx)

Generate a single completion per evaluation prompt and log the mean reward.

Parameters:
  • step (int) – Training step identifier passed to logging hooks.

  • ctx (ValidationContext) – Validation context providing evaluation rows and handles.

Returns:

None. Logs metrics through the provided handles.

Return type:

None
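The following is a single-process sketch of the flow `run_validation_step` describes: batch the evaluation rows, generate one completion per prompt, score each completion, and log the mean reward. `generate`, `score`, `log`, the `prompt`/`answer` row keys, and the metric names are all hypothetical. The real function works from a `ValidationContext`. The module's other helpers suggest it also shards rows and gathers statistics across ranks; this sketch omits both.

```python
from typing import Any, Callable, Dict, List, Sequence


def validation_step_sketch(
    step: int,
    rows: Sequence[Dict[str, Any]],
    batch_size: int,
    generate: Callable[[List[str]], List[str]],
    score: Callable[[List[str], List[str]], List[float]],
    log: Callable[[Dict[str, float], int], None],
) -> None:
    """Generate one completion per prompt, score it, and log the mean reward."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    reward_sum, count = 0.0, 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start : start + batch_size]
        prompts = [row["prompt"] for row in chunk]  # assumed row keys
        answers = [row["answer"] for row in chunk]
        completions = generate(prompts)  # one completion per prompt
        rewards = score(completions, answers)
        reward_sum += sum(rewards)
        count += len(rewards)
    mean = reward_sum / count if count else 0.0
    # Like the documented function, the sketch returns None and only logs.
    log({"eval/mean_reward": mean, "eval/num_samples": float(count)}, step)
```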