maxent_grpo.cli.config_validation

Pydantic-powered validation for Hydra training configs.

This module inspects the resolved training arguments before a pipeline is launched, so accidental MaxEnt toggles are caught early. The validator is lightweight and depends only on pydantic, which several other components already use at runtime. Future guardrails can extend this module with further schema checks (including GRPO + entropy-bonus overrides under train-maxent).

Functions

_field_default(field)

_format_validation_errors(errors)

_integer_or_none(value)

_is_safe_grpo_maxent_override(name, value)

_maxent_overrides(values)

Return MaxEnt fields whose values differ from their defaults.

_normalize_entropy_mode(value)

Return the canonical entropy-mode label used by config validation.

_numeric_or_none(value)

_source_hint(command, *, recipe, training_args)

Return a short string pointing at the config origin for error messages.

_training_values(payload)

Return a mapping containing the knobs relevant to validation.

_validate_entropy_objective_settings(values)

Reject entropy-loss settings that do not match the implemented math.

_validate_listwise_microbatch_shape(values)

_validate_removed_training_keys(values)

_validate_seed_grpo_settings(values)

Reject SEED-GRPO knobs that are incompatible with the selected objective.

validate_training_config(training_args, *, ...)

Validate Hydrated training knobs before dispatching to a pipeline.
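
The summary for _maxent_overrides says it returns the MaxEnt fields whose values differ from their defaults. The sketch below shows one way such a check could be written using only the standard library. It is not the module's implementation: the ToyMaxEntKnobs dataclass, its field names, and toy_maxent_overrides are hypothetical and exist only for illustration.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Dict, Mapping


@dataclass
class ToyMaxEntKnobs:
    """Hypothetical stand-in for MaxEnt-specific training fields."""

    maxent_tau: float = 0.0  # hypothetical knob name
    maxent_use_listwise: bool = False  # hypothetical knob name


def toy_maxent_overrides(values: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the knobs in `values` whose value differs from the default."""
    overrides: Dict[str, Any] = {}
    for field in fields(ToyMaxEntKnobs):
        if field.name not in values:
            continue  # knob not supplied, so there is nothing to compare
        if field.default is MISSING:
            continue  # no declared default to compare against
        if values[field.name] != field.default:
            overrides[field.name] = values[field.name]
    return overrides


# Only the knob that was changed from its default is reported.
print(toy_maxent_overrides({"maxent_tau": 0.5, "maxent_use_listwise": False}))
```

Keys that are not declared on the toy dataclass are ignored, so unrelated training arguments never show up as overrides.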

Classes

_TrainingSchema(**kwargs)

Minimal schema capturing the knobs that need cross-field validation.

maxent_grpo.cli.config_validation.validate_training_config(training_args, *, command, source=None)

Validate Hydrated training knobs before dispatching to a pipeline.

The validator ensures that the canonical objective matches the presence of MaxEnt-specific options. When MaxEnt knobs are supplied while the effective objective stays on the native GRPO path, a ValueError is raised so the job fails fast.

Parameters:
  • training_args (GRPOConfig | Mapping[str, Any]) – Training dataclass or mapping derived from Hydra.

  • command (str) – CLI command being executed (e.g., train-baseline).

  • source (str | None) – Optional user-facing hint (recipe path, override description).

Returns:

None. Raises on invalid or incompatible configurations.

Raises:

ValueError – If incompatible knob combinations are detected.

Return type:

None
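
The fail-fast behaviour described above can be illustrated with a small stand-alone sketch. It mirrors only the documented signature (training_args, keyword-only command and source) and the documented ValueError. The "objective" key, the maxent_tau knob, its default, and the recipe.yaml hint are assumptions made for illustration, and plain Python stands in for the pydantic schema the real module uses.

```python
from typing import Any, Mapping, Optional

# Hypothetical MaxEnt knob names and their defaults (illustration only).
TOY_MAXENT_DEFAULTS = {"maxent_tau": 0.0}


def toy_validate_training_config(
    training_args: Mapping[str, Any],
    *,
    command: str,
    source: Optional[str] = None,
) -> None:
    """Raise ValueError when MaxEnt knobs are set on a non-MaxEnt objective."""
    # Hypothetical key; the real config may name the objective differently.
    objective = training_args.get("objective", "grpo")
    overrides = {
        name: training_args[name]
        for name, default in TOY_MAXENT_DEFAULTS.items()
        if name in training_args and training_args[name] != default
    }
    if objective != "maxent" and overrides:
        hint = f" (from {source})" if source else ""
        raise ValueError(
            f"{command}{hint}: MaxEnt options {sorted(overrides)} were set, "
            f"but the effective objective is {objective!r}."
        )


# A consistent config passes silently and returns None.
toy_validate_training_config(
    {"objective": "maxent", "maxent_tau": 0.5}, command="train-maxent"
)

# A MaxEnt knob on the baseline path fails before any pipeline would start.
try:
    toy_validate_training_config(
        {"maxent_tau": 0.5}, command="train-baseline", source="recipe.yaml"
    )
except ValueError as exc:
    print(exc)
```

Including the command and the source hint in the message lets the user trace the failure back to the recipe or override that introduced the stray knob.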