maxent_grpo.training.state
Training loop state helpers for controller and checkpoint management.
Functions
|
|
|
|
|
Return True when a checkpoint directory looks loadable via |
|
Return True when a checkpoint looks like a DeepSpeed engine checkpoint. |
|
|
|
Return True when a checkpoint directory contains loadable, non-empty HF weights. |
|
Return True when a checkpoint directory contains ZeRO shard files. |
|
Best-effort discovery of the latest checkpoint under |
|
|
|
Load controller parameters from |
|
Attempt to convert ZeRO shards into a consolidated HF weight file. |
|
Promote DeepSpeed tag subfolders (e.g., |
|
Return the numeric suffix from a |
|
Normalize |
|
Delete checkpoints to respect |
|
Return the DeepSpeed/Accelerate checkpoint tag stored in |
|
|
|
Return True when a safetensors file declares non-empty tensors. |
|
|
|
|
|
Persist a minimal trainer_state.json so future resumes find the step. |
|
Return a save_checkpoint callable compatible with LoggingHandles. |
|
Construct minimal logging handles for the custom runner. |
|
Set stop flag when the configured number of steps is reached. |
|
Attempt to load controller state from resume directory or the current state. |
|
Load trainer_state.json if available for resume bookkeeping. |
|
Checkpoint periodically while on the main process. |
|
Delete a stale controller state file when overwriting the output dir. |
|
Load an accelerator state directory when resuming if available. |
|
|
Resolve the checkpoint path to resume from, if any. |
Classes
|
Subset of Accelerator API used by training state utilities. |
|
Minimal controller path settings used by checkpoint helpers. |
- maxent_grpo.training.state.maybe_clear_stale_controller_state(accelerator, controller_cfg)
Delete a stale controller state file when overwriting the output dir.
- Parameters:
accelerator (AcceleratorLike) – Accelerate handle used to determine the main process and trigger wait_for_everyone guards.
controller_cfg (ControllerPathsLike) – Paths describing the active controller checkpoint/restore locations.
- Return type:
None
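The following is a minimal sketch of the main-process-only cleanup described above, not the module's actual implementation. `FakeAccelerator`, `clear_stale_state`, and the explicit `state_path` and `overwrite` arguments are hypothetical names introduced for illustration; the fields actually exposed by `AcceleratorLike` and `ControllerPathsLike` are not shown on this page.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FakeAccelerator:
    """Hypothetical stand-in exposing only what this sketch needs."""

    is_main_process: bool = True
    barrier_calls: int = 0

    def wait_for_everyone(self) -> None:
        # A real accelerator would block here until all ranks arrive.
        self.barrier_calls += 1


def clear_stale_state(accelerator, state_path, overwrite):
    """Remove state_path on the main process when overwrite is set.

    Returns True when a file was actually deleted (assumed behaviour).
    """
    removed = False
    if overwrite and state_path and accelerator.is_main_process:
        if os.path.isfile(state_path):
            os.remove(state_path)
            removed = True
    # Every rank waits so that none of them reads the file mid-deletion.
    accelerator.wait_for_everyone()
    return removed
```

Deleting only on the main process and then synchronizing keeps several ranks from racing to remove the same file.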
- maxent_grpo.training.state.load_controller_state_chain(controller_cfg, accelerator, weighting_cfg)
Attempt to load controller state from resume directory or the current state.
- Parameters:
controller_cfg (ControllerPathsLike) – Filesystem paths for controller checkpoints.
accelerator (AcceleratorLike) – Accelerate handle performing logging/synchronization.
weighting_cfg (WeightingConfigLike) – Mutable weighting settings that receive the loaded parameters.
- Returns:
True when controller resume was requested or a controller checkpoint was successfully loaded.
- Return type:
- maxent_grpo.training.state.resolve_resume_checkpoint(training_args)
Resolve the checkpoint path to resume from, if any.
- maxent_grpo.training.state.load_trainer_state_metadata(checkpoint_path)
Load trainer_state.json if available for resume bookkeeping.
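As a rough illustration of "load if available", the sketch below reads `trainer_state.json` from a checkpoint directory and tolerates a missing or malformed file. The helper name `read_trainer_state` and the choice to return an empty dict on failure are assumptions made for this example; the documented function's return value is not specified on this page.

```python
import json
import os
from typing import Any, Dict, Optional


def read_trainer_state(checkpoint_path: Optional[str]) -> Dict[str, Any]:
    """Return the parsed trainer_state.json, or {} when it is unusable."""
    if not checkpoint_path:
        return {}
    state_file = os.path.join(checkpoint_path, "trainer_state.json")
    try:
        with open(state_file, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file: resume without metadata.
        return {}
    # Guard against a JSON file whose top level is not an object.
    return data if isinstance(data, dict) else {}
```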
- maxent_grpo.training.state.maybe_load_accelerator_state(resume_state_path, accelerator)
Load an accelerator state directory when resuming if available.
- Parameters:
resume_state_path (str | None) – Filesystem path to an accelerator state directory (e.g., saved by accelerator.save_state).
accelerator (AcceleratorLike) – Accelerate handle whose load_state method will be invoked.
- Returns:
None.
- Return type:
None
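The guard described above can be sketched as follows. `RecordingAccelerator` and `load_state_if_present` are hypothetical names; unlike the documented function, which returns None, this sketch returns a bool so the example can show whether `load_state` was invoked.

```python
import os
import tempfile
from typing import List, Optional


class RecordingAccelerator:
    """Hypothetical stand-in that records load_state calls."""

    def __init__(self) -> None:
        self.loaded: List[str] = []

    def load_state(self, input_dir: str) -> None:
        self.loaded.append(input_dir)


def load_state_if_present(resume_state_path: Optional[str], accelerator) -> bool:
    """Call accelerator.load_state only when the directory exists."""
    if not resume_state_path or not os.path.isdir(resume_state_path):
        # Nothing to resume from; leave the accelerator untouched.
        return False
    accelerator.load_state(resume_state_path)
    return True
```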
- maxent_grpo.training.state.maybe_checkpoint(logging_cfg, accelerator, global_step)
Checkpoint periodically while on the main process.
- Parameters:
logging_cfg (LoggingHandles) – Logging handles containing checkpoint callbacks and scheduling knobs.
accelerator (AcceleratorLike) – Accelerate handle used for synchronization and main-process checks.
global_step (int) – Current optimizer step; used to decide whether save_steps divides the step index evenly.
- Returns:
None.
- Return type:
None
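The "save_steps divides the step index evenly" rule can be illustrated with a small sketch. The names `checkpoint_if_due` and `FakeAccelerator`, the flattened arguments (the real function receives a `LoggingHandles` object), and the `checkpoint-<step>` naming are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FakeAccelerator:
    """Hypothetical stand-in for the accelerator handle."""

    is_main_process: bool = True

    def wait_for_everyone(self) -> None:
        # A real accelerator would synchronize all ranks here.
        pass


def checkpoint_if_due(
    save_steps: int,
    save_checkpoint: Callable[[str], None],
    accelerator: FakeAccelerator,
    global_step: int,
) -> bool:
    """Save when global_step is a positive multiple of save_steps."""
    if save_steps <= 0 or global_step <= 0 or global_step % save_steps != 0:
        return False
    if accelerator.is_main_process:
        # Only one rank writes; the directory name is an assumed convention.
        save_checkpoint(f"checkpoint-{global_step}")
    accelerator.wait_for_everyone()
    return True


saved: List[str] = []
for step in range(1, 7):
    checkpoint_if_due(3, saved.append, FakeAccelerator(), step)
```

With `save_steps=3` over steps 1 through 6, the callback fires at steps 3 and 6 only.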
- maxent_grpo.training.state.check_stop_condition(schedule, loop_state)
Set stop flag when the configured number of steps is reached.
- Parameters:
schedule (training.types.OptimizationSchedule) – Optimization schedule describing total_training_steps.
loop_state (training.types.TrainingLoopState) – Mutable training loop state whose stop_training flag should be updated when the threshold is crossed.
- Returns:
None.
- Return type:
None
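A sketch of the threshold check, under stated assumptions: `Schedule` and `LoopState` are simplified stand-ins for `OptimizationSchedule` and `TrainingLoopState`, and the `global_step` field and the "non-positive total means no limit" rule are guesses for illustration rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    """Simplified stand-in carrying only total_training_steps."""

    total_training_steps: int


@dataclass
class LoopState:
    """Simplified stand-in; global_step is an assumed field name."""

    global_step: int = 0
    stop_training: bool = False


def mark_stop_if_done(schedule: Schedule, loop_state: LoopState) -> None:
    """Flip stop_training once the step budget has been reached."""
    total = schedule.total_training_steps
    if total > 0 and loop_state.global_step >= total:
        loop_state.stop_training = True


state = LoopState()
schedule = Schedule(total_training_steps=3)
while not state.stop_training:
    state.global_step += 1  # one optimizer step
    mark_stop_if_done(schedule, state)
```

Mutating the flag in place, rather than returning a value, matches the documented None return type and lets the outer loop poll a single shared state object.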
- maxent_grpo.training.state.build_checkpoint_saver(training_args, runtime_handles, optim_handles, tokenizer, *, state_ref=None, base_trainer_state=None, controller_cfg=None)
Return a save_checkpoint callable compatible with LoggingHandles.
The returned callable snapshots accelerator state, model/optimizer weights, trainer state metadata, and optional controller state into a checkpoint directory under output_dir.
- Parameters:
training_args (object) – Training configuration containing output/checkpoint options.
runtime_handles (object) – Runtime bundle providing model/accelerator references.
optim_handles (object) – Optimizer bundle used for saving optimizer state.
tokenizer (object) – Tokenizer to serialize alongside checkpoints.
state_ref (Dict[str, object] | None) – Mutable state dict used for cross-callback coordination.
base_trainer_state (Dict[str, object] | None) – Optional base trainer state JSON to merge into saves.
controller_cfg (ControllerPathsLike | None) – Optional controller state paths for MaxEnt.
- Returns:
Callable save_checkpoint(name: str) -> None.
- Return type:
Callable[[str], None]
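The closure pattern behind a `save_checkpoint(name)` callable might look like the sketch below. It is deliberately reduced: it writes only a `trainer_state.json` file and omits the accelerator, model, optimizer, tokenizer, and controller state that the documented saver also snapshots. `make_saver` and the `global_step` key are hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Optional


def make_saver(
    output_dir: str,
    state_ref: Optional[Dict[str, object]] = None,
    base_trainer_state: Optional[Dict[str, object]] = None,
) -> Callable[[str], None]:
    """Return a save_checkpoint(name) closure bound to output_dir."""

    def save_checkpoint(name: str) -> None:
        ckpt_dir = os.path.join(output_dir, name)
        os.makedirs(ckpt_dir, exist_ok=True)
        # Start from the base state, then overlay the live step counter.
        trainer_state = dict(base_trainer_state or {})
        if state_ref is not None:
            trainer_state["global_step"] = state_ref.get("global_step", 0)
        state_file = os.path.join(ckpt_dir, "trainer_state.json")
        with open(state_file, "w", encoding="utf-8") as handle:
            json.dump(trainer_state, handle)

    return save_checkpoint


out_dir = tempfile.mkdtemp()
shared = {"global_step": 10}
save = make_saver(out_dir, state_ref=shared, base_trainer_state={"epoch": 1})
save("checkpoint-10")
shared["global_step"] = 20  # the closure sees later mutations of state_ref
save("checkpoint-20")
```

Because the closure holds a reference to the mutable `state_ref` dict rather than a copy, each save picks up the step value current at call time.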