maxent_grpo.core.hub
Helpers for working with the Hugging Face Hub.
This module provides:
- Upload utilities to push a training output directory to a dedicated branch (revision) with basic safety checks.
- Small metadata helpers, such as inferring a parameter count from a repo ID (via naming conventions or safetensors metadata) and choosing a valid GPU count for vLLM tensor parallelism.
Functions
check_hub_revision_exists | Validate whether a target Hub revision exists and is safe to write.
ensure_hf_repo_ready | Verify Hub credentials and provision the target repo/branch upfront.
get_gpu_count_for_vllm | Choose a valid GPU count for vLLM tensor parallelism.
get_param_count_from_repo_id | Infer parameter count from naming conventions or Hub metadata.
push_to_hub_revision | Push a checkpoint directory to a branch on the Hub.
- maxent_grpo.core.hub.push_to_hub_revision(training_args, extra_ignore_patterns=None, *, include_checkpoints=False)
Push a checkpoint directory to a branch on the Hub.
The helper creates the repository if it is missing, ensures the target branch exists (forked from the latest commit when possible), and uploads the contents of output_dir while ignoring common checkpoint artefacts. Uploads run asynchronously via run_as_future=True to avoid blocking training scripts.
- Parameters:
training_args (GRPOConfig) – Training config with Hub identifiers (hub_model_id and hub_model_revision) and the local output_dir to upload.
extra_ignore_patterns (list[str] | None) – Additional filename patterns to ignore during upload; appended to the default checkpoint-* and *.pth filters.
include_checkpoints (bool) – When True, do not ignore checkpoint-* folders.
- Returns:
Future that completes when the upload finishes, resolving to the Hub commit metadata.
- Return type:
Future[huggingface_hub.CommitInfo]
- Raises:
ValueError – If hub_model_id is not set in training_args.
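The interaction between the default filters, extra_ignore_patterns, and include_checkpoints can be sketched as follows. This is an illustrative reconstruction from the description above, not the module's actual code: build_ignore_patterns and is_ignored are hypothetical names, and fnmatchcase is used only to show which file names a pattern list would skip.

```python
from fnmatch import fnmatchcase

# Default filters named in the documentation above.
DEFAULT_IGNORE_PATTERNS = ["checkpoint-*", "*.pth"]


def build_ignore_patterns(extra_ignore_patterns=None, *, include_checkpoints=False):
    """Hypothetical helper: combine default and extra ignore patterns."""
    patterns = list(DEFAULT_IGNORE_PATTERNS)
    if include_checkpoints:
        # Keep checkpoint-* folders in the upload by dropping that filter.
        patterns.remove("checkpoint-*")
    # Extra patterns are appended after the defaults.
    patterns.extend(extra_ignore_patterns or [])
    return patterns


def is_ignored(name, patterns):
    """Return True if `name` matches any glob pattern in `patterns`."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


patterns = build_ignore_patterns(["*.tmp"])
for name in ["checkpoint-500", "optimizer.pth", "scratch.tmp", "model.safetensors"]:
    print(name, "skipped" if is_ignored(name, patterns) else "uploaded")
```

With the defaults plus "*.tmp", only model.safetensors would be uploaded in this sketch. Passing include_checkpoints=True would also let checkpoint-500 through.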
- maxent_grpo.core.hub.ensure_hf_repo_ready(training_args)
Verify Hub credentials and provision the target repo/branch upfront.
The helper is a best-effort preflight. When Hub access is not configured (or push is disabled), it returns early. Errors in network calls are surfaced as RuntimeError to avoid silent misconfiguration.
- Parameters:
training_args (GRPOConfig) – Training config with Hub identifiers and push flags.
- Returns:
None. The function exits early when Hub pushes are disabled.
- Return type:
None
- Raises:
RuntimeError – If the Hub preflight fails due to network or auth errors.
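The preflight pattern described above (return early when pushing is disabled, otherwise surface failures as RuntimeError) might look roughly like this. It is a minimal sketch under stated assumptions: preflight, push_enabled, and provision are hypothetical names, and catching OSError stands in for whatever network and auth errors the real helper handles.

```python
from typing import Callable


def preflight(push_enabled: bool, provision: Callable[[], None]) -> None:
    """Hypothetical preflight: run `provision` only when pushing is enabled."""
    if not push_enabled:
        # Nothing to verify when Hub pushes are disabled.
        return
    try:
        # Assumed to create the repo/branch and check credentials.
        provision()
    except OSError as exc:
        # Re-raise loudly so a misconfigured Hub setup fails before training.
        raise RuntimeError(f"Hub preflight failed: {exc}") from exc


def failing_provision() -> None:
    raise ConnectionError("network unreachable")


preflight(False, failing_provision)  # returns early, provision is never called
try:
    preflight(True, failing_provision)
except RuntimeError as err:
    print(err)
```

Chaining with `from exc` keeps the original error in the traceback, which helps distinguish an auth problem from a connectivity problem.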
- maxent_grpo.core.hub.check_hub_revision_exists(training_args)
Validate whether a target Hub revision exists and is safe to write.
The check avoids clobbering populated branches unless explicitly permitted via overwrite_hub_revision. A README in the branch is treated as a signal that the branch has content.
- Parameters:
training_args (GRPOConfig) – Training config with Hub identifiers and safety flags such as push_to_hub_revision and overwrite_hub_revision.
- Returns:
None. Raises if the target revision appears non-empty and overwriting is disallowed.
- Return type:
None
- Raises:
ValueError – If the revision exists and appears non-empty without setting overwrite_hub_revision.
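The decision logic described above can be sketched as a pure function over data that would normally come from the Hub. This is an assumption-laden illustration: assert_revision_writable is a hypothetical name, the branch and file lists are passed in rather than fetched, and the exact file name README.md is assumed (the documentation only says "a README").

```python
def assert_revision_writable(revision, existing_branches, files_in_branch, *, overwrite=False):
    """Hypothetical check: raise if `revision` holds content and overwriting is off."""
    if revision not in existing_branches:
        return  # The branch does not exist yet, so nothing can be clobbered.
    if overwrite:
        return  # The caller explicitly allowed overwriting.
    if "README.md" in files_in_branch:
        # A README is taken as the signal that the branch is populated.
        raise ValueError(
            f"Revision {revision!r} already exists and appears non-empty; "
            "set overwrite_hub_revision to replace it."
        )


# A new branch and an explicitly overwritten branch both pass silently.
assert_revision_writable("run-1", ["main"], [])
assert_revision_writable("run-1", ["main", "run-1"], ["README.md"], overwrite=True)
```

Calling it for an existing branch that contains README.md without overwrite=True raises ValueError in this sketch.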
- maxent_grpo.core.hub.get_param_count_from_repo_id(repo_id)
Infer parameter count from naming conventions or Hub metadata.
Prefers parsing strings like 42m, 1.5b or products like 8x7b from the repo ID. Falls back to safetensors metadata when no pattern is found.
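The name-based part of this inference can be approximated with a regular expression. The sketch below is not the module's implementation: parse_param_count is a hypothetical name, the pattern is one plausible reading of the 42m / 1.5b / 8x7b conventions, returning the largest match is an assumption, and the safetensors fallback is omitted (the function returns None instead).

```python
import re

# A number, an optional "x<number>" multiplier, then a "b" or "m" unit that is
# not followed by another letter or digit (so "3base" is not read as 3 billion).
_SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)(?:x(\d+(?:\.\d+)?))?([bm])(?![a-z0-9])")
_UNIT_SCALE = {"b": 1_000_000_000, "m": 1_000_000}


def parse_param_count(repo_id):
    """Hypothetical helper: parameter count parsed from a repo ID, or None."""
    counts = []
    for match in _SIZE_PATTERN.finditer(repo_id.lower()):
        base, multiplier, unit = match.groups()
        value = float(base)
        if multiplier is not None:
            value *= float(multiplier)  # e.g. 8x7b is read as 8 * 7 billion
        counts.append(int(value * _UNIT_SCALE[unit]))
    # Assumption: if several sizes appear in the name, take the largest.
    return max(counts) if counts else None


for repo in ["org/tiny-42m", "org/model-1.5B", "org/moe-8x7b", "org/no-size-here"]:
    print(repo, parse_param_count(repo))
```

A caller following the documented behaviour would consult safetensors metadata whenever this kind of parse yields nothing.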
- maxent_grpo.core.hub.get_gpu_count_for_vllm(model_name, revision='main', num_gpus=8)
Choose a valid GPU count for vLLM tensor parallelism.
vLLM requires that both the number of attention heads and 64 be divisible by the tensor parallel size. This function decrements num_gpus until both constraints are satisfied.
- Parameters:
- Returns:
A compatible number of GPUs for vLLM tensor parallelism.
- Return type:
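The decrement loop described for get_gpu_count_for_vllm can be illustrated on its own. In this sketch the number of attention heads is passed in directly; how the real function obtains it from model_name and revision is not shown in this page, and pick_tensor_parallel_size is a hypothetical name.

```python
def pick_tensor_parallel_size(num_heads, num_gpus=8):
    """Hypothetical helper: largest GPU count <= num_gpus meeting both constraints."""
    if num_heads < 1 or num_gpus < 1:
        raise ValueError("num_heads and num_gpus must be positive")
    # Decrement until the head count and 64 are both divisible by num_gpus.
    # The loop always terminates because every integer is divisible by 1.
    while num_heads % num_gpus != 0 or 64 % num_gpus != 0:
        num_gpus -= 1
    return num_gpus


print(pick_tensor_parallel_size(32, 8))  # 8: 32 % 8 == 0 and 64 % 8 == 0
print(pick_tensor_parallel_size(28, 8))  # 4: 7 divides 28 but not 64
print(pick_tensor_parallel_size(9, 8))   # 1: no larger count divides both
```

Because 64 is a power of two, the result in this sketch is always a power of two that also divides the head count.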