maxent_grpo.training.zero_utils
Utilities to safely integrate DeepSpeed ZeRO with optional dependencies.
Functions
|
Invoke GatheredParameters, handling DeepSpeed versions both with and without modifier_rank support. |
|
Return the DeepSpeedEngine class when available. |
|
Temporarily disable HF DeepSpeed ZeRO-3 init for model loading. |
|
Return the embedding weight tensor when ZeRO gathering is required. |
|
Return all embedding-like weights requiring ZeRO gathering. |
|
Return a cuda namespace exposing |
|
Best-effort initialization of DeepSpeed helpers when installed. |
|
Return the callable GatheredParameters helper when available. |
|
Return True when the provided model is a DeepSpeed engine. |
|
Patch DeepSpeedEngine.no_sync to a no-op when gradients are partitioned. |
|
Gather ZeRO-sharded embedding weights before a forward pass. |
|
Gather ZeRO-partitioned params only when needed. |
|
Reserve parameter ids for a ZeRO gather region. |
|
Return a parameter list for ZeRO-gather contexts, unwrapping engines. |
|
Return |
|
Return whether the model partitions gradients (ZeRO-3). |
|
Return the DeepSpeed ZeRO stage for a model when available. |
|
Best-effort extraction of the DeepSpeed ZeRO status name. |
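Several of the summaries above describe the same pattern: look up the DeepSpeed helper if it is installed, and gather parameters only when that is actually needed. The following is a minimal sketch of that pattern, not the module's implementation. The names `load_gathered_parameters` and `maybe_gather`, the `helper` and `enabled` arguments, and the `TypeError` fallback for helpers that lack `modifier_rank` are assumptions made for illustration; only `deepspeed.zero.GatheredParameters` is a real DeepSpeed API.

```python
import contextlib


def load_gathered_parameters():
    """Return deepspeed.zero.GatheredParameters, or None when unavailable."""
    try:
        from deepspeed.zero import GatheredParameters
    except Exception:  # missing or broken DeepSpeed install
        return None
    return GatheredParameters


def maybe_gather(params, helper=None, enabled=True):
    """Return a context manager that gathers params only when needed.

    Hypothetical helper: falls back to a no-op context when DeepSpeed is
    missing, gathering is disabled, or there is nothing to gather.
    """
    params = [p for p in params if p is not None]
    if helper is None:
        helper = load_gathered_parameters()
    if helper is None or not enabled or not params:
        return contextlib.nullcontext()
    try:
        # modifier_rank=None requests a read-only gather in DeepSpeed.
        return helper(params, modifier_rank=None)
    except TypeError:
        # Assumption: the helper does not accept modifier_rank.
        return helper(params)
```

Without DeepSpeed installed, `with maybe_gather(model_params): ...` simply runs the body unchanged. Note that catching `TypeError` this broadly can also hide unrelated errors raised while constructing the helper, so a real implementation may prefer to inspect the signature instead.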
Classes
|
Callable signature exposed by DeepSpeed GatheredParameters. |
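A callable signature like this is commonly expressed with `typing.Protocol`. The sketch below shows one way to do so; the class name `GatheredParametersFn`, its exact parameters, and the helpers `noop_gather` and `run_gathered` are hypothetical and are not taken from the module.

```python
from contextlib import AbstractContextManager, nullcontext
from typing import Any, Iterable, Optional, Protocol


class GatheredParametersFn(Protocol):
    """Hypothetical protocol for a GatheredParameters-style callable."""

    def __call__(
        self, params: Iterable[Any], modifier_rank: Optional[int] = None
    ) -> AbstractContextManager:
        ...


def noop_gather(params, modifier_rank=None):
    """Stand-in that satisfies the protocol without DeepSpeed."""
    return nullcontext()


def run_gathered(gather: GatheredParametersFn, params: Iterable[Any]) -> int:
    """Enter the gather context and return how many params were covered."""
    params = list(params)
    with gather(params, modifier_rank=None):
        return len(params)
```

Typing the helper structurally lets calling code accept either the real DeepSpeed class or a no-op stand-in, which is useful when DeepSpeed is an optional dependency.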