maxent_grpo.training.rollout

Copyright 2025 Liv d’Aliberti

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Rollout utilities for the MaxEnt runner.

class maxent_grpo.training.rollout.CompletionGenerator(ctx)

Bases: LocalGenerationMixin, VLLMGenerationMixin

Stateful helper that handles both local HF and vLLM completions.

Parameters:

ctx (GenerationContext)
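
As a rough illustration of how a single stateful helper can cover both backends, the sketch below composes two stand-in mixins and dispatches on the context's use_vllm flag. Everything prefixed with Sketch, the generate method, the _generate_local and _generate_vllm method names, and the calls counter are hypothetical; the real mixins may expose different methods and may not dispatch this way.

```python
from types import SimpleNamespace
from typing import List


class SketchLocalMixin:
    """Stand-in for LocalGenerationMixin (method name is an assumption)."""

    def _generate_local(self, prompts: List[str]) -> List[str]:
        # A real implementation would call the HF model's generate() here.
        return [f"local:{p}" for p in prompts]


class SketchVLLMMixin:
    """Stand-in for VLLMGenerationMixin (method name is an assumption)."""

    def _generate_vllm(self, prompts: List[str]) -> List[str]:
        # A real implementation would query a vLLM server or engine here.
        return [f"vllm:{p}" for p in prompts]


class SketchCompletionGenerator(SketchLocalMixin, SketchVLLMMixin):
    """Hypothetical analogue of CompletionGenerator: holds ctx plus state."""

    def __init__(self, ctx) -> None:
        self.ctx = ctx
        self.calls = 0  # illustrative piece of per-instance state

    def generate(self, prompts: List[str]) -> List[str]:
        self.calls += 1
        # Assumed dispatch rule: the context flag selects the backend.
        if self.ctx.use_vllm:
            return self._generate_vllm(prompts)
        return self._generate_local(prompts)


# A minimal stand-in context; the real GenerationContext has many more fields.
local_gen = SketchCompletionGenerator(SimpleNamespace(use_vllm=False))
vllm_gen = SketchCompletionGenerator(SimpleNamespace(use_vllm=True))
local_out = local_gen.generate(["2+2="])
vllm_out = vllm_gen.generate(["2+2="])
```

Keeping the backend choice inside the generator means the training loop can call one method regardless of where completions are produced.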

class maxent_grpo.training.rollout.GenerationContext(max_prompt_len, max_completion_len, gen_temperature, gen_top_p, use_vllm, vllm, accelerator, model, tokenizer, generation_stats, device, penalty=<factory>, prompt_char_limit=None, *, vllm_mode='server')

Bases: GenerationPenaltyPassthroughMixin, GenerationSamplingConfig

Configuration required to produce completions for each training batch.

Parameters:
accelerator: TypesAccelerator
model: TypesPreTrainedModel
tokenizer: TypesPreTrainedTokenizer
generation_stats: Dict[str, int]
device: Any
penalty: GenerationPenaltyConfig
prompt_char_limit: int | None = None

as_dict()

Return a lightweight representation useful for logging/debugging.

Return type:

Dict[str, Any]
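
To make the idea of a "lightweight representation" concrete, here is a minimal sketch of a dataclass whose as_dict() drops heavyweight handles before logging. SketchGenerationContext and _HEAVY_FIELDS are hypothetical stand-ins: the field names follow the signature above, but the scalar types (int, float, bool), the plain-dict penalty default, and the choice of which fields to omit are assumptions, not the library's documented behaviour.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional

# Assumption: these handles are too large or unserialisable to log.
_HEAVY_FIELDS = {"vllm", "accelerator", "model", "tokenizer"}


@dataclass
class SketchGenerationContext:
    """Hypothetical stand-in mirroring the GenerationContext signature."""

    max_prompt_len: int
    max_completion_len: int
    gen_temperature: float
    gen_top_p: float
    use_vllm: bool
    vllm: Any
    accelerator: Any
    model: Any
    tokenizer: Any
    generation_stats: Dict[str, int]
    device: Any
    # The real default is a GenerationPenaltyConfig factory; a dict stands in.
    penalty: Dict[str, Any] = field(default_factory=dict)
    prompt_char_limit: Optional[int] = None
    # Keyword-only in the real signature; kept as a plain default here.
    vllm_mode: str = "server"

    def as_dict(self) -> Dict[str, Any]:
        """Return every field except the heavyweight handles."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name not in _HEAVY_FIELDS
        }


ctx = SketchGenerationContext(
    max_prompt_len=512,
    max_completion_len=256,
    gen_temperature=0.7,
    gen_top_p=0.95,
    use_vllm=False,
    vllm=None,
    accelerator=object(),
    model=object(),
    tokenizer=object(),
    generation_stats={"vllm_retry_rounds": 0},  # illustrative counter name
    device="cpu",
)
summary = ctx.as_dict()
```

The resulting summary holds only scalars and small containers, so it can be passed to a logger or serialised without touching the model or tokenizer objects.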

Modules

context

Shared generation context dataclass used by local and vLLM paths.

distributed

Distributed helpers shared across generation utilities.

generator

Public CompletionGenerator that wires local and vLLM helpers together.

helpers

Completion generation helpers for the MaxEnt-GRPO runner.

local

Local HF generation helpers split from the vLLM adapter.

vllm_adapter

vLLM-focused helpers split away from the local generation path.

vllm_colocate

In-process (colocated) vLLM generation helpers for the custom loop.