# Data Generation and Training
Speculators currently supports training of Eagle3 speculative decoders. For full details on all of the steps described below, see README.md.
This process is currently broken down into three key steps:
- Data Generation
- Vocab Mapping
- Training
## Data Generation
Generate hidden states for training using vLLM. Dataset samples are passed through the target (verifier) model, and the resulting hidden states are saved to disk for later use. `scripts/data_generation_offline.py` provides the main entry point for generating training data for Eagle3 models.
Once completed, the following files will be generated on disk:
- `token_freq.pt` (the token frequency distribution file)
- `data_config.json` (data metadata)
- data `.pt` files containing the hidden state values
Note: this process uses vLLM and requires the datagen optional install.
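
Once the files are on disk, you can sanity-check them before moving on. The sketch below assumes a particular layout (a 1-D count tensor in `token_freq.pt` and a flat JSON dictionary in `data_config.json`), and the output directory name is a placeholder; the actual contents are defined by `scripts/data_generation_offline.py`.

```python
import json

import torch

# Placeholder path; use the output directory you passed to
# scripts/data_generation_offline.py.
output_dir = "./eagle3_data"

# Assumption: token_freq.pt stores a 1-D tensor of per-token counts over the
# target model's vocabulary.
token_freq = torch.load(f"{output_dir}/token_freq.pt")
print("vocab size:", token_freq.numel())
print("most frequent token ids:", token_freq.topk(10).indices.tolist())

# Assumption: data_config.json is a flat JSON dictionary of dataset metadata.
with open(f"{output_dir}/data_config.json") as f:
    data_config = json.load(f)
print("data config keys:", sorted(data_config))
```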
## Vocab Mapping
Build the `d2t` and `t2d` vocabulary mapping files from the token frequency distribution file. `scripts/build_vocab_mapping.py` provides the main entry point for this step.
Once completed, this step will generate the following files on disk:
- `d2t.npy`
- `t2d.npy`
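
The sketch below shows one way these files might be inspected. The interpretation (`d2t` mapping draft-vocab indices to target-vocab indices, `t2d` marking which target-vocab tokens the draft vocab covers) follows the common Eagle3 convention and is an assumption here; the directory name is likewise a placeholder.

```python
import numpy as np

# Placeholder path; use the directory produced by scripts/build_vocab_mapping.py.
output_dir = "./eagle3_data"

# Assumption (common Eagle3 convention): d2t maps each draft-vocab index to a
# target-vocab index, and t2d is a mask over the target vocabulary marking the
# tokens covered by the (smaller) draft vocabulary.
d2t = np.load(f"{output_dir}/d2t.npy")
t2d = np.load(f"{output_dir}/t2d.npy")

print("draft vocab size:", d2t.shape[0])
print("target vocab size:", t2d.shape[0])
print("target tokens covered by the draft vocab:", int(t2d.sum()))
```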
## Training
Train an Eagle3 draft model or speculator. Currently, training is supported for:
- Single-Layer and Multi-Layer Draft Models for Non-MoE models
- Single-Layer and Multi-Layer Draft Models for certain Non-Vision MoE models
For a full list of supported models, see: https://github.com/vllm-project/speculators/blob/main/README.md
`scripts/train.py` provides the main entry point for training Eagle3 models, with support for single- and multi-GPU training using FSDP.
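
As a rough illustration of what multi-GPU training with FSDP means here, the sketch below shows the standard PyTorch FSDP pattern. It is not the training loop from `scripts/train.py`; the toy model, dimensions, and `torchrun` launch are placeholders, and the real command-line usage is documented in README.md.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # Typically launched with `torchrun --nproc_per_node=<num_gpus> this_file.py`,
    # which sets the environment variables init_process_group reads.
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Toy placeholder model; the real script builds the Eagle3 draft model.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.SiLU(),
        torch.nn.Linear(4096, 4096),
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # which is what enables multi-GPU training of larger draft models.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # ... training loop over the generated hidden-state .pt files ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```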
## Examples
The files in this folder provide end-to-end examples that run the three steps listed above for GPT-OSS, Llama3, and Qwen3 draft models. If a step fails at any point, you can rerun the script and continue from the last step. Separate steps may also be run using the individual scripts listed above.