# Adding New Speculative Decoding Algorithms
This guide explains how to add a new speculative decoding algorithm to the Speculators library.
## Quick Start

Adding a new algorithm requires:

- An algorithm module under `src/speculators/models`.
- A configuration class with the `@register` decorator. When Python imports your module, the `@register("myalgo")` decorator adds your class to a global registry dictionary, and the training script looks up `"myalgo"` in that registry to find your class. This is helpful because the training script doesn't need to know about every algorithm, and adding a new algorithm doesn't require modifying the training script. (A minimal sketch of the pattern follows this list.)
- A model class with the `@register` decorator.
- Training factory methods as classmethods on the model.
- CLI arguments in `train.py`.
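To make the registry mechanism concrete, here is a minimal sketch of how such a decorator typically works. The real implementation lives in the library (`SpeculatorModelConfig.register` / `SpeculatorModel.register`); everything below is illustrative, not the actual code:

```python
# Illustrative registry pattern: a simplified, hypothetical stand-in for the
# library's real register()/get_class() machinery.
_REGISTRY: dict[str, type] = {}

def register(name: str):
    """Class decorator that records the class under `name` at import time."""
    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls
    return decorator

def get_class(name: str) -> type:
    """Look up a class by the name it was registered under."""
    return _REGISTRY[name]

@register("myalgo")
class MyAlgoSpeculatorConfig:
    ...

assert get_class("myalgo") is MyAlgoSpeculatorConfig
```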
## Step-by-Step Guide

### 1. Create Algorithm Module

Create a self-contained directory for your algorithm under `src/speculators/models`. See `src/speculators/models/eagle3` as an example. This keeps algorithm logic isolated and maintainable: each algorithm owns its configuration, model definition, and any custom components. Example file structure:

```
src/speculators/models
|-> eagle3
|   |-> ...
|-> new_algorithm
    |-> __init__.py
    |-> core.py
    |-> config.py
```
### 2. Implement Configuration Class

Define how your algorithm is configured. The config stores hyperparameters, architectural choices, and other settings; it is serialized when saving models and deserialized when loading them. In `config.py`, create a configuration class with the `@register` decorator, for example:

```python
from speculators import SpeculatorModelConfig

@SpeculatorModelConfig.register("myalgo")
class MyAlgoSpeculatorConfig(SpeculatorModelConfig):
    speculators_model_type: str = "myalgo"

    # Algorithm-specific parameters
    block_size: int = 8
    num_layers: int = 1
```
Reference: See `src/speculators/models/eagle3/config.py` for a complete example.

Key points:

- Use the `@SpeculatorModelConfig.register("myalgo")` decorator
- Set `speculators_model_type` to match your algorithm name
- Inherit common fields from `SpeculatorModelConfig`
- Add algorithm-specific parameters as needed
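Once defined, the config can be instantiated with overrides. A small usage sketch, assuming `SpeculatorModelConfig` accepts field overrides as keyword arguments (as transformers-style configs do):

```python
# Assumes SpeculatorModelConfig accepts field overrides as keyword arguments.
config = MyAlgoSpeculatorConfig(block_size=16, num_layers=2)

assert config.speculators_model_type == "myalgo"
assert config.block_size == 16  # overridden default
```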
### 3. Implement Model Class

Define your algorithm's architecture and training interface. The model class contains the architecture, forward-pass logic, and training setup. Once the required methods are implemented, your algorithm works seamlessly with the training infrastructure.

In `core.py`, create a model class with the `@register` decorator and the required training factory methods.

Reference: See `src/speculators/models/eagle3/core.py` for a complete example.
Required for the training infrastructure (a skeleton implementing this contract follows the list):

Model attributes:

- `layers`: a `ModuleList` of decoder layers (each layer is individually wrapped by FSDP for distributed training)

Methods:

- `from_training_args(cls, verifier_config, **kwargs)`: factory method to build from CLI args (receives all args as kwargs)
- `get_trainer_kwargs(**kwargs)`: returns `(train_kwargs, val_kwargs)` dicts passed to `forward()`
- `forward(...)`: must return `(output, loss, metrics)`, where `metrics` includes a `"loss"` key
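A minimal skeleton putting the contract together. This is a hedged sketch: the `SpeculatorModel` import path and its `register` decorator are assumed to mirror the config side, the constructor and `forward` signatures are inferred from the description above, and the layer and loss choices are placeholders:

```python
import torch.nn as nn
import torch.nn.functional as F

from speculators import SpeculatorModel  # assumed import, mirroring SpeculatorModelConfig
from speculators.models.new_algorithm.config import MyAlgoSpeculatorConfig

@SpeculatorModel.register("myalgo")  # assumed to mirror SpeculatorModelConfig.register
class MyAlgoDraftModel(SpeculatorModel):
    def __init__(self, config, verifier_config):
        super().__init__(config)
        hidden = verifier_config.hidden_size  # assumes a transformers-style verifier config
        # `layers` must be a ModuleList so the trainer can FSDP-wrap each layer
        # individually. Plain Linear layers stand in for real decoder layers here.
        self.layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(config.num_layers))
        self.lm_head = nn.Linear(hidden, verifier_config.vocab_size, bias=False)

    @classmethod
    def from_training_args(cls, verifier_config, **kwargs):
        # Receives every CLI arg as a kwarg; pull out only what this algorithm needs.
        config = MyAlgoSpeculatorConfig(
            block_size=kwargs["block_size"],
            num_layers=kwargs["num_layers"],
        )
        return cls(config, verifier_config)

    def get_trainer_kwargs(self, **kwargs):
        # (train_kwargs, val_kwargs): extra keyword args the trainer passes to forward().
        return {"is_training": True}, {"is_training": False}

    def forward(self, hidden_states, labels, is_training=False, **kwargs):
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        logits = self.lm_head(hidden_states)
        # (batch, seq, vocab) -> (batch*seq, vocab) for cross-entropy.
        loss = F.cross_entropy(logits.flatten(0, 1), labels.flatten())
        metrics = {"loss": loss.detach()}  # the trainer requires a "loss" key
        return logits, loss, metrics
```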
### 4. Export Classes

Make your classes importable from the package. Re-exporting them from `__init__.py` gives users a stable import path and a clean public API.

In `__init__.py`, export your config and model classes:
```python
from speculators.models.eagle3.config import Eagle3SpeculatorConfig
from speculators.models.eagle3.core import Eagle3DraftModel

__all__ = [
    "Eagle3DraftModel",
    "Eagle3SpeculatorConfig",
]
```
Reference: See `src/speculators/models/eagle3/__init__.py`
### 5. Add CLI Arguments (Optional)

Add algorithm-specific command-line arguments to the training script. If your algorithm has unique hyperparameters (like Eagle3's `--ttt-steps` or a custom `--block-size`), users need a way to configure them from the command line. These arguments are passed to your `from_training_args()` method, as shown in the sketch below. Only add arguments if your algorithm needs parameters beyond the common ones (verifier path, number of layers, etc.).

Reference: See `scripts/train.py`
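A hedged sketch of the kind of addition this means in `scripts/train.py` (the script's actual parser setup may differ):

```python
import argparse

parser = argparse.ArgumentParser()  # stands in for the existing parser in scripts/train.py
# ... common arguments (verifier path, num layers, data path, ...) ...

# Algorithm-specific argument; the flag name maps to the config field it feeds.
parser.add_argument(
    "--block-size",
    type=int,
    default=8,
    help="Block size for the myalgo speculator",
)
```

Note that argparse exposes `--block-size` as `args.block_size`, so `vars(args)` later yields a `block_size` kwarg that your `from_training_args()` can pick up.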
### 6. Train Your Model

The training script works automatically with your new algorithm:
```bash
torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py \
    --speculator-type myalgo \
    --verifier-name-or-path meta-llama/Llama-3.1-8B \
    --num-layers 1 \
    --block-size 8 \
    --data-path ./data \
    --save-path ./checkpoints \
    --epochs 20
```
## How It Works

The flow during training (condensed into code in the sketch below):

- User runs: `python train.py --speculator-type myalgo`
- Training script calls: `model_class = SpeculatorModel.get_class("myalgo")`
- Registry returns: the `MyAlgoDraftModel` class
- Script converts the args to a dict with `vars(args)` and calls `model_class.from_training_args(verifier_config, **vars(args))`
- Your factory method extracts the kwargs it needs and builds the model instance
- Trainer validates that the model is registered (via checks in `setup_model()` and `apply_fully_sharded()`)
This pattern is similar to how `transformers` uses `.from_pretrained()`: each model owns its own instantiation logic.

Reference: See `scripts/train.py`
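In code, the dispatch reduces to a few lines. This is an illustrative condensation of the flow above, not a verbatim excerpt from `scripts/train.py`:

```python
# Illustrative condensation of the dispatch in scripts/train.py.
args = parser.parse_args()

# The registry maps "myalgo" back to the class that registered under that name.
model_class = SpeculatorModel.get_class(args.speculator_type)

# Every CLI arg is handed over; the factory method keeps only what it needs.
model = model_class.from_training_args(verifier_config, **vars(args))
```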
## Using Base Components

The library provides shared transformer-layer components that can be reused across algorithms. Many speculative decoding algorithms use similar architectural components (decoder layers, attention, normalization); instead of duplicating code, you can import pre-configured components for different base-model architectures.

When to use: if your algorithm uses standard transformer components from models like LLaMA or Qwen3, import them from `base_components` instead of defining your own. This is especially useful when you only need to customize one layer (such as the first layer) while keeping the rest standard.

Available architectures: `llama`, `qwen3`
Reference:

- Component definitions: `src/speculators/models/base_components.py`
- Usage example: `src/speculators/models/eagle3/model_definitions.py`
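As an illustration, reusing a standard LLaMA-style layer might look like the sketch below. The class name `LlamaDecoderLayer` and its constructor signature are assumptions, so check `base_components.py` for the actual exports:

```python
import torch.nn as nn

# Hypothetical export name; see src/speculators/models/base_components.py
# for what is actually available.
from speculators.models.base_components import LlamaDecoderLayer

class MyAlgoDraftModel(SpeculatorModel):
    def __init__(self, config, verifier_config):
        super().__init__(config)
        # Reuse the pre-configured decoder layer rather than redefining it.
        self.layers = nn.ModuleList(
            LlamaDecoderLayer(verifier_config, layer_idx=i)
            for i in range(config.num_layers)
        )
```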