Post-Training Quantization of a Language Model using Distiller
A detailed, Jupyter Notebook-based tutorial on this topic is included in the Distiller examples. You can view a "read-only" version of it in the Distiller GitHub repository.
The tutorial covers the following:
- Converting the model to use Distiller's modular LSTM implementation, which allows flexible quantization of internal LSTM operations.
- Collecting activation statistics prior to quantization
- Creating a `PostTrainLinearQuantizer` and preparing the model for quantization
- Using the "net-aware quantization" capability of `PostTrainLinearQuantizer`
- Progressively tweaking the quantization settings in order to improve accuracy
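Running the full tutorial requires Distiller itself, but the core mechanism behind these steps can be sketched in plain NumPy: collect min/max activation statistics over a calibration set, derive a scale and zero-point from them, and apply asymmetric linear quantization. This is an illustrative sketch of the underlying math, not Distiller's actual API; all function names here are hypothetical.

```python
import numpy as np

def collect_stats(activation_batches):
    """Track the global min/max seen across calibration batches
    (the 'collecting activation statistics' step)."""
    return (min(a.min() for a in activation_batches),
            max(a.max() for a in activation_batches))

def linear_quant_params(min_val, max_val, num_bits=8):
    """Asymmetric linear quantization: map [min_val, max_val]
    onto the integer range [0, 2^num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    scale = (max_val - min_val) / qmax
    zero_point = round(-min_val / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    """Quantize a float tensor to unsigned integers, clamping to range."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float values."""
    return scale * (q.astype(np.float32) - zero_point)
```

For example, with calibration batches spanning [-1, 1], quantizing and dequantizing any value in that range reconstructs it to within one quantization step (`scale`). Tightening or widening the tracked range is exactly the kind of setting the tutorial tweaks progressively to trade clipping error against rounding error.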