Post-Training Quantization of a Language Model using Distiller

A detailed, Jupyter Notebook-based tutorial on this topic is located at <distiller_repo_root>/examples/word_language_model/quantize_lstm.ipynb.
A read-only rendering of the notebook can also be viewed in the Distiller GitHub repository.

The tutorial covers the following:

  • Converting the model to use Distiller's modular LSTM implementation, which allows flexible quantization of internal LSTM operations (see the first sketch below).
  • Collecting activation statistics prior to quantization.
  • Creating a PostTrainLinearQuantizer and preparing the model for quantization (see the second sketch below).
  • Using the "net-aware quantization" capability of PostTrainLinearQuantizer.
  • Progressively tweaking the quantization settings in order to improve accuracy.
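
The first two bullets correspond roughly to the sketch below. This is a paraphrase of the general flow, not code copied from the notebook: it assumes a trained `model` from PyTorch's word_language_model example whose `rnn` attribute is an `nn.LSTM`, plus an `evaluate` function and `val_data` defined elsewhere. Exact names and signatures may differ between Distiller versions.

```python
from distiller.modules import DistillerLSTM
from distiller.data_loggers import collect_quant_stats

# Swap the monolithic nn.LSTM for Distiller's modular implementation, which
# exposes the internal gate and element-wise operations as sub-modules that
# the quantizer can later replace individually.
model.rnn = DistillerLSTM.from_pytorch_impl(model.rnn)

# Gather activation statistics by running the model over representative data.
# 'evaluate' and 'val_data' stand in for the user's own evaluation loop.
def eval_fn(model):
    evaluate(model, val_data)

# The collected statistics are saved as a YAML file under save_dir and are
# consumed by the quantizer in the next step.
collect_quant_stats(model, eval_fn, save_dir='.')
```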
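
Creating and preparing the quantizer then looks roughly like the following sketch. The bit-widths, quantization mode, stats file path, and dummy-input shape are illustrative choices rather than the notebook's final settings; `seq_len` and `batch_size` are assumed to match your data loader, and `model.init_hidden` comes from the word_language_model example.

```python
import torch
from distiller.quantization import PostTrainLinearQuantizer, LinearQuantMode

# Path of the YAML written by collect_quant_stats in the previous sketch
# (adjust to wherever the statistics were actually saved).
stats_file = './acts_quantization_stats.yaml'

# 8-bit weights and activations, with quantization ranges taken from the
# pre-collected statistics rather than computed dynamically at runtime.
quantizer = PostTrainLinearQuantizer(
    model,
    bits_activations=8,
    bits_parameters=8,
    mode=LinearQuantMode.ASYMMETRIC_UNSIGNED,
    model_activation_stats=stats_file)

# prepare_model replaces supported sub-modules with quantized wrappers. Recent
# Distiller versions trace the model with a dummy input so the quantizer knows
# how operations are connected; the input/hidden shapes below are assumptions.
dummy_input = (torch.zeros(seq_len, batch_size, dtype=torch.long),
               model.init_hidden(batch_size))
quantizer.prepare_model(dummy_input)
```

The remaining bullets build on this same quantizer object: "net-aware quantization" uses the traced graph to take the connections between operations into account when choosing quantization settings, and accuracy is then improved iteratively by adjusting arguments such as the clipping mode and per-layer overrides, as demonstrated in the notebook.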