CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks [article]

Alexandros Kouris, Stylianos I. Venieris, Christos-Savvas Bouganis
2018 arXiv pre-print
This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform high-throughput inference. A two-stage architecture tailored for any given CNN-FPGA pair is generated, consisting of a low- and high-precision unit in a cascade. A confidence evaluation unit is employed to identify misclassified cases from the excessively low-precision unit and forward them to the high-precision unit for re-processing. Experiments demonstrate that the proposed toolflow can achieve a performance boost up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy, without the need of retraining the model or accessing the training data.
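The cascade described in the abstract can be illustrated with a minimal software sketch. This is not the authors' implementation: the abstract does not specify how confidence is evaluated, so the gap between the two largest class probabilities is used here purely as an assumed stand-in, and the names `top2_margin`, `cascade_predict`, `low`, `high`, and `threshold` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier maps an input to a list of class probabilities.
Classifier = Callable[[Sequence[float]], List[float]]


def top2_margin(probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest probability.

    Assumed confidence metric (needs at least two classes);
    the abstract does not say which metric CascadeCNN uses.
    """
    ordered = sorted(probs, reverse=True)
    return ordered[0] - ordered[1]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def cascade_predict(
    x: Sequence[float],
    low: Classifier,
    high: Classifier,
    threshold: float,
) -> Tuple[int, bool]:
    """Return (predicted class, whether the input was re-processed).

    The low-precision unit runs first. If its prediction looks
    unreliable (margin below the threshold), the input is forwarded
    to the high-precision unit, whose prediction is used instead.
    """
    probs = low(x)
    if top2_margin(probs) >= threshold:
        return argmax(probs), False
    return argmax(high(x)), True
```

Under this sketch, the overall speed-up depends on how rarely inputs are forwarded: the fewer samples fall below the threshold, the more of the workload is served by the faster low-precision unit alone.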
arXiv:1807.05053v1 fatcat:bn3sz2ewgjgfrgsmpzymnfmk5y