Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm

Robert Stewart, Andrew Nowlan, Pascal Bacchus, Quentin Ducasse, Ekaterina Komendantskaya
Electronics, 2021
This paper compares the latency, accuracy, training time and hardware costs of neural networks compressed with our new multi-objective evolutionary algorithm called NEMOKD, and with quantisation. We evaluate NEMOKD on Intel's Movidius Myriad X VPU processor, and quantisation on Xilinx's programmable Z7020 FPGA hardware. Evolving models with NEMOKD increases inference accuracy by up to 82% at the cost of 38% increased latency, with throughput performance of 100–590 image frames per second (FPS).
Quantisation identifies a sweet spot of 3-bit precision in the trade-off between latency, hardware requirements, training time and accuracy. Parallelising FPGA implementations of 2- and 3-bit quantised neural networks increases throughput from 6 k FPS to 373 k FPS, a 62× speedup.
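For illustration, the following is a minimal sketch of the kind of low-bit weight quantisation referred to above: symmetric uniform quantisation of a weight tensor to a given bit width. This is a generic NumPy example; the function name and interface are assumptions, and it does not reproduce the FPGA toolchain or training procedure used in the paper.

```python
import numpy as np

def quantise_symmetric(w, bits):
    """Symmetric uniform quantisation of a weight tensor to `bits` precision.

    Illustrative only (hypothetical helper): weights are mapped onto the
    integer levels [-(2**(bits-1) - 1), 2**(bits-1) - 1] using a single
    per-tensor scale factor, then rescaled back to floating point.
    """
    qmax = 2 ** (bits - 1) - 1                       # 3 bits -> levels in [-3, 3]
    scale = max(np.max(np.abs(w)), 1e-8) / qmax      # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax)    # integer quantisation levels
    return q * scale, q.astype(np.int8), scale

# Example: quantise a random weight matrix to the 3-bit "sweet spot".
w = np.random.randn(4, 4).astype(np.float32)
w_dequant, levels, scale = quantise_symmetric(w, bits=3)
print("max abs quantisation error:", float(np.max(np.abs(w - w_dequant))))
```

In a real low-precision flow (for example quantisation-aware training), the low-bit weights and activations would be learned during training rather than derived after the fact from a trained floating-point model as in this sketch.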
doi:10.3390/electronics10040396