Rank and run-time aware compression of NLP Applications
2020
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing
Sequence-model-based NLP applications can be large. Yet many applications that benefit from them run on small devices with very limited compute and storage, while still facing run-time constraints. As a result, there is a need for a compression technique that achieves significant compression without degrading inference run-time or task accuracy. This paper proposes a new compression technique, called Hybrid Matrix Factorization, that achieves this dual objective.
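The core idea behind factorization-based compression is to replace a large weight matrix with a product of two much smaller factors, shrinking both storage and the matmul cost at inference time. As a minimal sketch (not the paper's Hybrid Matrix Factorization itself, which additionally keeps part of the matrix at full rank), the baseline low-rank variant via truncated SVD looks like this; the function name and shapes here are illustrative assumptions:

```python
import numpy as np

def low_rank_compress(W, r):
    """Approximate an (m, n) weight matrix W by rank-r factors A @ B.

    Storage drops from m*n to r*(m + n) values, and inference replaces
    one large matmul with two smaller ones: (x @ A) @ B.
    Illustrative sketch only, not the paper's HMF algorithm.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]   # shape (m, r), singular values folded into A
    B = Vt[:r, :]          # shape (r, n)
    return A, B

# Example: compress a 256x256 layer to rank 32.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
A, B = low_rank_compress(W, 32)

# 256*256 params become 32*(256 + 256): a 4x reduction.
compression = W.size / (A.size + B.size)
```

The rank `r` trades accuracy for size: the reconstruction `A @ B` is the best rank-`r` approximation of `W` in the Frobenius norm, so a rank sweep directly exposes the compression/accuracy frontier the paper is concerned with.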
doi:10.18653/v1/2020.sustainlp-1.2
fatcat:avtfr34sdbep3nlnvgylpnytaq