
Measuring scheduling efficiency of RNNs for NLP applications [article]

Urmish Thakker, Ganesh Dasika, Jesse Beu, Matthew Mattina
2019 arXiv   pre-print
Recurrent neural networks (RNNs) have shown state-of-the-art results for speech recognition, natural language processing, image captioning and video summarization applications. Many of these applications run on low-power platforms, so their energy efficiency is extremely important. We observed that cache-oblivious RNN scheduling during inference typically results in 30-50x more data transferred on and off the CPU than the application's working set size. This can potentially impact its energy efficiency. This paper presents a new metric called Data Reuse Efficiency (DRE) to gauge the RNN scheduling efficiency of a platform and shows the factors that influence the DRE value. Additionally, this paper discusses an optimization to improve reuse in RNNs and highlights its positive impact on the total amount of memory read from or written to the memory controller (and, hence, the DRE value) during the execution of an RNN application on a mobile SoC.
arXiv:1904.03302v1 fatcat:fltrxdu3djdinfffvgcbf7oiju
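
The abstract reports 30-50x more traffic than the working set but does not give the DRE formula. The sketch below is a deliberately simplified, hypothetical model (LSTM weights assumed larger than the cache and re-streamed from DRAM every timestep; activations and biases ignored) showing how a ratio of that size can arise. It also shows one generic reuse optimization, hoisting the input projection W_x x_t out of the recurrence. Neither the ratio nor the optimization is claimed to be the paper's DRE definition or its method.

```python
def gate_matrix_bytes(in_features: int, hidden: int, bytes_per_param: int = 4) -> int:
    """Bytes for the four stacked LSTM gate matrices that multiply an in_features-vector."""
    return 4 * hidden * in_features * bytes_per_param


def traffic_ratio(input_size: int, hidden: int, timesteps: int,
                  hoist_input_projection: bool = False, bytes_per_param: int = 4) -> float:
    """DRAM bytes moved divided by the weight working set (hypothetical, simplified model)."""
    w_x = gate_matrix_bytes(input_size, hidden, bytes_per_param)  # input-to-hidden weights
    w_h = gate_matrix_bytes(hidden, hidden, bytes_per_param)      # hidden-to-hidden weights
    working_set = w_x + w_h
    if hoist_input_projection:
        # x_t does not depend on h_{t-1}, so W_x @ x_t can be computed for every t in one
        # pass: W_x is streamed once and only W_h is re-streamed at each timestep.
        traffic = w_x + timesteps * w_h
    else:
        # Cache-oblivious schedule: both matrices are re-streamed at every timestep.
        traffic = timesteps * (w_x + w_h)
    return traffic / working_set


print(traffic_ratio(256, 256, 50))                               # 50.0
print(traffic_ratio(256, 256, 50, hoist_input_projection=True))  # 25.5
```

Under this model the naive ratio simply equals the sequence length, which is why a 50-step sequence lands in the reported 30-50x range; real caches, batching, and activation traffic would change the numbers.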

Ternary MobileNets via Per-Layer Hybrid Filter Banks [article]

Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina
2019 arXiv   pre-print
Besides pruning and quantization, tensor decomposition techniques (Jaderberg et al. (2014); Tai et al. (2015); Wen et al. (2017); Thakker et al. (2019c; a; ) exploit parameter redundancy to obtain ...
arXiv:1911.01028v1 fatcat:7awopatwxjgrtfypejzh3qqsre

Rank and run-time aware compression of NLP Applications [article]

Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
2020 arXiv   pre-print
Structured matrices have shown significant potential for compression of NNs (Sindhwani et al., 2015; Ding et al., 2017; Cheng et al., 2015; Thakker et al., 2020).  ...  Thus, there is a need for good compression techniques to enable large NLP models to fit into a smaller edge device or ensure that they run efficiently on devices with smaller caches (Thakker et al.,  ...
arXiv:2010.03193v1 fatcat:hwftkhffl5b4tnigr5ugfla4ty

Pushing the limits of RNN Compression [article]

Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, Matthew Mattina
2019 arXiv   pre-print
Recurrent Neural Networks (RNNs) can be difficult to deploy on resource-constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This paper introduces a method to compress RNNs for resource-constrained environments using Kronecker products (KPs). KPs can compress RNN layers by 16-38x with minimal accuracy loss. We show that KPs can beat the task accuracy achieved by other state-of-the-art compression techniques (pruning and low-rank matrix factorization) across 4 benchmarks spanning 3 different applications, while simultaneously improving inference run-time.
arXiv:1910.02558v2 fatcat:4diffzmavzejzfp2py7xtckyz4
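
The abstract does not spell out how a KP layer is stored or evaluated. The sketch below assumes the simplest two-factor case, W = A ⊗ B, and shows the two properties the KP approach relies on: the factors need far fewer parameters than the dense matrix, and W x can be computed from A and B directly, using the identity that (A ⊗ B) x equals the row-major flattening of A X Bᵀ, where X is x reshaped to cols(A) × cols(B). The factor shapes and the helper names (`kron`, `kron_matvec`, `kp_compression`) are illustrative assumptions, not the paper's code.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Dense Kronecker product A (x) B; used here only to check the fast path."""
    return [
        [a[r][i] * b[s][j] for i in range(len(a[0])) for j in range(len(b[0]))]
        for r in range(len(a))
        for s in range(len(b))
    ]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Plain dense matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def kron_matvec(a: Matrix, b: Matrix, x: List[float]) -> List[float]:
    """(A (x) B) x without materializing A (x) B, via Y = A X B^T (row-major reshape of x)."""
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    if len(x) != n * q:
        raise ValueError("x must have length cols(A) * cols(B)")
    xm = [x[i * q:(i + 1) * q] for i in range(n)]                       # X: n x q
    t = [[sum(xm[i][j] * b[s][j] for j in range(q)) for s in range(p)]  # T = X B^T: n x p
         for i in range(n)]
    y = [[sum(a[r][i] * t[i][s] for i in range(n)) for s in range(p)]   # Y = A T: m x p
         for r in range(m)]
    return [v for row in y for v in row]                                # flatten row-major


def kp_compression(m: int, n: int, p: int, q: int) -> float:
    """Parameters of a dense (m*p) x (n*q) matrix divided by those of A (m x n) plus B (p x q)."""
    return (m * p * n * q) / (m * n + p * q)


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
x = [1, 2, 3, 4]
print(matvec(kron(a, b), x))            # [10, 7, 22, 15]
print(kron_matvec(a, b, x))             # [10, 7, 22, 15]
print(kp_compression(16, 16, 16, 16))   # 128.0
```

For two 16 × 16 factors the fast path needs 8,192 multiply-adds instead of 65,536 for the dense 256 × 256 product. The 128x toy ratio ignores accuracy; the 16-38x figures in the abstract come from real layer shapes where accuracy had to be preserved.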

Federated Learning for Resource-Constrained IoT Devices: Panoramas and State-of-the-art [article]

Ahmed Imteaj, Urmish Thakker, Shiqiang Wang, Jian Li, M. Hadi Amini
2020 arXiv   pre-print
Thakker et al. [2019] provided a potential direction in this regard.  ...  Kumar et al. [2017] and Thakker et al. [2019] solved these problems for inference, but did not discuss training the models on the device.  ... 
arXiv:2002.10610v1 fatcat:2hhzrb7firhb7ohrbionse7zca

Ternary MobileNets via Per-Layer Hybrid Filter Banks

Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
The MobileNets family of computer vision neural networks has fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements on highly constrained devices require further compression of MobileNets-like compute-efficient networks. Model quantization is a widely used technique to compress and accelerate neural network inference, and prior works have quantized MobileNets to 4-6 bits, albeit with a modest to significant drop in accuracy. While quantization to sub-byte values (i.e. precision ≤ 8 bits) has been valuable, even further quantization of MobileNets to binary or ternary values is necessary to realize significant energy savings and possibly runtime speedups on specialized hardware, such as ASICs and FPGAs. Based on the key observation that convolutional filters at each layer of a deep neural network may respond differently to ternary quantization, we propose a novel quantization method that generates per-layer hybrid filter banks consisting of full-precision and ternary weight filters for MobileNets. Using this proposed quantization method, we quantize a substantial portion of weight filters of MobileNets to ternary values, resulting in a 27.98% savings in energy and a 51.07% reduction in model size, while achieving comparable accuracy and no degradation in throughput on specialized hardware in comparison to the baseline full-precision MobileNets. Finally, we demonstrate the generalizability and effectiveness of hybrid filter banks to other neural network architectures.
doi:10.1109/cvprw50498.2020.00362 dblp:conf/cvpr/GopeBTM20 fatcat:wc3vv5rnw5exvicsdeea4atasm
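
The abstract says each layer mixes full-precision and ternary filters, but not how filters are ternarized or chosen. The sketch below swaps in two assumptions, both labeled in the code: Ternary Weight Networks-style ternarization (threshold 0.7 · mean|w|, one scale α per filter) and a hypothetical selection rule that keeps in full precision the filters with the largest relative ternarization error. It illustrates the idea of a per-layer hybrid filter bank, not the paper's training procedure.

```python
import math
from typing import List, Tuple

Filter = List[float]


def ternarize(w: Filter) -> Filter:
    """ASSUMPTION: TWN-style ternarization to {-alpha, 0, +alpha}; not necessarily the paper's scheme."""
    delta = 0.7 * sum(abs(v) for v in w) / len(w)        # magnitude threshold
    kept = [abs(v) for v in w if abs(v) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0       # per-filter scale
    return [math.copysign(alpha, v) if abs(v) > delta else 0.0 for v in w]


def relative_error(w: Filter, wq: Filter) -> float:
    """||w - wq|| / ||w||, used as a hypothetical 'how badly does this filter ternarize' score."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, wq)))
    den = math.sqrt(sum(a * a for a in w)) or 1.0
    return num / den


def hybrid_filter_bank(filters: List[Filter], num_full_precision: int) -> List[Tuple[Filter, str]]:
    """HYPOTHETICAL rule: keep the worst-ternarizing filters in full precision, ternarize the rest."""
    errors = [relative_error(f, ternarize(f)) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: errors[i], reverse=True)
    keep_fp = set(ranked[:num_full_precision])
    return [(list(f), "full") if i in keep_fp else (ternarize(f), "ternary")
            for i, f in enumerate(filters)]


layer = [[1.0, -1.0, 1.0, -1.0], [0.1, 0.2, 3.0, -0.05]]
for weights, kind in hybrid_filter_bank(layer, num_full_precision=1):
    print(kind, weights)
```

The first filter ternarizes with no error, so it becomes ternary; the second loses its small weights under ternarization, so this rule keeps it in full precision. The energy and size savings in the abstract depend on hardware that executes ternary filters cheaply, which this sketch does not model.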

Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices

Urmish Thakker, Paul N. Whatmough, Zhi Gang Liu, Matthew Mattina, Jesse G. Beu
2021 Conference on Machine Learning and Systems  
., 2018; Thakker et al., 2019c).  ...  Similarly, we can create a doped HMD (Thakker et al., 2019) compression method.  ...
dblp:conf/mlsys/ThakkerWLMB21 fatcat:xcxa4buiq5hlfedzh337xpxxhy

Compressing Language Models using Doped Kronecker Products [article]

Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu
2020 arXiv   pre-print
(Thakker et al., 2019b) propose Hybrid KP (HKP) to solve this issue. HKP helps recover ...  1 Arm ML Research Lab. Correspondence to: Urmish Thakker.  ...  et al., 2019c)) and HKP (Thakker et al., 2019b).  ...
arXiv:2001.08896v5 fatcat:pwyear3j3vepphqrh6zwp7eapm

Run-Time Efficient RNN Compression for Inference on Edge Devices [article]

Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
2019 arXiv   pre-print
Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. This scheme divides the weight matrix into two parts: an unconstrained upper half and a lower half composed of rank-1 blocks. As a result, the upper sub-vector of the output has "richer" features, while the lower sub-vector has "constrained" features. HMD can compress RNNs by a factor of 2-4x while having a faster run-time than pruning and retaining more model accuracy than matrix factorization. We evaluate this technique on 3 benchmarks.
arXiv:1906.04886v2 fatcat:rwdjt6zs2bgz5oobaxiloz7seu
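
The HMD abstract describes a weight matrix whose upper half is dense and whose lower half is built from rank-1 blocks, but not how the lower half is partitioned. The sketch below assumes the lower rows are split into column blocks, each an outer product u_b v_bᵀ, so the lower output needs only one dot product per block followed by a vector scaling. The helper names and the column-block layout are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def hmd_matvec(upper: List[Vector], us: List[Vector], vs: List[Vector], x: Vector) -> Vector:
    """y = W x for W = [upper; lower], lower = [u_1 v_1^T | u_2 v_2^T | ...].

    ASSUMPTION: the lower half is split into column blocks, each a rank-1 outer product.
    """
    if sum(len(v) for v in vs) != len(x):
        raise ValueError("block widths must add up to len(x)")
    y_upper = [sum(w * xi for w, xi in zip(row, x)) for row in upper]  # dense, unconstrained rows
    y_lower = [0.0] * len(us[0])
    offset = 0
    for u, v in zip(us, vs):
        s = sum(vj * x[offset + j] for j, vj in enumerate(v))  # v_b . x_b: one dot product per block
        for r, ur in enumerate(u):
            y_lower[r] += ur * s                                # add u_b scaled by that scalar
        offset += len(v)
    return y_upper + y_lower


def hmd_params(rows: int, cols: int, num_blocks: int) -> int:
    """Parameters when the top half is dense and the bottom half is num_blocks rank-1 blocks."""
    upper_rows = rows // 2
    lower_rows = rows - upper_rows
    return upper_rows * cols + num_blocks * lower_rows + cols  # dense rows + all u_b + all v_b


upper = [[1, 2, 3, 4]]
us = [[1.0, 2.0], [0.5, -1.0]]
vs = [[1.0, 1.0], [2.0, 0.0]]
print(hmd_matvec(upper, us, vs, [1, 2, 3, 4]))  # [30, 6.0, 0.0]
print(hmd_params(512, 512, 4), 512 * 512)       # 132608 262144
```

With four blocks on a 512 × 512 matrix, this layout stores about half the parameters of the dense matrix, and the lower half costs roughly one multiply-add per input element plus one per output row per block instead of a full dense product.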

Benchmarking TinyML Systems: Challenges and Direction [article]

Colby R. Banbury, Vijay Janapa Reddi, Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, Robert Hurtado, David Kanter, Anton Lokhmotov, David Patterson, Danilo Pau (+5 others)
2021 arXiv   pre-print
Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarking allows us to measure, and thereby systematically compare, evaluate, and improve, the performance of systems and is therefore fundamental to a field reaching maturity. In this position paper, we present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Furthermore, we present our four benchmarks and discuss our selection methodology. Our viewpoints reflect the collective thoughts of the TinyMLPerf working group, which comprises over 30 organizations.
arXiv:2003.04821v4 fatcat:rodnh7fd6fa57low2j4jcuqbc4

MLPerf Tiny Benchmark [article]

Colby Banbury, Vijay Janapa Reddi, Peter Torelli, Jeremy Holleman, Nat Jeffries, Csaba Kiraly, Pietro Montino, David Kanter, Sebastian Ahmed, Danilo Pau, Urmish Thakker, Antonio Torrini (+10 others)
2021 arXiv   pre-print
Advancements in ultra-low-power tiny machine learning (TinyML) systems promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted and easily reproducible benchmark for these systems. To meet this need, we present MLPerf Tiny, the first industry-standard benchmark suite for ultra-low-power tiny machine learning systems. The benchmark suite is the collaborative effort of more than 50 organizations from industry and academia and reflects the needs of the community. MLPerf Tiny measures the accuracy, latency, and energy of machine learning inference to properly evaluate the tradeoffs between systems. Additionally, MLPerf Tiny implements a modular design that enables benchmark submitters to show the benefits of their product, regardless of where it falls on the ML deployment stack, in a fair and reproducible manner. The suite features four benchmarks: keyword spotting, visual wake words, image classification, and anomaly detection.
arXiv:2106.07597v4 fatcat:ps4y36uq4nevxfbe7p3tne4opu

Compressing RNNs for IoT devices by 15-38x using Kronecker Products [article]

Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, Matthew Mattina
2020 arXiv   pre-print
Correspondence to: Urmish Thakker.  ...  HKPRNN is inspired by HMD proposed in (Thakker et al., 2019a).  ...
arXiv:1906.02876v5 fatcat:dtxwn4wfarfwpmbbuykgk7clpy

MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers [article]

Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough
2021 arXiv   pre-print
., 2019), Structured Matrices (Thakker et al., 2019b; Thakker et al., 2019)), or has used closed-source software stacks which make deployment and comparison impossible (e.g.  ...  Recent research has explored various model architectures suitable for resource-constrained devices Wong et al., 2020; Kusupati et al., 2018; Thakker et al., 2019a).  ...
arXiv:2010.11267v6 fatcat:cte3gwj2wnh3nlg3rnonvbpazu

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts [article]

Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey (+15 others)
2022 arXiv   pre-print
PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges in this new setting with (1) a templating language for defining data-linked prompts, (2) an interface that lets users quickly iterate on prompt development by observing outputs of their prompts on many examples, and (3) a community-driven set of guidelines for contributing new prompts to a common pool. Over 2,000 prompts for roughly 170 datasets are already available in PromptSource. PromptSource is available at
arXiv:2202.01279v3 fatcat:f364bvoinfexfi2yzkglw7beoe
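
The PromptSource abstract defines a prompt as a function from a dataset example to a natural language input and a target output. PromptSource itself expresses prompts in its own templating language; the sketch below does not reproduce that language or its API. It writes one prompt as a plain Python function over a hypothetical NLI-style example with `premise`, `hypothesis`, and integer `label` fields, purely to make the "prompt as function" idea concrete.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, object]
Prompt = Callable[[Example], Tuple[str, str]]  # example -> (input text, target text)


def nli_yes_no_prompt(example: Example) -> Tuple[str, str]:
    """HYPOTHETICAL prompt for an NLI-style example with premise/hypothesis/label fields."""
    answer_choices = ["Yes", "Maybe", "No"]  # indexed by the integer label (assumed encoding)
    source = (f"{example['premise']}\n"
              f"Question: Does this imply that \"{example['hypothesis']}\"? Yes, no, or maybe?")
    target = answer_choices[int(example["label"])]
    return source, target


def apply_prompt(prompt: Prompt, dataset: List[Example]) -> List[Tuple[str, str]]:
    """Map every example in a dataset through one prompt."""
    return [prompt(ex) for ex in dataset]


data = [{"premise": "A dog runs across the park.", "hypothesis": "An animal is moving.", "label": 0}]
for src, tgt in apply_prompt(nli_yes_no_prompt, data):
    print(src)
    print("->", tgt)
```

Because each prompt is just a function over an example, many differently worded prompts can be applied to the same dataset, which is the kind of collection the abstract describes.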

Multitask Prompted Training Enables Zero-Shot Task Generalization [article]

Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari (+29 others)
2022 arXiv   pre-print
We explicitly highlight the work of: Lintang Sutawika, who helped with evaluation and writing; Urmish Thakker, Mike Tian-Jian Jiang, Shanya Sharma, Arnaud Stiegler, and Manan Dey who helped with the development  ... 
arXiv:2110.08207v3 fatcat:vvacmc2phfg7dpmdqiebyvxvei
Showing results 1-15 of 20.