Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems [article]

Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen
2022 arXiv   pre-print
To address these challenges, we will introduce a series of effective design methods in this book chapter to enable efficient algorithms, compilers, and various optimizations for embedded systems.  ...  However, these emerging AI applications also come with increasing computation and memory demands, which are challenging to handle especially for the embedded systems where limited computation/memory resources  ...  These methods can be categorized into efficient machine learning algorithms, accelerator and compiler designs, and various co-design and optimization strategies.  ... 
arXiv:2206.03326v1 fatcat:th66tbqxibez7hmctl2ytdiroa

TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir [article]

Tao B. Schardl, Siddharth Samsi
2019 arXiv   pre-print
Machine-learning applications rely on efficient parallel processing to achieve performance, and they employ a variety of technologies to improve performance, including compiler technology.  ...  But compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to lose performance by missing optimizations on parallel computation.  ...  ACKNOWLEDGMENTS The authors acknowledge the MIT Lincoln Laboratory Supercomputing Center for providing HPC resources that have contributed to the research results reported in this paper.  ... 
arXiv:1908.11338v1 fatcat:fpe4tmadwnbd3adplv4gfjbrpe

An Evaluation of Autotuning Techniques for the Compiler Optimization Problems

Amir Hossein Ashouri, Gianluca Palermo, Cristina Silvano
2016 Design, Automation, and Test in Europe  
In this paper, we evaluate our different autotuning approaches including the use of Design Space Exploration (DSE) techniques and machine learning to further tackle the both problems of selecting and the  ...  To address the problem, different optimization techniques have been used for traversing and pruning the huge space, and for adaptability and portability.  ...  CONCLUSION This paper presents two main approaches for the compiler autotuning problem using DSE and machine learning.  ... 
dblp:conf/date/AshouriPS16 fatcat:pqyxnkf735gi3iz3o6s7tzrjtu

TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems [article]

Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Shlomi Regev, Rocky Rhodes, Tiezhen Wang (+1 others)
2021 arXiv   pre-print
We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems.  ...  As a result, the machine-learning (ML) models and associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory.  ...  INTRODUCTION Tiny machine learning (TinyML) is a burgeoning field at the intersection of embedded systems and machine learning.  ... 
arXiv:2010.08678v3 fatcat:hemuhoadafd6hi3wg7exoyqnue

TinyIREE: An ML Execution Environment for Embedded Systems from Compilation to Deployment [article]

Hsin-I Cindy Liu, Marius Brehler, Mahesh Ravishankar, Nicolas Vasilache, Ben Vanik, Stella Laurenzo
2022 arXiv   pre-print
Machine learning model deployment for training and execution has been an important topic for industry and academic research in the last decade.  ...  In this paper, we present IREE, a unified compiler and runtime stack with the explicit goal to scale down machine learning programs to the smallest footprints for mobile and edge devices, while maintaining  ...  Brehler was supported by the German Federal Ministry of Education and Research (BMBF) as part of the AIA project under grant number 01IS19060A and the Competence Center Machine Learning Rhine-Ruhr (ML2R  ... 
arXiv:2205.14479v1 fatcat:u77u6s5ypvbcleebc7ec6upvme

The Berlin Big Data Center (BBDC)

Christoph Boden, Tilmann Rabl, Volker Markl
2018 it - Information Technology  
Framework for transparent and reproducible benchmark experiments of distributed data processing systems, approaches to foster the interpretability of machine learning models and finally provide an overview  ...  However, writing efficient implementations of data analysis programs on these systems requires a deep understanding of systems programming, prohibiting large groups of data scientists and analysts from  ...  analysis and machine learning algorithms.  ... 
doi:10.1515/itit-2018-0016 fatcat:mnxe772elba5rekvxds5j4xgpm

An Automatic Compiler Optimizations Selection Framework for Embedded Applications

Shih-Hao Hung, Chia-Heng Tu, Huang-Sen Lin, Chi-Meng Chen
2009 2009 International Conference on Embedded Software and Systems  
This paper aims at system-wide compiler optimizations selection for embedded applications.  ...  For this framework, we implemented compiler optimization selection algorithms and evaluated its efficiencies with and without performance monitoring hardware support.  ...  ., a grant from the National Science Council (95C2443-2), and a grant from Excellent Research Projects of National Taiwan University (96R0062-AE00-07).  ... 
doi:10.1109/icess.2009.86 dblp:conf/icess/HungTLC09 fatcat:qh4mbsdxcffl3fduxugp5fe5ti

Memory Utilization and Machine Learning Techniques for Compiler Optimization

A V Shreyas Madhav, Siddarth Singaravel, A Karmel, J. Kannan R., P. Kommers, A. S, A. Quadir Md
2021 ITM Web of Conferences  
This article aims to provide an overall survey of the cache optimization methods, multi memory allocation features and explore the scope of machine learning in compiler optimization to attain a sustainable  ...  The realm of compiler suites that possess and apply efficient optimization methods provide a wide array of beneficial attributes that help programs execute efficiently with low execution time and minimal  ...  The paper provides a brief overview of the properties of the memory based and machine learning techniques that have been employed for the purpose of compiler optimization.  ... 
doi:10.1051/itmconf/20213701021 fatcat:b7lrzsnszrbcdbxqreb2qmqlby

A tensor intermediate representation for machine learning systems (in Chinese)

Yimin Zhuang, Yuanbo Wen, Wei Li, Qi Guo
2021 Scientia Sinica Informationis  
His current research interests include compilers, programming models, and machine learning.  ...  Figure 7: The optimizations based on tensor IR.  ...  Zhuang Y M, Wen Y B, Li W, et al. A tensor intermediate representation for machine learning systems (in Chinese).  ...  Machine learning compilers are crucial to machine learning systems.  ... 
doi:10.1360/ssi-2020-0398 fatcat:3hhamznebngb5eb7gee7x42tqq

TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems

Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, Pete Warden, Rocky Rhodes
2021 Conference on Machine Learning and Systems  
INTRODUCTION Tiny machine learning (TinyML) is a burgeoning field at the intersection of embedded systems and machine learning.  ...  TensorFlow Lite Micro (TFLM) is an open-source ML inference framework for running deep-learning models on embedded systems.  ...  We extend our gratitude to many individuals, teams, and organizations: Fredrik  ... 
dblp:conf/mlsys/DavidDJRJLKNNWW21 fatcat:hvsi6bzy5vakdcun4sgfgehe3m

TinyIREE: An ML Execution Environment for Embedded Systems from Compilation to Deployment

Hsin-I Liu, Marius Brehler, Mahesh Ravishankar, Nicolas Vasilache, Ben Vanik, Stella Laurenzo
2022 IEEE Micro  
Machine learning model deployment for training and execution has been an important topic for industry and academic research in the last decade.  ...  In this paper, we present IREE, a unified compiler and runtime stack with the explicit goal to scale down machine learning programs to the smallest footprints for mobile and edge devices, while maintaining  ...  We would also like to thank Scott Main for the review and  ... 
doi:10.1109/mm.2022.3178068 fatcat:7lyffgoqdbgpxetruv7ubjzqiy

Implementing Domain-Specific Languages for Heterogeneous Parallel Computing

HyoukJoong Lee, Kevin Brown, Arvind Sujeeth, Hassan Chafi, Tiark Rompf, Martin Odersky, Kunle Olukotun
2011 IEEE Micro  
Even worse, the optimized code for one system is neither portable nor guarantees high performance on another system.  ...  Common examples are Pthreads or OpenMP for multicore CPU, OpenCL or CUDA for GPU, and message passing interface (MPI) for clusters.  ...  Acknowledgments We thank the anonymous reviewers for their valuable feedback.  ... 
doi:10.1109/mm.2011.68 fatcat:a77o6qub7fhyrlqccdiyqkbj3y

Language virtualization for heterogeneous parallel computing

Hassan Chafi, Zach DeVito, Adriaan Moors, Tiark Rompf, Arvind K. Sujeeth, Pat Hanrahan, Martin Odersky, Kunle Olukotun
2010 Proceedings of the ACM international conference on Object oriented programming systems languages and applications - OOPSLA '10  
We define criteria for language virtualization and present techniques to achieve them.  ...  We propose language virtualization as a new principle that enables the construction of highly efficient parallel domain specific languages that are embedded in a common host language.  ...  Data Analysis and Machine Learning with OptiML The second example of the use of language virtualization is for the implementation of OptiML, a DSL for machine learning.  ... 
doi:10.1145/1869459.1869527 dblp:conf/oopsla/ChafiDMRSHOO10 fatcat:wgsarnltdnebfmfkp6xejvwl6m

Language virtualization for heterogeneous parallel computing

Hassan Chafi, Zach DeVito, Adriaan Moors, Tiark Rompf, Arvind K. Sujeeth, Pat Hanrahan, Martin Odersky, Kunle Olukotun
2010 SIGPLAN notices  
We define criteria for language virtualization and present techniques to achieve them.  ...  We propose language virtualization as a new principle that enables the construction of highly efficient parallel domain specific languages that are embedded in a common host language.  ...  Data Analysis and Machine Learning with OptiML The second example of the use of language virtualization is for the implementation of OptiML, a DSL for machine learning.  ... 
doi:10.1145/1932682.1869527 fatcat:b26cnd5a5vbmpktbmjb7gcigfm

Running a Java VM inside an operating system kernel

Takashi Okumura, Bruce R. Childers, Daniel Mosse
2008 Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments - VEE '08  
For this purpose, we first implemented a compact Java Virtual Machine with a Just-In-Time compiler on the Intel IA32 instruction set architecture at the user space.  ...  In this paper, we describe such an approach, where a lightweight Java virtual machine is embedded within the kernel for flexible extension of kernel network I/O.  ...  Third, the virtual machine is embedded inside an operating system (Embedding).  ... 
doi:10.1145/1346256.1346279 dblp:conf/vee/OkumuraCM08 fatcat:aia2na66lzfztkqy26snkfr5ne