A new look at exploiting data parallelism in embedded systems

Hillery C. Hunter, Jaime H. Moreno
2003 Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03  
doi:10.1145/951732.951733 fatcat:zkqbizprmraujdqdukm2umipu4

A new look at exploiting data parallelism in embedded systems

Hillery C. Hunter, Jaime H. Moreno
2003 Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03  
doi:10.1145/951710.951733 dblp:conf/cases/HunterM03 fatcat:bekucjkgzvg4llxhbpefjs6qka

Unsupervised Transfer Learning in Multilingual Neural Machine Translation with Cross-Lingual Word Embeddings [article]

Carlos Mullov and Ngoc-Quan Pham and Alexander Waibel
2021 arXiv   pre-print
In this work we look into adding a new language to a multilingual NMT system in an unsupervised fashion.  ...  Under the utilization of pre-trained cross-lingual word embeddings we seek to exploit a language independent multilingual sentence representation to easily generalize to a new language.  ...  In a monolingual data only setting we aim at achieving an equivalent result through cross-lingual word embeddings.  ... 
arXiv:2103.06689v1 fatcat:qbczgw62lvhvjfl3km5szhpmdm

Design and implementation of embedded multiprocessor architecture using FPGA

Muataz H. Salih, M. R. Arshad
2010 2010 IEEE Symposium on Industrial Electronics and Applications (ISIEA)  
We have therefore designed a new architecture called embedded concurrent computing (ECC), which is implemented on an FPGA chip using VHDL.  ...  This paper proposes a design and implementation of embedded multiprocessors architecture system focusing on its design area and performance.  ...  Acknowledgments The authors would like to thank the Underwater Robotics Research Group (URRG) in the USM for their assistance and NOD, MOSTI, for providing the research grant (Grant no. 6050124).  ... 
doi:10.1109/isiea.2010.5679397 fatcat:srjvmfyhrbafri5onb2gdmkpbq

Evaluating Embedded GPUs Performance via Computer Vision Applications

Paulo S. S. de Souza, Arthur F. Lorenzon, Marcelo C. Luizelli, Fabio D. Rossi
2020 International Journal of Computer Applications  
The results show that, despite the architectural limitations, using such devices can lead to a speed-up of 8 times compared to traditional embedded systems processing data only on CPUs.  ...  Despite the similarities with general-purpose architectures that already exploit the benefits of GPUs, this new kind of embedded devices presents some architectural singularities, such as differences in  ...  In this context, depending on the image sequence being processed, the Super Resolution algorithm may have to look for new frames to access several different positions in the system memory.  ... 
doi:10.5120/ijca2020920518 fatcat:s6u4poubzfbmpi4srzsz27qwey

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance [article]

Udit Gupta, Samuel Hsia, Jeff Zhang, Mark Wilkening, Javin Pombra, Hsien-Hsin S. Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks
2021 arXiv   pre-print
In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic  ...  This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance.  ...  First, GPUs exhibit low utilization when exploiting data-level parallelism in the frontend and model-level parallelism in the backend, primarily due to the high overhead of embedding lookups and memory  ... 
arXiv:2105.08820v2 fatcat:tsq6jygecvdo5l2bgs5pcbqbmu

Design space exploration for real-time embedded stream processors

Sridhar Rajagopal, J.R. Cavallaro, S. Rixner
2004 IEEE Micro  
We present a framework for rapidly exploring the design space for stream processors in real-time embedded systems.  ...  There is a trade-off between the number of arithmetic units in a cluster of a stream processor, the number of clusters and the clock frequency as each solution meets real-time at a different power consumption  ...  Acknowledgements Sridhar Rajagopal and Joseph Cavallaro were supported in part by Nokia Corporation, Texas Instruments, Inc., and by NSF under grants EIA-0224458, and EIA-0321266.  ... 
doi:10.1109/mm.2004.25 fatcat:akwodtq6x5cqze2rj6kokllpti

Parallel Embedded Computing Architectures [chapter]

Michael Schmidt, Dietmar Fey, Marc Reichenbach
2012 Embedded Systems - High Performance Systems, Applications and Projects  
In the following, we will take a closer look at this special type of application.  ...  In addition the presented memory management system can also be exploited for memory bound data parallel applications in HPC.  ... 
doi:10.5772/38478 fatcat:mwjbl67plnaxxemz4gfovwue2i

Exploring chip-multiprocessors in deeply-embedded real-time computing

Xuan Qi
2008 ACM SIGBED Review  
In this essay, I propose the application of CMP in deeply-embedded real-time system.  ...  As an energy efficient high-performance architecture, chip multiprocessor (CMP) can be deployed in deeply-embedded real-time computing.  ...  Finally, the C2D-based testbed will be incorporated and tested in a mobile robot (as a deeply embedded real-time system) for data collection in a sensor network.  ... 
doi:10.1145/1366283.1366296 fatcat:p6f3gmhxrvfsncsd43ov6vk7ti

Simultaneous thin-thread processors for low-power embedded systems

Won W. Ro, Jaeyoung Yi, Joon-Sang Park, Joonseok Park
2008 IEICE Electronics Express  
In this paper, we investigate the possibility to use multi-threaded processors to solve the problems with the traditional superscalar processors in embedded systems.  ...  A drawback is that the conventional design of the superscalar processors possesses inherent complexity and power problems which are not easily acceptable in the domain of embedded processors.  ...  In this paper, we develop and propose a low-power multithreaded processor model for future embedded systems.  ... 
doi:10.1587/elex.5.802 fatcat:axoa4v4bfje63hogzswwg3shci

Implications of Electronics Technology Trends for Algorithm Design

D. Greenfield, S. Moore
2009 Computer journal  
To assess the efficiency of an algorithm we will need to be able to predict data movement both in time and space.  ...  Scaling of electronics technology has brought us to a pivotal point in the design of computational devices.  ...  ACKNOWLEDGEMENTS This work was funded in part by a grant from EPSRC (EP/D036895) and a studentship from the Gates Cambridge Trust.  ... 
doi:10.1093/comjnl/bxp013 fatcat:dyhmwbw5ozcmblo7nbero6kxaq

Unsupervised Neural Machine Translation [article]

Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho
2018 arXiv   pre-print
In this work, we completely remove the need of parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora.  ...  In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs.  ...  Mikel Artetxe enjoys a doctoral grant from the Spanish MECD.  ... 
arXiv:1710.11041v2 fatcat:fhjnfqm5dbfgni52cumslnr4xy

Toward embedded development from Advanced Khoros [chapter]

Joe Fogler, Tom Robey, Mark Young
1998 Lecture Notes in Computer Science  
to embedded systems.  ...  Current practice in the design of application software for high-performance embedded computing systems is characterized by long development times, lack of interoperability with other systems, and handcrafting  ...  This effort will look at building a framework that will allow flexible algorithm selection for the mapping process and provide a few mapping algorithms.  ... 
doi:10.1007/3-540-64359-1_760 fatcat:hyb6quj6sjab7nyj4ltwsdzorq

Fast LBP Face Detection on Low-Power SIMD Architectures

Olexa Bilaniuk, Ehsan Fazl-Ersi, Robert Laganiere, Christina Xu, Daniel Laroche, Craig Moulder
2014 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops  
The implementation exploits parallelism and data reuse in the detection algorithm and is integrated into CogniVue's Gen-1 APEX platform, which uses a SIMD design and is extremely energy efficient.  ...  The proposed embedded face detection system runs at 5 VGA frames per second, while providing similar accuracy to the PC version of the LBP face detection algorithm included in the OpenCV library.  ...  Our main contribution in this paper is to propose an embedded implementation for Single Instruction Multiple Data (SIMD) architectures that exploits parallelism and data reuse in this face detection algorithm  ... 
doi:10.1109/cvprw.2014.96 dblp:conf/cvpr/BilaniukELXLM14 fatcat:jy4otxxj6fgbzhjwn3vfxurghm

The University of Maryland's Kazakh-English Neural Machine Translation System at WMT19

Eleftheria Briakou, Marine Carpuat
2019 Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)  
The submitted system improves over a Kazakh-only baseline by +5.45 BLEU on newstest2019.  ...  This paper describes the University of Maryland's submission to the WMT 2019 Kazakh to English news translation task.  ...  In this setting, an NMT system is firstly trained using auxiliary parallel data from a so-called "parent" language pair and then the trained model is used to initialize a "child" model which is further  ... 
doi:10.18653/v1/w19-5308 dblp:conf/wmt/BriakouC19 fatcat:itpugly4fzbelghcxea5r22tqm