A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2004; you can also visit the original URL.
The file type is application/pdf.
A new look at exploiting data parallelism in embedded systems
2003
Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03
doi:10.1145/951732.951733
fatcat:zkqbizprmraujdqdukm2umipu4
A new look at exploiting data parallelism in embedded systems
2003
Proceedings of the international conference on Compilers, architectures and synthesis for embedded systems - CASES '03
doi:10.1145/951710.951733
dblp:conf/cases/HunterM03
fatcat:bekucjkgzvg4llxhbpefjs6qka
Unsupervised Transfer Learning in Multilingual Neural Machine Translation with Cross-Lingual Word Embeddings
[article]
2021
arXiv
pre-print
In this work we look into adding a new language to a multilingual NMT system in an unsupervised fashion. ...
Under the utilization of pre-trained cross-lingual word embeddings we seek to exploit a language independent multilingual sentence representation to easily generalize to a new language. ...
In a monolingual data only setting we aim at achieving an equivalent result through cross-lingual word embeddings. ...
arXiv:2103.06689v1
fatcat:qbczgw62lvhvjfl3km5szhpmdm
Design and implementation of embedded multiprocessor architecture using FPGA
2010
2010 IEEE Symposium on Industrial Electronics and Applications (ISIEA)
We have therefore designed a new architecture called embedded concurrent computing (ECC), which is implemented on an FPGA chip using VHDL. ...
This paper proposes a design and implementation of embedded multiprocessors architecture system focusing on its design area and performance. ...
Acknowledgments The authors would like to thank the Underwater Robotics Research Group (URRG) in the USM for their assistance and NOD, MOSTI, for providing the research grant (Grant no. 6050124). ...
doi:10.1109/isiea.2010.5679397
fatcat:srjvmfyhrbafri5onb2gdmkpbq
Evaluating Embedded GPUs Performance via Computer Vision Applications
2020
International Journal of Computer Applications
The results show that, despite the architectural limitations, using such devices can lead to a speed-up of 8 times compared to traditional embedded systems processing data only on CPUs. ...
Despite the similarities with general-purpose architectures that already exploit the benefits of GPUs, this new kind of embedded device presents some architectural singularities, such as differences in ...
In this context, depending on the image sequence being processed, the Super Resolution algorithm may have to look for new frames to access several different positions in the system memory. ...
doi:10.5120/ijca2020920518
fatcat:s6u4poubzfbmpi4srzsz27qwey
RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance
[article]
2021
arXiv
pre-print
In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic ...
This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. ...
First, GPUs exhibit low utilization when exploiting data-level parallelism in the frontend and model-level parallelism in the backend, primarily due to the high overhead of embedding lookups and memory ...
arXiv:2105.08820v2
fatcat:tsq6jygecvdo5l2bgs5pcbqbmu
Design space exploration for real-time embedded stream processors
2004
IEEE Micro
We present a framework for rapidly exploring the design space for stream processors in real-time embedded systems. ...
There is a trade-off between the number of arithmetic units in a cluster of a stream processor, the number of clusters and the clock frequency as each solution meets real-time at a different power consumption ...
Acknowledgements Sridhar Rajagopal and Joseph Cavallaro were supported in part by Nokia Corporation, Texas Instruments, Inc., and by NSF under grants EIA-0224458, and EIA-0321266. ...
doi:10.1109/mm.2004.25
fatcat:akwodtq6x5cqze2rj6kokllpti
Parallel Embedded Computing Architectures
[chapter]
2012
Embedded Systems - High Performance Systems, Applications and Projects
In the following, we will take a closer look at this special type of application. ...
In addition the presented memory management system can also be exploited for memory bound data parallel applications in HPC. ...
doi:10.5772/38478
fatcat:mwjbl67plnaxxemz4gfovwue2i
Exploring chip-multiprocessors in deeply-embedded real-time computing
2008
ACM SIGBED Review
In this essay, I propose the application of CMP in deeply-embedded real-time systems. ...
As an energy efficient high-performance architecture, chip multiprocessor (CMP) can be deployed in deeply-embedded real-time computing. ...
Finally, the C2D-based testbed will be incorporated and tested in a mobile robot (as a deeply embedded real-time system) for data collection in a sensor network. ...
doi:10.1145/1366283.1366296
fatcat:p6f3gmhxrvfsncsd43ov6vk7ti
Simultaneous thin-thread processors for low-power embedded systems
2008
IEICE Electronics Express
In this paper, we investigate the possibility of using multi-threaded processors to solve the problems with traditional superscalar processors in embedded systems. ...
A drawback is that the conventional design of the superscalar processors possesses inherent complexity and power problems which are not easily acceptable in the domain of embedded processors. ...
In this paper, we develop and propose a low-power multithreaded processor model for future embedded systems. ...
doi:10.1587/elex.5.802
fatcat:axoa4v4bfje63hogzswwg3shci
Implications of Electronics Technology Trends for Algorithm Design
2009
Computer journal
To assess the efficiency of an algorithm we will need to be able to predict data movement both in time and space. ...
Scaling of electronics technology has brought us to a pivotal point in the design of computational devices. ...
ACKNOWLEDGEMENTS This work was funded in part by a grant from EPSRC (EP/D036895) and a studentship from the Gates Cambridge Trust. ...
doi:10.1093/comjnl/bxp013
fatcat:dyhmwbw5ozcmblo7nbero6kxaq
Unsupervised Neural Machine Translation
[article]
2018
arXiv
pre-print
In this work, we completely remove the need of parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. ...
In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. ...
Mikel Artetxe enjoys a doctoral grant from the Spanish MECD. ...
arXiv:1710.11041v2
fatcat:fhjnfqm5dbfgni52cumslnr4xy
Toward embedded development from Advanced Khoros
[chapter]
1998
Lecture Notes in Computer Science
to embedded systems. ...
Current practice in the design of application software for high-performance embedded computing systems is characterized by long development times, lack of interoperability with other systems, and handcrafting ...
This effort will look at building a framework that will allow flexible algorithm selection for the mapping process and provide a few mapping algorithms. ...
doi:10.1007/3-540-64359-1_760
fatcat:hyb6quj6sjab7nyj4ltwsdzorq
Fast LBP Face Detection on Low-Power SIMD Architectures
2014
2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops
The implementation exploits parallelism and data reuse in the detection algorithm and is integrated into CogniVue's Gen-1 APEX platform, which uses a SIMD design and is extremely energy efficient. ...
The proposed embedded face detection system runs at 5 VGA frames per second, while providing similar accuracy to the PC version of the LBP face detection algorithm included in the OpenCV library. ...
Our main contribution in this paper is to propose an embedded implementation for Single Instruction Multiple Data (SIMD) architectures that exploits parallelism and data reuse in this face detection algorithm ...
doi:10.1109/cvprw.2014.96
dblp:conf/cvpr/BilaniukELXLM14
fatcat:jy4otxxj6fgbzhjwn3vfxurghm
The University of Maryland's Kazakh-English Neural Machine Translation System at WMT19
2019
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
The submitted system improves over a Kazakh-only baseline by +5.45 BLEU on newstest2019. ...
This paper describes the University of Maryland's submission to the WMT 2019 Kazakh to English news translation task. ...
In this setting, an NMT system is firstly trained using auxiliary parallel data from a so-called "parent" language pair and then the trained model is used to initialize a "child" model which is further ...
doi:10.18653/v1/w19-5308
dblp:conf/wmt/BriakouC19
fatcat:itpugly4fzbelghcxea5r22tqm