Data Path Management in Mesh-Based Programmable Routers

Q. Wu, T. Wolf
2010 2010 IEEE International Conference on Communications  
Although such packet processors are rich in raw system processing power, utilization of hardware resource plays a critical role in overall system performance.  ...  We present a genetic algorithm to explore the assignment of tasks, and utilize on-chip interconnections by splitting the traffic between cores across multiple paths.  ...  Key characters of high throughput communication in such a network include the ability to avoid deadlock and traffic splitting. A.  ... 
doi:10.1109/icc.2010.5501808 dblp:conf/icc/WuW10 fatcat:bbtrct6byrbx5bvnayqqtfh6o4

Performance Analysis for MPEG-4 Video Codec Based on On-Chip Network

June-Young Chang, Won-Jong Kim, Young-Hwan Bae, Jin Ho Han, Han-Jin Cho, Hee-Bum Jung
2005 ETRI Journal  
The existing on-chip buses of system-on-a-chip (SoC) have some limitation on data traffic bandwidth since a large number of silicon IPs share the bus.  ...  In this paper, we present a performance analysis for an MPEG-4 video codec based on the on-chip network communication architecture.  ...  Introduction As system-on-a-chip (SoC) grows in design complexity, data traffic of IP cores becomes more and more important.  ... 
doi:10.4218/etrij.05.0905.0025 fatcat:ury2tcg6sje75eoy2iw257bgfa

Exploiting programmable network interfaces for parallel query execution in workstation clusters

V.S. Kumar, M.J. Thazhuthaveetil, R. Govindarajan
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
In this paper, we propose schemes where certain application level processing in parallel database query execution is performed on the network processor.  ...  We, therefore, assume such a user-level communication layer in our system [7].  ...  Acknowledgments This work was partially supported by a research grant from IBM India Research Center. We would like to thank Prof. W. M. Zuberek for the CNET simulation software.  ... 
doi:10.1109/ipdps.2006.1639314 dblp:conf/ipps/KumarTG06 fatcat:5cbvrfqznrambextdey2uytdpq

A scheduling system for exploiting data and task parallelism on PC laboratory clusters

Ying‐Nan Chen, Li‐Ming Tseng, Yi‐Ming Chen
2003 Campus-Wide Information Systems  
selection, and data partition functions for data and task parallel programs.  ...  The work in this paper presents a framework for deciding on a good execution strategy for a given program based on the available data and task parallelism in the program on PC laboratory clusters.  ...  For verifying the impact of load imbalance and network traffic, we select 10 processors to generate network traffic and 22 processors to execute tasks.  ... 
doi:10.1108/10650740310455559 fatcat:oupgr7wrvfdp3ccgqzvtu44o44

An Optimized Network-on-Chip Design for Data Parallel FFT

Thomas Canhao Xu, Pasi Liljeberg, Hannu Tenhunen
2012 Procedia Engineering  
However, the evaluation of data parallel FFT in a NoC platform has not been well addressed. We analyse data parallel FFT in terms of traffic patterns and propose an optimized NoC design.  ...  In this paper, we propose an optimized Network-on-Chip (NoC) design for data parallel FFT applications. NoC based architecture is proposed for future multicore processors due to its scalability.  ...  A famous algorithm is the split-radix FFT, which achieves the lowest arithmetic operation count [10]. Implementing FFTs on multi-processor systems has been studied in [11] and [12].  ... 
doi:10.1016/j.proeng.2012.01.866 fatcat:qwfjpzguyfhxlfq2rm5bh7pnum

Implementation of Cache Fair Thread Scheduling for multi core processors using wait free data structures in cloud computing applications

A.S. Radhamani, E. Baburaj
2011 2011 World Congress on Information and Communication Technologies  
In this paper an effective scheduling framework for multi-core processors that strike a balance between control over the system and an effective network traffic control mechanism for high-performance computing  ...  The primary goal of scheduling framework is to improve application throughput and overall system utilization in cloud applications.  ...  In this paper, a network traffic monitoring and control for multi core processors based on cloud is proposed.  ... 
doi:10.1109/wict.2011.6141313 fatcat:awglbzxx3bay3fn5d7atbof3lm

Network Interface Sharing Techniques for Area Optimized NoC Architectures

Alberto Ferrante, Simone Medardoni, Davide Bertozzi
2008 2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools  
While area improvements are significant, a number of physical and system-level effects might mitigate performance degradation, making our technique a promising solution for area efficient network-on-chip  ...  A common approach to relieve the problem consists of sharing most of network interface resources among a number of processor cores.  ...  The workload is assumed to be split among 8 parallel processor cores by means of a horizontal slicing technique.  ... 
doi:10.1109/dsd.2008.111 dblp:conf/dsd/FerranteMB08 fatcat:df4dzlfv3ffb5cnltnp23lruty

A Migratory Near Memory Processing Architecture Applied to Big Data Problems [article]

Ed T. Upchurch
2020 arXiv   pre-print
To perform a database query, data must be moved back and forth between DRAM and a small cache as well as between DRAM and disks. For Big Data applications this data movement is onerous.  ...  Servers produced by mainstream vendors are inefficient in processing Big Data queries due to bottlenecks inherent in the fundamental architecture of these systems.  ...  A parallel hash partitioned join splits the join into p smaller joins where p is the degree of parallelism based on the number of independent cores or MNMS memory units in the case of MNMS.  ... 
arXiv:2003.09074v1 fatcat:jeoywijlqzgmdcfs5zycthxcja

Implementation and analysis of forward error correction decoding for Cloud-RAN systems

Henning Paul, Dirk Wubben, Peter Rost
2015 2015 IEEE International Conference on Communication Workshop (ICCW)  
In future 5G mobile networks, radio access network functions will be virtualized and implemented on centralized cloud platforms.  ...  In principle, this allows for more advanced algorithms of joint processing and offers the ability to balance the computational load.  ...  The authors would like to acknowledge the contributions of their colleagues in iJOIN, although the views expressed are those of the authors and do not necessarily represent the project.  ... 
doi:10.1109/iccw.2015.7247588 dblp:conf/icc/PaulWR15 fatcat:pghri22kjzah5dljekk26uccsa

Small high-bandwidth ATM switch

Adisak Mekkittikul, Nick McKeown, Martin Izzard, Wai Sum Lai, Sam T. Jewell, Curtis A. Siller, Jr., Indra Widjaja, Dennis Karvelas
1996 Broadband Access Systems  
The Tiny Tera efficiently supports both unicast and multicast traffic.  ...  Because of limitations in memory and interconnection bandwidths, we believe that to achieve such a high-bandwidth switch requires an innovative architecture.  ...  Cells removed from the output queues leave the system over the external parallel interface.  ... 
doi:10.1117/12.257346 fatcat:wrrbupilazaabhijdyaddqeaja

The Impact of Higher Communication Layers on NoC Supported MP-SoCs

T. Marescaux, E. Brockmeyer, H. Corporaal
2007 First International Symposium on Networks-on-Chip (NOCS'07)  
Multi-processor systems-on-chip use networks-on-chip (NoC) as a communication backbone to tackle the communication between processors and multi-level memory hierarchies.  ...  Inter-processor communication has a high impact on the NoC traffic but, to this day, there have been few detailed studies.  ...  Vanmeerbeeck and P. Avasare to the NoCTurn MP-SoC platform simulator and of R. Baert to the application mapping.  ... 
doi:10.1109/nocs.2007.41 dblp:conf/nocs/MarescauxBC07 fatcat:ywttd7pwzbhlnmtx76bzzhmzoi

FPL-3: Towards Language Support for Distributed Packet Processing [chapter]

Mihai-Lucian Cristea, Willem de Bruijn, Herbert Bos
2005 Lecture Notes in Computer Science  
FPL-3 supports not only generic header-based filtering, but also more demanding tasks, such as payload scanning, packet replication and traffic splitting.  ...  The proposed framework can be used to execute such diverse tasks as load balancing, traffic monitoring, firewalling and intrusion detection directly at the critical high-bandwidth links (e.g., in enterprise  ...  Acknowledgements This work was supported by the EU SCAMPI project IST-2001-32404, and the EU LOBSTER project, while Intel donated the network cards.  ... 
doi:10.1007/11422778_60 fatcat:6ombsl5u2rbdhfg2u75avkttjy

Grid-enabled parallel divide-and-conquer

Chun-Hsi Huang
2002 Proceedings of the 2002 ACM symposium on Applied computing - SAC '02  
These include a distributed-memory system: a Solaris cluster of 32 Sun Ultra5 computers on Myrinet network and a distributed shared-memory system: an SGI Origin 2000 with 32 R10000 processors, using MPICH-GM  ...  The algorithm is communication-free in the conquer stage and uses only a small amount of messages while partitioning the input.  ...  In order to split the tournaments as evenly as possible in the partitioning stage, in each round, the mediocre player whose in-degree and out-degree are closest is selected and used to split the tournaments  ... 
doi:10.1145/508950.508959 fatcat:joioumnpwngwxi6hpf53g5buum

Grid-enabled parallel divide-and-conquer

Chun-Hsi Huang
2002 Proceedings of the 2002 ACM symposium on Applied computing - SAC '02  
These include a distributed-memory system: a Solaris cluster of 32 Sun Ultra5 computers on Myrinet network and a distributed shared-memory system: an SGI Origin 2000 with 32 R10000 processors, using MPICH-GM  ...  The algorithm is communication-free in the conquer stage and uses only a small amount of messages while partitioning the input.  ...  In order to split the tournaments as evenly as possible in the partitioning stage, in each round, the mediocre player whose in-degree and out-degree are closest is selected and used to split the tournaments  ... 
doi:10.1145/508791.508959 dblp:conf/sac/Huang02 fatcat:cpogpktzfbbwxnqfwcsa3di4y4

Parallel execution of logic programs by load sharing

Zheng Lin
1997 The Journal of Logic Programming  
As the scale of a multiprocessor system grows, and the speed of implementing resolution in local processors improves, task scheduling becomes increasingly frequent.  ...  It is therefore of growing importance to search for methods that are less reliant on resources subject to competition by processors in a parallel computer.  ...  This will become clear when performance data are presented from a typical Or-parallel system later in this paper.  ... 
doi:10.1016/s0743-1066(96)00014-3 fatcat:u5qsrq33unbwzhr4t5lqlnl4km