28,526 Hits in 4.4 sec

Multi-platform Auto-vectorization

D. Nuzman, R. Henderson
International Symposium on Code Generation and Optimization (CGO'06)  
code to vector code by the compiler.  ...  Intrinsics: vector float vb = vec_load (0, ptr_b); vector float vc = vec_load (0, ptr_c); vector float va = vec_add (vb, vc); vec_store (va, 0, ptr_a); Autovectorization: Automatically transform serial  ...  IBM Labs in Haifa - Multi-Platform Auto-Vectorization - Talk Layout - Larsen, Amarasinghe; Shin, Chame, Hall) - AltiVec - Vectorizing compilers available for multiple SIMD targets - source-to-source compilers - Vienna  ... 
doi:10.1109/cgo.2006.25 dblp:conf/cgo/NuzmanH06 fatcat:gwcjkpxr6ffcjkyre3ntqr2ax4
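The snippet contrasts AltiVec-style intrinsics (`vec_load`, `vec_add`, `vec_store`) with letting the compiler vectorize a serial loop itself. A minimal sketch of the same `a = b + c` kernel in C, assuming GCC or Clang: one serial loop left to the auto-vectorizer, and one loop vectorized by hand with the compilers' generic `vector_size` extension instead of a single ISA's intrinsics, so one source can be lowered to AltiVec, SSE or NEON. The function names are hypothetical and the code is not taken from the paper.

```c
#include <stddef.h>
#include <string.h>

/* Generic 4 x float vector (GCC/Clang extension). The compiler maps it to
   AltiVec, SSE or NEON registers, or to scalar code if no SIMD unit exists. */
typedef float v4sf __attribute__((vector_size(16)));

/* Serial form: a loop the compiler's auto-vectorizer may transform itself. */
void add_serial(float *a, const float *b, const float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Explicit form: hand-vectorized once, using the target-neutral vector type. */
void add_generic_simd(float *a, const float *b, const float *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf vb, vc, va;
        memcpy(&vb, b + i, sizeof vb);  /* load that tolerates misalignment */
        memcpy(&vc, c + i, sizeof vc);
        va = vb + vc;                   /* element-wise vector add */
        memcpy(a + i, &va, sizeof va);  /* store that tolerates misalignment */
    }
    for (; i < n; i++)                  /* scalar remainder */
        a[i] = b[i] + c[i];
}
```

Both functions compute the same result; the difference is only in who is responsible for finding the parallelism.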

Multi-sensor kernel design for time-frequency analysis of sparsely sampled nonstationary signals

Yimin D. Zhang, Liang Guo, Qisong Wu, Moeness G. Amin
2015 2015 IEEE Radar Conference (RadarCon)  
In this paper, we examine the sparsity-based time-frequency signal representation (TFSR) of randomly thinned nonstationary signals in a multi-sensor platform to yield improved performance with reduced number  ...  We develop a robust multi-sensor AOK design based on data fusion across all sensors so as to enhance the signal auto-terms while effectively mitigating artifacts, cross-terms, and noise.  ...  While CS-based TF approaches were considered for a single-sensor scenario, we extend such treatment into a multi-sensor platform.  ... 
doi:10.1109/radar.2015.7131122 fatcat:gowplr5m6rbd5dxfbeouwcv77q

Multi-tier Service Differentiation: Coordinated Resource Provisioning and Admission Control

Sireesha Muppala, Xiaobo Zhou, Guihai Chen
2012 2012 IEEE 18th International Conference on Parallel and Distributed Systems  
We propose a coordinated self-adaptive resource management and admission control for multi-tier Internet service differentiation and performance improvement in a shared virtualized platform.  ...  We implement the integrated approach in a virtualized blade server system hosting multi-tier RUBiS applications.  ...  the shared platform to the $\mathbb{R}^{M \times N}$ vector so that Eq. (3) and Eq. (4) are satisfied at the same time.  ... 
doi:10.1109/icpads.2012.20 dblp:conf/icpads/MuppalaZC12 fatcat:bqppzzqxxrbrfkpkqsb3l3dk4e

DOA estimation of sparsely sampled nonstationary signals

Liang Guo, Yimin D. Zhang, Qisong Wu, Moeness G. Amin
2015 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP)  
The paper deals with sparsely sampled nonstationary signals in a multi-sensor array platform.  ...  The reconstructed auto- and cross-sensor TFSRs enable the formation of the spatial time-frequency distribution (STFD) matrix, which is used, in turn, to propose the sparse time-frequency MUSIC (STF-MUSIC  ...  The averaged AF over all sensors is given by $A_\Sigma(r, \psi) = \frac{1}{N}\sum_{q=1}^{N} A_q(r, \psi)$ (11). Then, an improved kernel in the multi-sensor platform is obtained by replacing $A(r, \psi)$ in (10) by $A_\Sigma(r, \psi)$ in (11  ... 
doi:10.1109/chinasip.2015.7230412 dblp:conf/chinasip/GuoZWA15 fatcat:cjlup4djizhe7fejczqlshwb7y
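Eq. (11) is an element-wise mean of the sensors' ambiguity functions. A small C sketch of it, assuming each sensor's AF has already been sampled on a common $(r, \psi)$ grid of `n_r` by `n_psi` complex values; the function name and the flat memory layout are hypothetical.

```c
#include <complex.h>
#include <stddef.h>

/* A_sigma(r, psi) = (1/N) * sum_{q=1}^{N} A_q(r, psi), evaluated per grid cell.
   Hypothetical layout: afs holds n_sensors grids back to back, each of
   n_r * n_psi complex values; a_sigma receives one grid of the same size.
   Requires n_sensors > 0. */
void average_af(double complex *a_sigma, const double complex *afs,
                size_t n_sensors, size_t n_r, size_t n_psi) {
    size_t cells = n_r * n_psi;
    for (size_t k = 0; k < cells; k++) {
        double complex sum = 0;
        for (size_t q = 0; q < n_sensors; q++)
            sum += afs[q * cells + k];      /* A_q at grid cell k */
        a_sigma[k] = sum / (double)n_sensors;
    }
}
```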

On the performance and energy-efficiency of multi-core SIMD CPUs and CUDA-enabled GPUs

Ronald Duarte, Resit Sendag, Frederick J. Vetter
2013 2013 IEEE International Symposium on Workload Characterization (IISWC)  
This paper explores the performance and energy efficiency of CUDA-enabled GPUs and multi-core SIMD CPUs using a set of kernels and full applications.  ...  Our implementations efficiently exploit both SIMD and thread-level parallelism on multi-core CPUs and the computational capabilities of CUDA-enabled GPUs.  ...  C: compiler auto-vectorized, only accounts for single-threaded optimization effort. **: Multi-core effort only.  ... 
doi:10.1109/iiswc.2013.6704683 dblp:conf/iiswc/DuarteSV13 fatcat:qs36ks5kezdrldp5zk3nstai2i

Architectural Support for Reducing Parallel Processing Overhead in an Embedded Multiprocessor

Jian Wang, Joar Sohl, Dake Liu
2010 2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing  
The host-multi-SIMD chip multiprocessor (CMP) architecture has been proved to be an efficient architecture for high performance signal processing which explores both task level parallelism by multi-core  ...  Implementing an algorithm in a parallel platform usually produces control and communication overhead which is not parallelizable.  ...  The ePUMA platform uses the host-multi-SIMD with architectural optimizations to minimize parallel processing overheads.  ... 
doi:10.1109/euc.2010.17 dblp:conf/euc/WangSL10 fatcat:uboiys3jzncufgqeana4pn7lxy

iDev: Enhancing Social Coding Security by Cross-platform User Identification Between GitHub and Stack Overflow

Yujie Fan, Yiming Zhang, Shifu Hou, Lingwei Chen, Yanfang Ye, Chuan Shi, Liang Zhao, Shouhuai Xu
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
Then, we propose a novel AHIN representation learning model AHIN2Vec to efficiently learn node (i.e., user) representations in AHIN for cross-platform user identification.  ...  To solve this problem, an important insight brought by this work is to leverage social coding properties in addition to user attributes for cross-platform user identification.  ...  Multi-view network built from AHIN.  ... 
doi:10.24963/ijcai.2019/315 dblp:conf/ijcai/FanZHCYSZX19 fatcat:ry6it7fhs5hzzhszfaybmdk77a

Towards A Multi-agent System for Online Hate Speech Detection [article]

Gaurav Sahu, Robin Cohen, Olga Vechtomova
2021 arXiv   pre-print
This paper envisions a multi-agent system for detecting the presence of hate speech in online social media platforms such as Twitter and Facebook.  ...  We conclude with a discussion of how our system may be of use to provide recommendations to users who are managing online social networks, showcasing the immense potential of intelligent multi-agent systems  ...  text none 45.18 33.4 38.41 BiL multi text+caption none 45.38 33.67 38.67 VBiL multi image+text+caption Concat 55.27 35.54 43.04 VBiL multi image+text+caption Auto-Fusion 59.65 43.87  ... 
arXiv:2105.01129v1 fatcat:lakkm66thrfy3kidfc734cfjbe

Vectorization of Riemann solvers for the single- and multi-layer Shallow Water Equations

Chaulio R. Ferreira, Kyle T. Mandli, Michael Bader
2018 2018 International Conference on High Performance Computing & Simulation (HPCS)  
We discuss vectorization of normal and transverse Riemann solvers for the single- and multi-layer shallow water equations.  ...  Our approach is simple and portable, as it is based on auto-vectorization by the compiler, aided by OpenMP 4.0 directives.  ...  Although auto-vectorization was possible for their f-Wave solver, intrinsic functions were necessary to achieve vectorization of the augmented Riemann solver, because the compiler was not able to auto-vectorize  ... 
doi:10.1109/hpcs.2018.00073 dblp:conf/ieeehpcs/FerreiraMB18 fatcat:apwymwd64fefvbcdziuolsok7i
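To illustrate what auto-vectorization aided by OpenMP 4.0 directives can look like, here is a hypothetical C loop (not the authors' solver) that computes the jump of a conserved quantity across each cell interface, the per-interface input a Riemann solver works on. `#pragma omp simd` asserts that the iterations may execute in SIMD lanes, and `reduction(max:...)` lets each lane keep a partial maximum; GCC and Clang honor it under `-fopenmp-simd` or `-fopenmp` and otherwise ignore it.

```c
#include <stddef.h>

/* Hypothetical helper: dq[i] = q[i+1] - q[i] for the n-1 interfaces of n
   cells; returns the largest |dq[i]| (0.0 when n < 2). */
double interface_jumps(double *dq, const double *q, size_t n) {
    size_t m = n > 0 ? n - 1 : 0;   /* number of interfaces */
    double max_abs = 0.0;
    #pragma omp simd reduction(max:max_abs)
    for (size_t i = 0; i < m; i++) {
        double d = q[i + 1] - q[i];
        dq[i] = d;
        double a = d < 0.0 ? -d : d;          /* |d| without libm */
        max_abs = a > max_abs ? a : max_abs;
    }
    return max_abs;
}
```

The bound `m` is computed before the loop because OpenMP requires a canonical loop form; a test such as `i + 1 < n` is not accepted under `omp simd`.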

Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms

Gaurav Mitra, Beau Johnston, Alistair P. Rendell, Eric McCreath, Jun Zhou
2013 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum  
On the ARM platforms the hand-tuned NEON benchmarks were between 1.05× and 13.88× faster than the auto-vectorized code, while for the Intel platforms the hand-tuned SSE benchmarks were between 1.34× and  ...  The performance obtained using compiler auto-vectorization is compared with that achieved using hand-tuning across a range of five different benchmarks and ten different hardware platforms.  ...  These figures show that in general the benefit of using hand coded SIMD intrinsics over auto-vectorization appears to be slightly greater on the Intel platforms compared with the ARM platforms.  ... 
doi:10.1109/ipdpsw.2013.207 dblp:conf/ipps/MitraJRMZ13 fatcat:43t6svygpzefdobbxwy4gsjad4
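The comparison above is between compiler output and hand-written SIMD intrinsics. As a minimal sketch of a hand-tuned kernel that must build on both platform families, the C function below (a hypothetical `saxpy`, not one of the paper's five benchmarks) picks SSE or NEON intrinsics at compile time from the `__SSE__` and `__ARM_NEON` macros and keeps a scalar loop as the fallback and tail handler.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* y[i] = alpha * x[i] + y[i], 4 floats per step on SSE or NEON. */
void saxpy(float *y, const float *x, float alpha, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    __m128 va = _mm_set1_ps(alpha);                 /* broadcast alpha */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#elif defined(__ARM_NEON)
    float32x4_t va = vdupq_n_f32(alpha);            /* broadcast alpha */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
        vst1q_f32(y + i, vaddq_f32(vmulq_f32(va, vx), vy));
    }
#endif
    for (; i < n; i++)          /* scalar tail, or whole loop elsewhere */
        y[i] = alpha * x[i] + y[i];
}
```

Each ISA path is a separate piece of code to write and maintain, which is the portability cost auto-vectorization avoids.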

Special issue on Intelligence Computation Evolutionary Computation: ICEV2018

Zhenyu Du
2019 Evolutionary Intelligence  
Imbalanced data classification algorithm with support vector machine kernel extensions proposes an imbalanced data classification algorithm of support vector machines (KE-SVM).  ...  maximum margin classification SVM model, and then obtaining a new kernel extension function. Based on the chi-square test and weight coefficient calculation, through training the samples again by the new vector  ...  Firstly, the method inputs historical data which contains power load, weather information, and holiday information, and uses auto-encoding to compress the historical data; and then, the multi-layer GRU is  ... 
doi:10.1007/s12065-019-00271-0 fatcat:s454clzxjnejfbzindrypt7nsa

Fusion OLAP: Fusing the Pros of MOLAP and ROLAP Together for In-memory OLAP

Yansong Zhang, Yu Zhang, Shan Wang, Jiaheng Lu
2018 IEEE Transactions on Knowledge and Data Engineering  
The Fusion OLAP model can be integrated into the state-of-the-art in-memory databases with additional surrogate key indexes and vector indexes.  ...  This is achieved by mapping the relation tables into virtual multidimensional model and binding the multidimensional operations into a set of vector indexes to enable multidimensional computing on relation  ...  The vector access latency can be improved by two roadmaps, by cache locality or by simultaneous multi-threading.  ... 
doi:10.1109/tkde.2018.2867522 fatcat:vfrtcmiqsvfodeahtx6oake2uu

Evaluating Auto-Vectorizing Compilers through Objective Withdrawal of Useful Information

Sergi Siso, Wes Armour, Jeyarajan Thiyagalingam
2019 ACM Transactions on Architecture and Code Optimization (TACO)  
With our new method in place, we exhaustively evaluated five industry-grade compilers: GNU, Intel, Clang, PGI and IBM; on four representative vector platforms: AVX-2, AVX-512 (Skylake), AVX-512 (KNL) and  ...  a method to objectively supply and withdraw information that would otherwise aid the compiler in the auto-vectorization process.  ...  (f) Global Data Flow and Symbolics categories show good auto-vectorization results across all platforms and compilers.  ... 
doi:10.1145/3356842 fatcat:iztjlyb7lffvrehu4mcx3dgroy
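One kind of information that can be supplied to or withdrawn from an auto-vectorizer is pointer aliasing. The two hypothetical C functions below contain the same loop; only the second promises, via C99 `restrict`, that the arrays do not overlap. This is an illustration of the idea, assuming a C99 compiler; the snippet does not say whether the paper withdraws aliasing information in exactly this way.

```c
#include <stddef.h>

/* Information withdrawn: a and b may alias, so the compiler must prove
   independence, insert a run-time overlap check, or keep the loop scalar. */
void scale_maybe_alias(float *a, const float *b, float s, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = s * b[i];
}

/* Information supplied: restrict promises no overlap, so the loop can be
   vectorized without an alias check. */
void scale_restrict(float *restrict a, const float *restrict b, float s,
                    size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = s * b[i];
}
```

Calling `scale_restrict` with overlapping arrays is undefined behavior, so the promise must actually hold.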

Pushing the Limits of Online Auto-tuning: Machine Code Optimization in Short-Running Kernels [article]

Fernando Endo and Damien Couroussé and Henri-Pierre Charles
2017 arXiv   pre-print
We propose an online auto-tuning approach for computing kernels.  ...  This allows auto-tuning to pay off in very short-running applications.  ...  and the best statically auto-tuned kernels, in the real platforms (all run-time overheads included).  ... 
arXiv:1707.04566v1 fatcat:sdtgqm6iv5ekzmxnuxtuvveisq

Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures

Patrick Flynn, Xinyao Yi, Yonghong Yan
2022 Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores  
Finally, we conduct performance evaluations on Intel AVX and Arm SVE to demonstrate how this method of vectorization can bridge the gap between auto- and manual vectorization.  ...  We present the design of a unified IR that is easily translated to AVX and SVE vector architectures.  ...  GPUs and multi-core CPUs (via threading).  ... 
doi:10.1145/3528425.3529100 fatcat:b6zh5b3gfvcndancatw4lunu6q
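AVX and SVE differ in a way any translator of `#pragma omp simd` loops must handle: AVX has a fixed width of 8 lanes for 32-bit floats, while SVE's vector length is known only at run time and loop tails are handled with predicates. A hand-written C sketch of one element-wise multiply lowered both ways, selected by the `__ARM_FEATURE_SVE` and `__AVX__` macros, with the original `omp simd` loop as the fallback. It shows the two code shapes only and is not the paper's unified IR or its generated code.

```c
#include <stddef.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#elif defined(__AVX__)
#include <immintrin.h>
#endif

/* c[i] = a[i] * b[i] */
void vmul(float *c, const float *a, const float *b, size_t n) {
#if defined(__ARM_FEATURE_SVE)
    /* SVE: step by the run-time lane count; the predicate from svwhilelt
       disables lanes past n, so no scalar tail loop is needed. */
    for (size_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, c + i, svmul_f32_x(pg, va, vb));
    }
#elif defined(__AVX__)
    /* AVX: fixed 8 x 32-bit lanes, then a scalar remainder loop. */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_mul_ps(va, vb));
    }
    for (; i < n; i++)
        c[i] = a[i] * b[i];
#else
    /* Fallback: the source-level form, left to the compiler. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * b[i];
#endif
}
```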
Showing results 1-15 out of 28,526 results