19,548 Hits in 7.8 sec

Faster Support Vector Machines [article]

Sebastian Schlag, Matthias Schmitt, Christian Schulz
2020 arXiv   pre-print
We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy.  ...  For example, one of our sequential solvers is already on average a factor of 15 faster than the parallel ThunderSVM algorithm, while having similar classification quality.  ...
arXiv:1808.06394v3 fatcat:zvitszgos5ekllwbzm3o7rns6i
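
The hierarchy mentioned in the snippet is built by clustering a graph over the training data with label propagation: every node adopts the majority label of its neighbors until labels stabilize, and the label groups become the coarser level. A generic sketch of one such pass (our illustration, not the authors' code; the toy graph is invented):

```python
# A generic label propagation pass: each node adopts the most frequent
# label among its neighbors until labels stabilize; the resulting label
# groups form the next, coarser level of the hierarchy.
from collections import Counter
import random

def label_propagation(adj, max_iters=20, seed=0):
    """adj: dict mapping node -> list of neighbor nodes."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts in its own cluster
    nodes = list(adj)
    for _ in range(max_iters):
        rng.shuffle(nodes)            # random visit order avoids oscillation
        changed = False
        for v in nodes:
            if adj[v]:
                best = Counter(labels[u] for u in adj[v]).most_common(1)[0][0]
                if labels[v] != best:
                    labels[v], changed = best, True
        if not changed:
            break
    return labels

# Toy graph: two triangles joined by a single edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj))
```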

A Semicoarsening Multigrid Algorithm for SIMD Machines

J. E. Dendy, Jr., M. P. Ida, J. M. Rutledge
1992 SIAM Journal on Scientific and Statistical Computing  
The parallel efficiency of this method is analyzed, and its actual performance on the CM-2 is compared with its performance on some other machines, both parallel and nonparallel.  ...  In the first version of this paper, we found that a version that saves the LU decomposition ran two times faster on the CM-2 than a version that recomputes it.  ...  the latter was about two times faster than the former.  ...
doi:10.1137/0913082 fatcat:qowu53vnk5gufn2mcb3rcjn5le
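
The two-times speedup from saving the LU decomposition is easy to reproduce in miniature: factor once, then reuse the factors for every subsequent solve. A hedged sketch using SciPy (our illustration; the matrix and right-hand sides are invented):

```python
# Illustrative only: cache the LU factorization instead of recomputing it
# for every solve, the trade-off the snippet describes.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)) + 200 * np.eye(200)   # well conditioned
rhs = [rng.standard_normal(200) for _ in range(50)]

# "Recompute" variant: factors A from scratch inside every solve.
x_slow = [np.linalg.solve(A, b) for b in rhs]

# "Save" variant: factor once, reuse the factors for every right-hand side.
lu, piv = lu_factor(A)
x_fast = [lu_solve((lu, piv), b) for b in rhs]

assert all(np.allclose(s, f) for s, f in zip(x_slow, x_fast))
```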

PARALLEL MATRIX MULTIPLICATION ON THE CONNECTION MACHINE

WALTER F. TICHY
1989 International journal of high speed computing  
The Connection Machine (CM) is well suited for experimenting with large-scale parallelism.  ...  Matrix multiplication is a computation- and communication-intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance.  ...  With the exception of the sequential algorithm, array dimensions are compiled into the programs. By using constants rather than variables, the programs run about 10 percent faster on the CM.  ...
doi:10.1142/s0129053389000135 fatcat:kbysiudrifaw5moqyqed4ydjim
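
The simplest data-parallel decomposition gives each processing element its own block of rows of the result, so no two workers ever write the same memory. A rough sketch of that idea in NumPy with threads (ours, not one of the paper's six CM programs):

```python
# A row-block decomposition of C = A @ B: each worker owns a contiguous
# block of rows of C, so the writes are disjoint and need no locking.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_matmul(A, B, workers=4):
    n = A.shape[0]
    C = np.empty((n, B.shape[1]))
    bounds = np.linspace(0, n, workers + 1, dtype=int)

    def block(i):
        lo, hi = bounds[i], bounds[i + 1]
        C[lo:hi] = A[lo:hi] @ B          # NumPy releases the GIL here

    with ThreadPoolExecutor(workers) as ex:
        list(ex.map(block, range(workers)))
    return C

A, B = np.random.rand(512, 512), np.random.rand(512, 512)
assert np.allclose(parallel_matmul(A, B), A @ B)
```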

Patterns for cache optimizations on multi-processor machines

Nicholas Chen, Ralph Johnson
2010 Proceedings of the 2010 Workshop on Parallel Programming Patterns - ParaPLoP '10  
Writing parallel programs that run optimally on multi-processor machines is hard.  ...  Such cache issues are surprisingly common; neglecting them leads to poorly performing programs that do not scale well.  ...  Notice that at least 9 threads are needed for the OpenMP version to be faster than the sequential Loop Interchange version.  ... 
doi:10.1145/1953611.1953613 fatcat:qiqvt4vlprfpzkxghqecm3zo2q
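
The loop-interchange baseline in the snippet wins by walking memory in cache-line order, and the effect is easy to demonstrate outside OpenMP. A small NumPy illustration (ours, not the paper's benchmark):

```python
# Our illustration of the loop-interchange effect: summing a C-order
# (row-major) array row by row touches memory sequentially, while the
# column-first order strides across rows and wastes each cache line.
import time
import numpy as np

a = np.random.rand(2000, 2000)

def sum_rows_first(a):        # good locality: each a[i, :] is contiguous
    return sum(a[i, :].sum() for i in range(a.shape[0]))

def sum_cols_first(a):        # poor locality: each a[:, j] is strided
    return sum(a[:, j].sum() for j in range(a.shape[1]))

for f in (sum_rows_first, sum_cols_first):
    t0 = time.perf_counter()
    f(a)
    print(f.__name__, round(time.perf_counter() - t0, 3), "s")
```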

Synthesis of Parallel Binary Machines [article]

Elena Dubrova
2011 arXiv   pre-print
Our experimental results show that for sequences with high linear complexity, such as complementary, Legendre, or truly random sequences, parallel binary machines are an order of magnitude smaller than parallel FSRs  ...  Binary machines are a generalization of Feedback Shift Registers (FSRs) in which both feedback and feedforward connections are allowed and no chain connection between the register stages is required.  ...  Again, on average, parallel binary machines are an order of magnitude smaller than parallel LFSRs and NLFSRs.  ...
arXiv:1105.4514v1 fatcat:hxuya62nqfgevdlgfj33qlyddi
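
For contrast with the binary machines studied here, a conventional Fibonacci LFSR produces each output bit from a fixed chain of stages plus XOR feedback. A minimal sketch (taps chosen for the maximal-length degree-4 polynomial x^4 + x^3 + 1):

```python
# A conventional 4-bit maximal-length Fibonacci LFSR for comparison
# (taps [4, 3], i.e. feedback polynomial x^4 + x^3 + 1; period 15).
def lfsr4(seed=0b1001, nbits=20):
    state, out = seed, []
    for _ in range(nbits):
        out.append(state & 1)                # emit the low bit
        fb = (state ^ (state >> 1)) & 1      # XOR of the two tapped stages
        state = (state >> 1) | (fb << 3)     # shift, insert feedback on top
    return out

print(lfsr4())   # the first 20 bits; the pattern repeats every 15
```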

Synergy in machines

Gerald S. Eisman
1983 Journal of Pure and Applied Algebra  
The capability of a finite state machine constructed of component machines in a composition with feedback is shown to be greater than the capabilities of series-parallel (or cascade) compositions of these  ...  A measure of the amount of feedback in a construction is defined and a hierarchy of classes of machines is obtained by increasing the amount of feedback permitted in the members of each class.  ...  There are three areas which need preliminary definitions. These are machines, semigroups, and the relation between machines and semigroups.  ... 
doi:10.1016/0022-4049(83)90087-7 fatcat:t3n2oz3d7zg4rof6l3nj4ohvzm
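
The series-parallel (cascade) baseline simply feeds one component machine's output into the next, with no feedback path. A toy sketch (both Mealy machines and their alphabets are invented for illustration):

```python
# A toy cascade (series) composition of two Mealy machines, the
# feedback-free baseline the paper's feedback constructions are shown to
# exceed. Both machines and their alphabets are invented for illustration.
def run_cascade(m1, m2, s1, s2, inputs):
    """m1, m2: dicts (state, input_symbol) -> (next_state, output_symbol)."""
    outputs = []
    for x in inputs:
        s1, y = m1[(s1, x)]     # first machine reads the external input
        s2, z = m2[(s2, y)]     # second machine reads the first's output
        outputs.append(z)
    return outputs

# m1 outputs the running parity of 1s; m2 outputs 1 when its input was 1
# on two consecutive steps.
m1 = {('e', 0): ('e', 0), ('e', 1): ('o', 1),
      ('o', 0): ('o', 1), ('o', 1): ('e', 0)}
m2 = {(a, b): (b, a & b) for a in (0, 1) for b in (0, 1)}
print(run_cascade(m1, m2, 'e', 0, [1, 1, 0, 1, 0]))   # -> [0, 0, 0, 0, 1]
```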

The M-machine multicomputer

Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, Whay S. Lee
1997 International journal of parallel programming  
The multiple function units are used to exploit both instruction-level and thread-level parallelism.  ...  The M-Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 12 function units, on-chip cache, and local memory.  ...  as faster execution of fixed-size problems and easier programmability of parallel computers.  ...
doi:10.1007/bf02700035 fatcat:b5utkjnigjhl5cp7ofxrhxpj7e
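
The payoff of multithreaded nodes is latency hiding: while one thread waits on memory, another keeps the function units busy. A toy cycle-count model of that effect (all parameters invented; this is not an M-Machine simulator):

```python
# Toy model of latency hiding: one shared function unit issues at most one
# op per cycle, and every 4th op of a thread is a load stalling that
# thread for MEM_LATENCY cycles. More threads keep the unit busier.
MEM_LATENCY, OPS = 12, 64

def utilization(n_threads):
    ready = {t: 0 for t in range(n_threads)}   # cycle a thread may issue next
    done = {t: 0 for t in range(n_threads)}    # ops completed per thread
    busy = cycle = 0
    while min(done.values()) < OPS:
        for t in range(n_threads):             # pick the first ready thread
            if ready[t] <= cycle and done[t] < OPS:
                done[t] += 1
                ready[t] = cycle + (MEM_LATENCY if done[t] % 4 == 0 else 1)
                busy += 1
                break                          # only one issue slot per cycle
        cycle += 1
    return busy / cycle

for n in (1, 2, 4, 8):
    print(n, "threads:", round(utilization(n), 2))
```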

The M-Machine multicomputer

M. Fillo, S.W. Keckler, W.J. Dally, N.P. Carter, A. Chang, Y. Gurevich, W.S. Lee
1995 Proceedings of the 28th Annual International Symposium on Microarchitecture  
The multiple function units are used to exploit both instruction-level and thread-level parallelism.  ...  The M-Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 12 function units, on-chip cache, and local memory.  ...  as faster execution of fixed-size problems and easier programmability of parallel computers.  ...
doi:10.1109/micro.1995.476822 dblp:conf/micro/FilloKDCCGL95 fatcat:o523frqb7jdrjmj7vfprfk6kk4

Container-Based Cloud Virtual Machine Benchmarking

Blesson Varghese, Lawan Thamsuhang Subba, Long Thai, Adam Barker
2016 2016 IEEE International Conference on Cloud Engineering (IC2E)  
The proposed techniques are up to 91 times faster than a heavyweight technique which benchmarks the entire VM.  ...  It is observed that the first mode can generate ranks with over 90% and 86% accuracy for sequential and parallel execution of an application.  ...  The experiments highlight that the lightweight technique (a) is up to 91 times faster than the heavyweight technique, and (b) generates rankings with over 90% and 86% accuracy for sequential and parallel  ... 
doi:10.1109/ic2e.2016.28 dblp:conf/ic2e/VargheseSTB16 fatcat:6re5gotyava5hbhigq4rtwppiq
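
The lightweight mode boils down to ranking VM types by a weighted combination of container-level micro-benchmark scores. A hypothetical sketch of such a ranking (instance names and every number invented for illustration):

```python
# Hypothetical illustration: rank VM types by a weighted sum of normalized
# micro-benchmark scores, the kind of lightweight ranking the paper
# compares against benchmarking the entire VM.
micro = {                         # attribute -> {vm_type: raw score}
    "cpu":    {"m4.large": 580, "c4.xlarge": 910, "r4.large": 540},
    "memory": {"m4.large": 7.5,  "c4.xlarge": 7.0, "r4.large": 15.0},
}
weights = {"cpu": 0.7, "memory": 0.3}     # application-specific priorities

def rank(micro, weights):
    vms = list(next(iter(micro.values())))
    def norm(attr, vm):                   # rescale each attribute to [0, 1]
        vals = micro[attr].values()
        lo, hi = min(vals), max(vals)
        return (micro[attr][vm] - lo) / (hi - lo)
    score = {vm: sum(w * norm(a, vm) for a, w in weights.items())
             for vm in vms}
    return sorted(score, key=score.get, reverse=True)

print(rank(micro, weights))   # -> ['c4.xlarge', 'r4.large', 'm4.large']
```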

Generalized Core Vector Machines

I.W.H. Tsang, J.T.Y. Kwok, J.A. Zurada
2006 IEEE Transactions on Neural Networks  
Recently, by using approximation algorithms for the minimum enclosing ball (MEB) problem, we proposed the core vector machine (CVM) algorithm that is much faster and can handle much larger data sets than  ...  Kernel methods, such as the support vector machine (SVM), are often formulated as quadratic programming (QP) problems.  ...  ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers for their constructive comments on an earlier version of this paper.  ... 
doi:10.1109/tnn.2006.878123 pmid:17001975 fatcat:syoc3vvwb5h2bmho7rxzrv46vi
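
CVM rests on core-set approximations of the MEB problem due to Badoiu and Clarkson: repeatedly step the center toward the furthest point with shrinking step sizes. A plain NumPy sketch of that iteration (our illustration, not the authors' implementation):

```python
# The Badoiu-Clarkson (1+eps)-approximation that core-set methods like
# CVM build on: step the center toward the current furthest point.
import numpy as np

def approx_meb(points, eps=0.05):
    c = points.mean(axis=0)                       # any starting center works
    for i in range(1, int(1 / eps**2) + 1):
        far = points[np.linalg.norm(points - c, axis=1).argmax()]
        c = c + (far - c) / (i + 1)               # shrinking step sizes
    radius = np.linalg.norm(points - c, axis=1).max()
    return c, radius

pts = np.random.default_rng(0).standard_normal((1000, 2))
print(approx_meb(pts))
```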

Online Ensemble Learning: An Empirical Study

Alan Fern, Robert Givan
2003 Machine Learning  
Our results evaluate these methods on both our branch prediction domain and online variants of three familiar machine-learning benchmarks. Our data justifies three key claims.  ...  In particular, we consider (parallel) time and space-efficient ensemble learners for online settings, empirically demonstrating benefits similar to those shown previously for offline ensembles.  ...  Thus, hypotheses that are more accurate will tend to have larger voting weights. Note that the offline version of Arc-x4 uses a majority rather than a weighted vote.  ... 
doi:10.1023/a:1025619426553 fatcat:bvsxtvy3t5hubgqugsxgmbnrey
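
The accuracy-based weighting described in the snippet generalizes to any pool of online learners: weight each member's vote by its running accuracy. A generic sketch (our illustration, not the paper's online Arc-x4; the base learners are deliberately trivial):

```python
# A generic online weighted-vote ensemble: each member's vote counts in
# proportion to its running, Laplace-smoothed accuracy.
class MajorityBase:
    """Trivial base learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get, default=0)
    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

class LastLabelBase:
    """Trivial base learner: predicts the previous true label."""
    def __init__(self):
        self.last = 0
    def predict(self, x):
        return self.last
    def update(self, x, y):
        self.last = y

class OnlineWeightedVote:
    def __init__(self, members):
        self.members = members
        self.hits = [1] * len(members)    # Laplace smoothing: start at 1/2
        self.seen = [2] * len(members)
    def predict(self, x):
        votes = {}
        for m, h, n in zip(self.members, self.hits, self.seen):
            p = m.predict(x)
            votes[p] = votes.get(p, 0.0) + h / n   # accuracy-weighted vote
        return max(votes, key=votes.get)
    def update(self, x, y):
        for i, m in enumerate(self.members):
            self.hits[i] += int(m.predict(x) == y)
            self.seen[i] += 1
            m.update(x, y)

ens = OnlineWeightedVote([MajorityBase(), LastLabelBase()])
for x, y in [(0, 1), (1, 1), (2, 0), (3, 1)]:
    print(ens.predict(x), end=" ")    # predict first, then learn the label
    ens.update(x, y)
```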

Direct Search Methods on Parallel Machines

J. E. Dennis, Jr., Virginia Torczon
1991 SIAM Journal on Optimization  
This paper describes an approach to constructing derivative-free algorithms for unconstrained optimization that are easy to implement on parallel machines.  ...  Thus search strategies intended for many processors may actually generate algorithms that are better even when implemented sequentially.  ...  We are grateful to the referees for their useful comments. We thank Robert Michael Lewis for his valuable suggestions on how best to present this material, particularly the results given in § 7.  ...
doi:10.1137/0801027 fatcat:ml7fp3ehcrgbtihzsi3cfujaju
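
Direct search methods probe a pattern of trial points around the current iterate, which is what makes them parallel-friendly: all probes in a sweep are mutually independent. A sequential compass-search sketch (ours; the paper's multidirectional search differs in detail):

```python
# Compass search: probe +/- each coordinate, accept improvements, and
# contract the step when a full sweep fails. The 2n probes of a sweep
# could be evaluated simultaneously on 2n processors.
import numpy as np

def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for s in (+step, -step):
                trial = x.copy()
                trial[i] += s
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2                      # contract the pattern
    return x, fx

rosen = lambda v: (1 - v[0])**2 + 100 * (v[1] - v[0]**2)**2
print(compass_search(rosen, [-1.2, 1.0]))
```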

Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine [article]

Roger Waleffe, Jason Mohoney, Theodoros Rekatsinas, Shivaram Venkataraman
2022 arXiv   pre-print
... than these systems when they are using up to eight GPUs.  ...  We show that out-of-core pipelined mini-batch training on a single machine outperforms resource-hungry multi-GPU solutions.  ...  In terms of runtime, Sequential is always faster than Dispersed due to the elimination of intra-epoch IO.  ...
arXiv:2202.02365v1 fatcat:q72konmnu5dk7jg2ktgmim5koe
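
Pipelined out-of-core training overlaps disk IO with computation: a producer thread stages the next partition while the consumer trains on the current one. A toy sketch of that pattern (all functions are stand-ins, not Marius++ APIs):

```python
# A toy producer/consumer pipeline in the spirit of out-of-core training.
import queue
import threading
import time

def load_partition(i):            # stand-in for a disk -> buffer transfer
    time.sleep(0.05)
    return f"partition-{i}"

def train_on(part):               # stand-in for mini-batch updates
    time.sleep(0.05)

def pipelined_epoch(n_parts, depth=2):
    q = queue.Queue(maxsize=depth)         # bounded queue caps memory use
    def producer():
        for i in range(n_parts):
            q.put(load_partition(i))
        q.put(None)                        # sentinel: no more partitions
    threading.Thread(target=producer, daemon=True).start()
    while (part := q.get()) is not None:
        train_on(part)                     # overlaps with the next load

t0 = time.perf_counter()
pipelined_epoch(8)
# roughly 0.45 s here versus 0.8 s if loading and training were serialized
print("epoch took", round(time.perf_counter() - t0, 2), "s")
```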

Structure-Aware Dynamic Scheduler for Parallel Machine Learning [article]

Seunghak Lee, Jin Kyu Kim, Qirong Ho, Garth A. Gibson, Eric P. Xing
2013 arXiv   pre-print
Training large machine learning (ML) models with many variables or parameters can take a long time if one employs sequential procedures even with stochastic updates.  ...  divergence, because dependencies between model elements can attenuate the computational gains from parallelization and compromise correctness of inference.  ...  In all cases, STRADS converged much faster than the other two schedulers.  ... 
arXiv:1312.5766v2 fatcat:gapbbhpqarggbkguzcksk47qka
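
A structure-aware scheduler exploits exactly the dependency information the snippet mentions: updates touching disjoint parameters may run concurrently, while conflicting ones are serialized into later rounds. A greedy toy sketch (ours, not STRADS):

```python
# Greedy round construction: updates that conflict with anything already
# in the current round are deferred, trading parallelism for correctness.
def schedule_rounds(updates, conflicts):
    """updates: iterable of ids; conflicts: set of frozenset({a, b})."""
    rounds, remaining = [], list(updates)
    while remaining:
        batch, deferred = [], []
        for u in remaining:
            if any(frozenset((u, v)) in conflicts for v in batch):
                deferred.append(u)     # would race with this round
            else:
                batch.append(u)        # safe to run concurrently
        rounds.append(batch)
        remaining = deferred
    return rounds

# Parameters 0/1 share data, as do 2/3; the rest are independent.
print(schedule_rounds(range(6), {frozenset((0, 1)), frozenset((2, 3))}))
# -> [[0, 2, 4, 5], [1, 3]]
```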

Logistic Regression, AdaBoost and Bregman Distances

Michael Collins, Robert E. Schapire, Yoram Singer
2002 Machine Learning  
These algorithms are iterative and can be divided into two types based on whether the parameters are updated sequentially (one at a time) or in parallel (all at once).  ...  We also describe a parameterized family of algorithms that includes both a sequential-and a parallel-update algorithm as special cases, thus showing how the sequential and parallel approaches can themselves  ...  Our hope is that in some situations this parallel-update algorithm will be faster than the sequential-update algorithm. See Section 11 for preliminary experiments in this regard.  ... 
doi:10.1023/a:1013912006537 fatcat:3wplwzti3jdgbjmzkptbh2srru
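
The sequential/parallel distinction is concrete: a sequential epoch re-weights the examples after every coordinate step, while a parallel epoch computes one shared weighting and must take more conservative steps. A toy sketch on the exponential loss (ours, not the paper's algorithms; the uniform damping is a crude stand-in for the paper's careful parallel update):

```python
# Sequential vs. parallel coordinate updates for exp(-y * (X @ w)).
import numpy as np

rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(200, 5))
y = np.sign(X @ np.array([2.0, -1.0, 0.5, 0.0, 1.0]))

def loss(w):
    return np.exp(-y * (X @ w)).mean()

def exact_step(d, j):
    Wp = d[y * X[:, j] > 0].sum()     # weight where feature j agrees with y
    Wm = d[y * X[:, j] < 0].sum()     # weight where it disagrees
    return 0.5 * np.log(Wp / Wm)      # exact 1-D minimizer

def sequential_epoch(w):
    for j in range(X.shape[1]):
        d = np.exp(-y * (X @ w))      # fresh weighting before each step
        w[j] += exact_step(d, j)
    return w

def parallel_epoch(w):
    d = np.exp(-y * (X @ w))          # one shared weighting for all steps
    for j in range(X.shape[1]):
        w[j] += exact_step(d, j) / X.shape[1]   # damped simultaneous steps
    return w

print(loss(sequential_epoch(np.zeros(5))), loss(parallel_epoch(np.zeros(5))))
```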
Showing results 1 — 15 out of 19,548 results