Exploiting Parallel R in the Cloud with SPRINT

M. Piotrowski, G. A. McGilvary, T. M. Sloan, M. Mewissen, A. D. Lloyd, T. Forster, L. Mitchell, P. Ghazal, J. Hill
2012 Methods of Information in Medicine  
Methods: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various size on Amazon EC2  ...  High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need.  ...  Sornthep Vannarat, Head of the Large Scale Simulation Laboratory from the National Electronics and Computer Technology Center when performing the experiments in Thailand.  ... 
doi:10.3414/me11-02-0039 pmid:23223611 pmcid:PMC3547073 fatcat:hb4fbgcfvne23gydfhyyn6zqvq

Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering

Lerato Lerato, Thomas Niesler, Xi-Nian Zuo
2015 PLoS ONE  
The algorithm does not require the number of clusters to be specified in advance, and is shown to be comparable in performance to parallel spectral clustering (PSC) [18] which has also been developed specifically  ...  These segments are time series of varying length and are not easily represented as vectors in a vector space. However, they can be compared and similarities can be computed.  ...  Acknowledgments: Computations were performed using the University of Stellenbosch's Rhasatsha HPC: http://www.sun.ac.za/hpc. Author Contributions  ... 
doi:10.1371/journal.pone.0141756 pmid:26517376 pmcid:PMC4627777 fatcat:jdhmz6krkrdzfhizbcv3i5cwpy

A framework for machine-learning-augmented multiscale atomistic simulations on parallel supercomputers

Marco Caccin, Zhenwei Li, James R. Kermode, Alessandro De Vita
2015 International Journal of Quantum Chemistry  
We argue that the ML approach allows computational effort to be concentrated on the most chemically active subregions of the QM zone, significantly improving the overall efficiency of the simulation.  ...  We thus propose a novel method to partition large QM regions into multiple subregions which can be computed in parallel to achieve optimal scaling.  ...  the sizes and optimise the shapes of C_i.  ... 
doi:10.1002/qua.24952 fatcat:hkr5aoh47beynbqo65isf7ydwq

SingleCAnalyzer: Interactive Analysis of Single Cell RNA-Seq Data on the Cloud

Carlos Prieto, David Barrios, Angela Villaverde
2022 Frontiers in Bioinformatics  
prediction, unsupervised clustering of cells, pseudotime/trajectory analysis, expression comparisons between groups, functional enrichment of differentially expressed genes and gene set expression analysis  ...  However, the proper application of these computational methods requires extensive bioinformatics expertise. Otherwise, it is often difficult to obtain reliable and reproducible results.  ...  At present, SingleCAnalyzer applies the following unsupervised clustering methods: 1) k-means, which is computed using the kmeans function of the stats R package (with iter_max = 15); 2) partition around  ... 
doi:10.3389/fbinf.2022.793309 fatcat:d2imgql2hbcaldivak5kk23khy

Extending the SACOC algorithm through the Nyström method for dense manifold data analysis

Héctor D. Menéndez, Fernando E.B. Otero, David Camacho
2017 International Journal of Bio-Inspired Computation (IJBIC)  
Spectral-based methods, which are one of the main used methodologies in this area, are sensitive to metric parameters and noise.  ...  We evaluated the performance of the proposed approach, called SACON, comparing it against online clustering algorithms and the Nyström extension of the Spectral Clustering algorithm using several benchmark  ...  Acknowledgement: This work has been supported by the research projects: TIN2014-56494-C4-4-P, CIBERDINE S2013/ICE-3095, SeMaMatch EP/K032623/1 and Airbus Defence & Space (FUAM-076914 and FUAM-076915).  ... 
doi:10.1504/ijbic.2017.085894 fatcat:6n7tg5lqb5e2bfeeyopa52bl44

Supporting Autonomic Management of Clouds: Service Clustering With Random Forest

Rafael Brundo Uriarte, Francesco Tiezzi, Sotirios A. Tsaftaris
2016 IEEE Transactions on Network and Service Management  
of the cloud domain including the sensitivity of the services to hardware configurations and priorities.  ...  As future works, we will investigate the characteristics of RF, such as variable importance and feature selection, to improve our methodology.  ...  To cluster the observations, the dissimilarity matrix is passed as input to a compatible clustering algorithm, for example, the Partitioning Around Medoids (PAM) [6].  ... 
doi:10.1109/tnsm.2016.2569000 fatcat:mbcon775wvcqdkjqgn4giz2d5y

Modelling and Recognition of Protein Contact Networks by Multiple Kernel Learning and Dissimilarity Representations

Alessio Martino, Enrico De Santis, Alessandro Giuliani, Antonello Rizzi
2020 Entropy  
The core of the training procedure is the joint optimisation of kernel weights and representatives selection in the dissimilarity spaces.  ...  Computational results show remarkable classification capabilities and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.  ...  The choice behind a genetic algorithm stems from them being widely famous in the context of derivative-free optimisation, embarrassingly easy to parallelise and for the sake of consistency with competing  ... 
doi:10.3390/e22070794 pmid:33286565 pmcid:PMC7517365 fatcat:w66fbfzgjncydlg5fmjcsxffdm

Unsupervised extraction of stable expression signatures from public compendia with eADAGE [article]

Jie Tan, Georgia Doing, Kimberley A Lewis, Courtney E Price, Kathleen M Chen, Kyle C Cady, Barret Perchuk, Michael T Laub, Deborah A Hogan, Casey S Greene
2016 bioRxiv   pre-print
While we expected PhoB activity in limiting phosphate conditions, our analyses found PhoB activity in other media with moderate phosphate and predicted that a second stimulus provided by the sensor kinase  ...  Cross experiment comparisons in public data compendia are challenged by unmatched conditions and technical noise.  ...  ., and Hill, J. (2011). Optimisation and parallelisation of the partitioning around medoids function in R, in: 2011 International Conference on High Performance Computing & Simulation.  ... 
doi:10.1101/078659 fatcat:mze4x6sikfbxndhrsduv7eh3ma

Large-scale data mining analytics based on MapReduce [article]

Sunny Ranjan, Universität Stuttgart
2015
This version is called YARN and is very flexible in supporting various kinds of distributed data processing other than batchmode processing of MapReduce.  ...  Apache Spark claimed that it is much faster than Hadoop MapReduce as it exploits the advantages of in-memory computations which is particularly more beneficial for iterative workloads in case of data mining  ...  (medoids) and assigns them to the closest medoids. • Reduce function uses medoids centers as the key and calculates the new medoids using the optimal search of medoids. • If the medoids have changes,  ... 
doi:10.18419/opus-3493 fatcat:aa7hzahdtjbprf4ebxzxj6bba4

Space-Time Window Reconstruction In Parallel High Performance Numeric Simulations. Application For CFD (PhD Thesis)

Alin Anton, Ioan Cretu
2011 Zenodo  
The size of the output originating from large scale, numerical simulations poses major bottlenecks in high performance, parallel computing.  ...  Recently it became more and more evident that a radical change has to take place in the way scientists and engineers handle numerical simulations.  ...  Both partitioning around medoids [29] Most of the papers have been selected from conference proceedings and journals published by the Association for Computing Machinery (ACM) and the Institute of Electrical  ... 
doi:10.5281/zenodo.15938 fatcat:xh4i6ig7qvfkjapq76phiadpje

A review of urban energy system models: Approaches, challenges and opportunities

James Keirstead, Mark Jennings, Aruna Sivakumar
2012 Renewable & Sustainable Energy Reviews  
However, such a broad topic inevitably results in a number of alternative interpretations of the problem domain and the modelling tools used in its study.  ...  We also highlight a sixth field, land use and transportation modelling, which has direct relevance to the use of energy in cities but has been somewhat overlooked by the literature to date.  ...  Acknowledgments: The financial support of BP via the Urban Energy Systems project at Imperial College London (www.imperial.ac.uk/urbanenergysystems) is gratefully acknowledged.  ... 
doi:10.1016/j.rser.2012.02.047 fatcat:qzlxzdptcrcjbahjfg74wqquxe

Discovering latent topical structure by second-order similarity analysis

Timothy Cribbin
2011 Journal of the American Society for Information Science and Technology  
and sparseness of the vector space, in combination with the effects of feature independence and vocabulary mismatch conspire to limit the structural validity of computed similarities.  ...  In the next section, the motivation for this work is explained in greater detail along with formal definitions of the second-order similarity and other methods applied in this study.  ...  Continuity, in contrast, is derived from the recall function.  ... 
doi:10.1002/asi.21519 fatcat:xx2dyggapnb5zbbtp7slfrwqeu

Efficient clustering techniques for big data

Sami Al Ghamdi
2018
Many studies introduced a parallel implementation of Lloyd's K-Means on Hadoop in order to improve the algorithm's scalability.  ...  The majority of the running time in the original K-Means algorithm (known as Lloyd's algorithm) is spent on computing distances from each data point to all cluster centres to find the closest centre to  ...  Clustering Large Applications based upon RANdomized Search (CLARANS) [35] extends the clustering algorithm called K-Medoid, which cluster data points around medoids instead of cluster centroids as in  ... 
doi:10.48683/1926.00084926 fatcat:svndi2iqbjgk3nf7vj5lti7suy

Sparse multivariate models for pattern detection in high-dimensional biological data

Zi Wang, Giovanni Montana, Biotechnology And Biological Sciences Research Council (Great Britain)
2015
In NsRRR, pairwise relatedness between predictors and between responses are represented by two networks, and the model identifies associations between a subnetwork of predictors and a subnetwork of responses  ...  Parsimonious models, which may refer to parsimony in model structure and/or model parameters, have been shown to improve both biological interpretability of the model and the generalisability to new data  ...  When Q > 1, the constrained region is a strictly convex set and the optimisation problem in (1.2) and (2.1) is convex optimisation for both squared and logistic loss functions.  ... 
doi:10.25560/25762 fatcat:6c2xsoazlnbpjoamcsdw4cb6pu

Understanding High Dimensional Visual Data

Benjamin John Harwood
2019
This thesis has focused on the advancement of these computer vision systems.  ...  The algorithms produced by this research have established new levels of efficiency for retrieving and comparing visual data, as well as defining new machine learning techniques that extend our current  ...  Here, our optimisation function encourages the formation of class specific Gaussian clusters around each anchor.  ... 
doi:10.26180/5c53bf3eadb54 fatcat:im4u5eytfnhyrkjyk5hacgsb5u