Large scale hierarchical clustering of protein sequences
2005
BMC Bioinformatics
We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. ...
Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences ...
Additionally we decided to compare our clustering procedure to one of the most widely used and publicly available methods for large scale protein sequence clustering, namely TribeMCL. ...
doi:10.1186/1471-2105-6-15
pmid:15663796
pmcid:PMC547898
fatcat:az3blh77ifa6doinrtch4cap4i
Sequential Hierarchical Pattern Clustering
[chapter]
2009
Lecture Notes in Computer Science
In this paper we propose a novel sequential hierarchical clustering technique that initially builds a hierarchical tree from a small fraction of the entire data, while the remaining data is processed sequentially ...
This makes it computationally expensive and difficult to cope with large scale data used in several applications, such as in bioinformatics. ...
AR is partially supported by a grant from the School of Electronics and Computer Science, University of Southampton, United Kingdom, and University of Jaffna, Sri Lanka, under the IRQUE Project funded ...
doi:10.1007/978-3-642-04031-3_8
fatcat:ctckapybz5daxjdufnhydkxa3a
Integrative data mining: the new direction in bioinformatics
2001
IEEE Engineering in Medicine and Biology Magazine
Integration of Databases and Large-Scale Surveys: In addition to sequence and structure databases, many diverse experimental data sets have been compiled that focus on various aspects of protein function ...
The current landscape of biological databases includes large public archives, such as GenBank, DDBJ, and EMBL for nucleic acid sequences [1]; PIR and SWISS-PROT for protein sequences [2]; and the Protein ...
His research is focused on bioinformatics and he is particularly interested in large-scale integrative surveys, biological database design, macromolecular geometry, molecular simulation, genome annotation ...
doi:10.1109/51.940042
pmid:11494767
fatcat:kh4u7xxslve4ddmnkwurjmayp4
BAR-PLUS: the Bologna Annotation Resource Plus for functional and structural annotation of protein sequences
2011
Nucleic Acids Research
BAR+ is based on a large-scale genome cross-comparison and a non-hierarchical clustering procedure characterized by a metric that ensures a reliable transfer of features within clusters. ...
We introduce BAR-PLUS (BAR+), a web server for functional and structural annotation of protein sequences. ...
Like BAR (11), BAR+ is a non-hierarchical clustering method relying on a comparative large-scale genome analysis. ...
doi:10.1093/nar/gkr292
pmid:21622657
pmcid:PMC3125743
fatcat:abdczhrn4fgwrphmutaxycsrku
Distributed ICSA Clustering Approach for Large Scale Protein Sequences and Cancer Diagnosis
2018
Asian Pacific Journal of Cancer Prevention
However, exact clustering algorithms, such as partitional and hierarchical clustering, scale relatively poorly in terms of run time and memory usage with large sets of sequences. ...
The proposed ICSA is a globally optimized algorithm that can cluster large numbers of protein sequences by running on distributed computing hardware. ...
of this paper. ...
doi:10.31557/apjcp.2018.19.11.3105
pmid:30486549
fatcat:hnhexv5h45ftrewjdy2zigwyje
DAVI: a tool for clustering and visualising protein domain architectures
[article]
2021
bioRxiv
pre-print
DAVI accepts the output of the most widely used domain architecture prediction tools and also produces domain architectures for a set of protein sequences. ...
Here we present DAVI, an efficient and user-friendly web server for protein domain architecture clustering and visualization. ...
By hiding domain positions, users can easily compare a large set of proteins of a given group. ...
doi:10.1101/2021.09.24.461671
fatcat:mrizrynykza2tczbmojkr4fbii
Phylogenetic Tree Generation using Different Scoring Methods
2014
International Journal of Computer Applications
A method for constructing distance-based phylogenetic trees using hierarchical clustering is proposed and implemented on different rice varieties. The sequences were downloaded from the NCBI databank. ...
Hierarchical clustering is one of the main techniques in data mining. Phylogeny is the evolutionary history of a set of evolutionarily related species. ...
... These may be small- or large-scale insertions or deletions. ...
doi:10.5120/17597-8404
fatcat:2z5ayqgr6zgpff5w3e64wmbpfa
Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space
2008
Bioinformatics
We show that the newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4, or single-linkage clustering. ...
We apply our algorithm to the entire collection of protein sequences, to automatically build a novel evolutionary tree of all proteins using no prior knowledge. ...
We also thank the MOSIX and system group at HUJI, which have made the grid parallelization of BLAST possible. ...
doi:10.1093/bioinformatics/btn174
pmid:18586742
pmcid:PMC2718652
fatcat:il4fx2pvd5bk3ds3vtospqgqnu
Analysis and Visualization of Gene Expressions and Protein Structures
2008
Journal of Software
diverse large scale experimental data sets, and (b) difficulty in integrating the most recent analysis and visualization tools due to the lack of standard I/O. ...
This paper describes a web-based interactive framework for the analysis and visualization of gene expressions and protein structures. ...
Figure 7 illustrates the algorithm of the hierarchical clustering technique. ...
doi:10.4304/jsw.3.7.2-11
fatcat:mfcabxb2xrbsjmie7rbvualsjm
Synthetic Test Data Generation for Hierarchical Graph Clustering Methods
[chapter]
2014
Lecture Notes in Computer Science
Recent achievements in graph-based clustering algorithms revealed the need for large-scale test data sets. ...
Generated data sets have a healthy amount of variability due to the randomness in the processing, and are suitable for testing graph-based clustering algorithms on large-scale data. ...
The proposed method can efficiently support the validation process of hierarchical clustering algorithms on large-scale data. ...
doi:10.1007/978-3-319-12640-1_37
fatcat:wuqbdsoxyfdkbicod7kweqkh6a
A Hierarchical Approach to Scaling Batch Active Search Over Structured Data
[article]
2020
arXiv
pre-print
We focus our application of HBBS on modern biology, where large batch experimentation is often fundamental to the research process, and demonstrate batch design of biological sequences (protein and DNA ...
In this paper, we present a general hierarchical framework based on bandit algorithms to scale active search to large batch sizes by maximizing information derived from the unique structure of each dataset ...
GPs are popular but notoriously difficult to scale to large data sets. ...
arXiv:2007.10263v1
fatcat:3x4h73ci2rdzzlwvxwbb5hmrom
CoeViz: A Web-Based Integrative Platform for Interactive Visualization of Large Similarity and Distance Matrices
2018
Data
It consists of four major interconnected and synchronized components: a zoomable heatmap, interactive hierarchical tree, scalable circular relationship diagram, and 3D multi-dimensional scaling (MDS) scatterplot ...
We demonstrate the use of the platform for the analysis of amino acid covariance data in proteins as part of our previously developed CoeViz tool. ...
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/data3010004
pmid:29423399
pmcid:PMC5798608
fatcat:zhiuhcsshbe6bjijodybzoq4ny
Cluster analysis and phylogenetic relationship in biomarker identification of type 2 diabetes and nephropathy
2010
International Journal of Diabetes in Developing Countries
of disease on a genomic scale. ...
Microarray techniques using cDNAs are high-throughput approaches for large-scale gene expression analysis and enable the investigation of mechanisms of fundamental processes and the molecular basis ...
doi:10.4103/0973-3930.60003
pmid:20431808
pmcid:PMC2859286
fatcat:bia3oz7zgzdvhn4xf5v4plxqoa
An explorable public transcriptomics compendium for eukaryotic microalgae
[article]
2018
bioRxiv
pre-print
... as well as to provide a large new resource of integrated information to facilitate the efforts of others to further compare and contextualize the results of individual and new experiments within and ...
Numerous genomes and transcriptomes of these species have been carefully sequenced, providing an unprecedented view into the vast genetic repertoires and the diverse transcriptional programs operating ...
Bootstrapped hierarchical clustering: Agglomerative hierarchical clustering was performed using a C++ wrapper to call the Fastcluster [30] library directly and repeatedly, with in-memory multi-scale resampling ...
doi:10.1101/403063
fatcat:dlkme2qcwbb4hpdx4lbj5r5bqy
CD-HIT Suite: a web server for clustering and comparing biological sequences
2010
Bioinformatics
CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. ...
Availability: Free access at http://cd-hit.org ... users to cluster or compare sequences without installing and executing the command-line version of CD-HIT locally. ...
Funding: National Institutes of Health (1R01RR025030) from National Center for Research Resources. Conflict of Interest: none declared. ...
doi:10.1093/bioinformatics/btq003
pmid:20053844
pmcid:PMC2828112
fatcat:k2hldpq3izhm3mu55bwcyudali
Showing results 1–15 of 70,723