Large scale hierarchical clustering of protein sequences

Antje Krause, Jens Stoye, Martin Vingron
2005 BMC Bioinformatics  
We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters.  ...  Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences  ...  Additionally we decided to compare our clustering procedure to one of the most widely used and publicly available methods for large scale protein sequence clustering, namely TribeMCL.  ... 
doi:10.1186/1471-2105-6-15 pmid:15663796 pmcid:PMC547898 fatcat:az3blh77ifa6doinrtch4cap4i

Sequential Hierarchical Pattern Clustering [chapter]

Bassam Farran, Amirthalingam Ramanan, Mahesan Niranjan
2009 Lecture Notes in Computer Science  
In this paper we propose a novel sequential hierarchical clustering technique that initially builds a hierarchical tree from a small fraction of the entire data, while the remaining data is processed sequentially  ...  This makes it computationally expensive and difficult to cope with large scale data used in several applications, such as in bioinformatics.  ...  AR is partially supported by a grant from the School of Electronics and Computer Science, University of Southampton, United Kingdom, and University of Jaffna, Sri Lanka, under the IRQUE Project funded  ... 
doi:10.1007/978-3-642-04031-3_8 fatcat:ctckapybz5daxjdufnhydkxa3a

Integrative data mining: the new direction in bioinformatics

P. Bertone, M. Gerstein
2001 IEEE Engineering in Medicine and Biology Magazine  
Integration of Databases and Large-Scale Surveys: In addition to sequence and structure databases, many diverse experimental data sets have been compiled that focus on various aspects of protein function  ...  The current landscape of biological databases includes large public archives, such as GenBank, DDBJ, and EMBL for nucleic acid sequences [1]; PIR and SWISS-PROT for protein sequences [2]; and the Protein  ...  His research is focused on bioinformatics and he is particularly interested in large-scale integrative surveys, biological database design, macromolecular geometry, molecular simulation, genome annotation  ... 
doi:10.1109/51.940042 pmid:11494767 fatcat:kh4u7xxslve4ddmnkwurjmayp4

BAR-PLUS: the Bologna Annotation Resource Plus for functional and structural annotation of protein sequences

D. Piovesan, P. Luigi Martelli, P. Fariselli, A. Zauli, I. Rossi, R. Casadio
2011 Nucleic Acids Research  
BAR+ is based on a large-scale genome cross comparison and a non-hierarchical clustering procedure characterized by a metric that ensures a reliable transfer of features within clusters.  ...  We introduce BAR-PLUS (BAR+), a web server for functional and structural annotation of protein sequences.  ...  Similarly to BAR (11), BAR+ is also a non-hierarchical clustering method relying on a comparative large-scale genome analysis.  ... 
doi:10.1093/nar/gkr292 pmid:21622657 pmcid:PMC3125743 fatcat:abdczhrn4fgwrphmutaxycsrku

Distributed ICSA Clustering Approach for Large Scale Protein Sequences and Cancer Diagnosis

Thenmozhi K, Karthikeyani Visalakshi N, Shanthi S, Pyingkodi M
2018 Asian Pacific Journal of Cancer Prevention  
However, exact clustering algorithms, such as partitioned and hierarchical clustering, scale relatively poorly in terms of run time and memory usage with large sets of sequences.  ...  The proposed ICSA is a globally optimized algorithm that can cluster large numbers of protein sequences by running on distributed computing hardware.  ...  of this paper.  ... 
doi:10.31557/apjcp.2018.19.11.3105 pmid:30486549 fatcat:hnhexv5h45ftrewjdy2zigwyje

DAVI: a tool for clustering and visualising protein domain architectures [article]

Nathan Jawadi Chadi, Paul Saighi, Fabio Rocha Jimenez Vieira, Juliana Silva Bernardes
2021 bioRxiv   pre-print
DAVI accepts the output of most used domain architecture prediction tools and also produces domain architectures for a set of protein sequences.  ...  Here we present DAVI, an efficient and user-friendly web server for protein domain architecture clustering and visualization.  ...  By hiding domain positions, users can easily compare a large set of proteins of a given group.  ... 
doi:10.1101/2021.09.24.461671 fatcat:mrizrynykza2tczbmojkr4fbii

Phylogenetic Tree Generation using Different Scoring Methods

Rajbir Singh, Sinapreet Kaur, Dheeraj Pal Kaur
2014 International Journal of Computer Applications  
A method for construction of distance based phylogenetic tree using hierarchical clustering is proposed and implemented on different rice varieties. The sequences are downloaded from NCBI databank.  ...  Hierarchical clustering is one of the main techniques for data mining. Phylogeny is the evolutionary history for a set of evolutionarily related species.  ...  It may be small or large scale insertions or deletions.  ... 
doi:10.5120/17597-8404 fatcat:2z5ayqgr6zgpff5w3e64wmbpfa

Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

Y. Loewenstein, E. Portugaly, M. Fromer, M. Linial
2008 Bioinformatics  
We show that the newly created tree captures protein families better than state-of-the-art large scale methods such as CluSTr, ProtoNet4, or single-linkage clustering.  ...  We apply our algorithm to the entire collection of protein sequences, to automatically build a novel evolutionary tree of all proteins using no prior knowledge.  ...  We also thank the MOSIX and system group at HUJI, which have made the grid parallelization of BLAST possible.  ... 
doi:10.1093/bioinformatics/btn174 pmid:18586742 pmcid:PMC2718652 fatcat:il4fx2pvd5bk3ds3vtospqgqnu

Analysis and Visualization of Gene Expressions and Protein Structures

Ashraf S. Hussein
2008 Journal of Software  
diverse large scale experimental data sets, and (b) difficulty in integrating the most recent analysis and visualization tools due to the lack of standard I/O.  ...  This paper describes a web-based interactive framework for the analysis and visualization of gene expressions and protein structures.  ...  Figure 7 illustrates the algorithm of the hierarchical clustering technique.  ... 
doi:10.4304/jsw.3.7.2-11 fatcat:mfcabxb2xrbsjmie7rbvualsjm

Synthetic Test Data Generation for Hierarchical Graph Clustering Methods [chapter]

László Szilágyi, Levente Kovács, Sándor Miklós Szilágyi
2014 Lecture Notes in Computer Science  
Recent achievements in graph-based clustering algorithms revealed the need for large-scale test data sets.  ...  Generated data sets have a healthy amount of variability due to the randomness in the processing, and are suitable for testing graph-based clustering algorithms on large-scale data.  ...  The proposed method can efficiently support the validation process of hierarchical clustering algorithms on large-scale data.  ... 
doi:10.1007/978-3-319-12640-1_37 fatcat:wuqbdsoxyfdkbicod7kweqkh6a

A Hierarchical Approach to Scaling Batch Active Search Over Structured Data [article]

Vivek Myers, Peyton Greenside
2020 arXiv   pre-print
We focus our application of HBBS on modern biology, where large batch experimentation is often fundamental to the research process, and demonstrate batch design of biological sequences (protein and DNA  ...  In this paper, we present a general hierarchical framework based on bandit algorithms to scale active search to large batch sizes by maximizing information derived from the unique structure of each dataset  ...  GPs are popular but notoriously difficult to scale to large data sets.  ... 
arXiv:2007.10263v1 fatcat:3x4h73ci2rdzzlwvxwbb5hmrom

CoeViz: A Web-Based Integrative Platform for Interactive Visualization of Large Similarity and Distance Matrices

Frazier Baker, Aleksey Porollo
2018 Data  
It consists of four major interconnected and synchronized components: a zoomable heatmap, interactive hierarchical tree, scalable circular relationship diagram, and 3D multi-dimensional scaling (MDS) scatterplot  ...  We demonstrate the use of the platform for the analysis of amino acid covariance data in proteins as part of our previously developed CoeViz tool.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/data3010004 pmid:29423399 pmcid:PMC5798608 fatcat:zhiuhcsshbe6bjijodybzoq4ny

Cluster analysis and phylogenetic relationship in biomarker identification of type 2 diabetes and nephropathy

SatyaVani Guttula, AllamAppa Rao, GR Sridhar, MS Chakravarthy, Kunjum Nageshwararo, PaturiV Rao
2010 International Journal of Diabetes in Developing Countries  
Microarray techniques using cDNAs are high-throughput approaches for large-scale gene expression analysis and enable the investigation of mechanisms of fundamental processes and the molecular basis of disease on a genomic scale.  ... 
doi:10.4103/0973-3930.60003 pmid:20431808 pmcid:PMC2859286 fatcat:bia3oz7zgzdvhn4xf5v4plxqoa

An explorable public transcriptomics compendium for eukaryotic microalgae [article]

Justin Ashworth, Peter J Ralph
2018 bioRxiv   pre-print
, as well as to provide a large new resource of integrated information to facilitate the efforts of others to further compare and contextualize the results of individual and new experiments within and  ...  Numerous genomes and transcriptomes of these species have been carefully sequenced, providing an unprecedented view into the vast genetic repertoires and the diverse transcriptional programs operating  ...  Bootstrapped hierarchical clustering: Agglomerative hierarchical clustering was performed using a C++ wrapper to call the Fastcluster [30] library directly and repeatedly, with in-memory multi-scale resampling  ... 
doi:10.1101/403063 fatcat:dlkme2qcwbb4hpdx4lbj5r5bqy

CD-HIT Suite: a web server for clustering and comparing biological sequences

Ying Huang, Beifang Niu, Ying Gao, Limin Fu, Weizhong Li
2010 Computer applications in the biosciences : CABIOS  
CD-HIT is a widely used program for clustering and comparing large biological sequence datasets.  ...  Availability: Free access at http://cd-hit.org users to cluster or compare sequences without installing and executing the command-line version of CD-HIT locally.  ...  Funding: National Institutes of Health (1R01RR025030) from National Center for Research Resources. Conflict of Interest: none declared.  ... 
doi:10.1093/bioinformatics/btq003 pmid:20053844 pmcid:PMC2828112 fatcat:k2hldpq3izhm3mu55bwcyudali