Sequence embedding for fast construction of guide trees for multiple sequence alignment

Gordon Blackshields, Fabian Sievers, Weifeng Shi, Andreas Wilm, Desmond G Higgins
2010 Algorithms for Molecular Biology  
Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N^2 for N sequences.  ...  Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are  ...  Acknowledgements The authors wish to thank Kazutaka Katoh for useful discussions and help with the use of MAFFT/PartTree.  ... 
doi:10.1186/1748-7188-5-21 pmid:20470396 pmcid:PMC2893182 fatcat:ygwjdvidprhuhnwkahm2l32af4

GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation

Hongyi Zhang, Xiaowei Zhan, Bo Li
2021 Nature Communications  
However, existing methods that cluster T-cell receptor sequences by similarity are computationally inefficient, making them impractical to use on the ever-expanding datasets of the immune repertoire.  ...  Here, we developed GIANA (Geometric Isometry-based TCR AligNment Algorithm) a computationally efficient tool for this task that provides the same level of clustering specificity as TCRdist at 600 times  ...  Acknowledgements This work is supported by the following funding sources: Cancer Prevention and Research Institute of Texas (CPRIT) RR170079 (B.L.), NCI 1R01CA245318 (B.L.), NIGMS 5R01GM126479 (X.Z.),  ... 
doi:10.1038/s41467-021-25006-7 pmid:34349111 pmcid:PMC8339063 fatcat:22rmpvtotvb5pnurxynza4bak4

yHydra: Deep Learning enables an Ultra Fast Open Search by Jointly Embedding MS/MS Spectra and Peptides of Mass Spectrometry-based Proteomics [article]

Tom Altenburg, Thilo Muth, Bernhard Y. Renard
2021 bioRxiv   pre-print
In particular, we build an open search, which allows to search multiple ten-thousands of spectra against millions of peptides within seconds. yHydra achieves identification rates that are compatible with  ...  At the same time, our joint embeddings blur the lines between spectra and protein sequences, providing a powerful framework for peptide identification.  ...  Note that all spectrum embeddings of an entire run (typically multiple ten-thousands) are queried simultaneously against the entire database in a single call in order to achieve lowest possible search  ... 
doi:10.1101/2021.12.01.470818 fatcat:rg2ciym7jffsdh24twv3hbnbpu

Fast and adaptive protein structure representations for machine learning [article]

Janani Durairaj, Mehmet Akdel, Dick de Ridder, Aalt D.J. van Dijk
2021 bioRxiv   pre-print
thousand structures in 20 minutes.  ...  The growing prevalence and popularity of protein structure data, both experimental and computationally modelled, necessitates fast tools and algorithms to enable exploratory and interpretable structure-based  ...  The research presented here includes a novel multiple structure alignment algorithm and a demonstration of recently developed algorithms for analysing protein structures with machine learning.  ... 
doi:10.1101/2021.04.07.438777 fatcat:qospbzohkbbwbkrxdozbok7oje

LISA: Accurate reconstruction of cell trajectory and pseudo-time for massive single cell RNA-seq data

Yang Chen, Yuping Zhang, Zhengqing Ouyang
2019 Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing  
We propose a new method named Landmark Isomap for Single-cell Analysis (LISA).  ...  Cell trajectory reconstruction based on single cell RNA sequencing is important for obtaining the landscape of different cell types and discovering cell fate transitions.  ...  The sizes of datasets range from several hundreds to tens of thousands. All of them contain true time labels. LISA identified cell trajectory and estimated pseudo-time for all datasets.  ... 
pmid:30864335 pmcid:PMC6554064 fatcat:veyawztzozgrtbnztszu5vnyym

LISA: Accurate reconstruction of cell trajectory and pseudo-time for massive single cell RNA-seq data

Yang Chen, Yuping Zhang, Zhengqing Ouyang
2018 Biocomputing 2019  
We propose a new method named Landmark Isomap for Single-cell Analysis (LISA).  ...  Cell trajectory reconstruction based on single cell RNA sequencing is important for obtaining the landscape of different cell types and discovering cell fate transitions.  ...  The sizes of datasets range from several hundreds to tens of thousands. All of them contain true time labels. LISA identified cell trajectory and estimated pseudo-time for all datasets.  ... 
doi:10.1142/9789813279827_0031 fatcat:hyclj3kt5zb3xi73sdqg5aysse

scHiCTools: A computational toolbox for analyzing single-cell Hi-C data

Xinjun Li, Fan Feng, Hongxi Pu, Wai Yan Leung, Jie Liu, Mihaela Pertea
2021 PLoS Computational Biology  
embedding single cells, three methods for clustering cells, and a built-in function to visualize the cells embedding in a two-dimensional or three-dimensional plot. scHiCTools, written in Python3, is compatible  ...  The toolbox provides two methods for screening single cells, three common methods for smoothing scHi-C data, three efficient methods for calculating the pairwise similarity of cells, three methods for  ...  loci m is usually more than tens of thousands, depending on the resolution of the contact maps.  ... 
doi:10.1371/journal.pcbi.1008978 pmid:34003823 fatcat:yrqpnitrtvcvtg2hsiaiycjrg4

Neural Data Visualization for Scalable and Generalizable Single Cell Analysis [article]

Hyunghoon Cho, Bonnie Berger, Jian Peng
2018 bioRxiv   pre-print
However, standard methods for visualization, such as t-stochastic neighbor embedding (t-SNE), not only lack scalability to data sets with millions of cells, but also are unable to generalize to new cells  ...  , an important ability for transferring knowledge across fast-accumulating data sets.  ...  Acknowledgements A single page abstract of this work will appear in RECOMB 2018. This work was partially supported by NIH grant R01GM081871.  ... 
doi:10.1101/289223 fatcat:nqkbgumx2ra27mh4a4trqatrsa

The art of using t-SNE for single-cell transcriptomics

Dmitry Kobak, Philipp Berens
2019 Nature Communications  
Single-cell transcriptomics yields ever growing data sets containing RNA expression levels for thousands of genes from up to millions of cells.  ...  Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE).  ...  In our experiments, for the data sets with tens of thousands of cells, the number of significant PCs was usually close to 50 (for example, for the Tasic et al. [3] data set it was 40, according to the  ... 
doi:10.1038/s41467-019-13056-x pmid:31780648 pmcid:PMC6882829 fatcat:t5m5mzmzdvg27icxomglkzutqu

Cover Detection Using Dominant Melody Embeddings

Guillaume Doras, Geoffroy Peeters
2019 Zenodo  
On the other hand, faster approaches designed to process thousands of pairwise comparisons resulted in lower accuracy, making them unsuitable for practical use.  ...  We further propose to extract each track's embedding out of its dominant melody representation, obtained by another neural network trained for this task.  ...  The dominant melody representations of tracks used as training dataset in this work is available upon request.  ... 
doi:10.5281/zenodo.3527751 fatcat:eaf73iztzjf6nhiak3tu6xojju

Graph Embedding via Graph Summarization

Jingyanning Yang, Jinguo You, Xiaorong Wan
2021 IEEE Access  
[6] proposed a fast graph embedding via a coarsening algorithm based on Schur complements for computing the vertices' embeddings.  ...  Compared with other clustering algorithms, Canopy clustering has lower accuracy but is a fast clustering technology.  ... 
doi:10.1109/access.2021.3067901 fatcat:6h2xmr42cbgztmazzj7o6xzodu

Geometricus Represents Protein Structures as Shape-mers Derived from Moment Invariants [article]

Janani Durairaj, Mehmet Akdel, Dick de Ridder, Aalt D.J. van Dijk
2020 bioRxiv   pre-print
We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering, and structure classification across proteins from different superfamilies  ...  Many existing embedding approaches are alignment-based, which is both time-consuming and ineffective for distantly related proteins.  ...  Funding This work was supported by the Netherlands Organization for Scientific Research (NWO), project numbers TTW 15043 (JD) and TTW 14516 (MA).  ... 
doi:10.1101/2020.09.07.285569 fatcat:b2mcho4swbgvzipc5wld5josim

Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument

Razvan Nutiu, Robin C Friedman, Shujun Luo, Irina Khrebtukova, David Silva, Robin Li, Lu Zhang, Gary P Schroth, Christopher B Burge
2011 Nature Biotechnology  
for all 12mer sequences having submicromolar affinity.  ...  While a number of methods for characterizing DNA-protein interactions are currently available [1-6], none have demonstrated both quantitative measurement of affinity and high throughput.  ...  Sharp and members of the Burge lab for helpful discussion and comments on the manuscript.  ... 
doi:10.1038/nbt.1882 pmid:21706015 pmcid:PMC3134637 fatcat:h7vfcybsn5amziwlszqtjhbvce

De Novo Gene Expression Reconstruction in Space

Je H. Lee
2017 Trends in Molecular Medicine  
Here, we discuss potential next-generation approaches for de novo assembly of the transcriptome in space, and propose more efficient methods of detecting long-range spatial variations in gene expression  ...  Finally, we discuss future in situ sequencing chemistries for visualizing biological pathways and processes in tissues so that genomics technologies might be more easily applied to conditions of human  ...  for enabling the development of various concepts, tools, and technologies discussed in this manuscript.  ... 
doi:10.1016/j.molmed.2017.05.004 pmid:28571832 pmcid:PMC5514424 fatcat:ek2est56qrd5xin63v7jbqhgle

corral: Single-cell RNA-seq dimension reduction, batch integration, and visualization with correspondence analysis [article]

Lauren L Hsu, Aedin C Culhane
2021 bioRxiv   pre-print
CA variations are fast, scalable, and outperform standard CA and glmPCA, to compute embeddings with more performant or comparable clustering accuracy in 8 out of 9 datasets.  ...  We introduce corralm, a CA-based method for multi-table batch integration of scRNAseq data in shared latent space, and we propose a new approach for assessing batch integration.  ...  Acknowledgements We are grateful for helpful discussions with Prof.  ... 
doi:10.1101/2021.11.24.469874 fatcat:wqz467kko5d33eeaan7xc6fyvm