Anne-Christin Hauschild is grateful for the financial aid provided by the International Max Planck Research School, Saarbrücken, Germany. ... Anne-Christin Hauschild, Jörg Ingo Baumbach and Jan Baumbach performed the evaluations. All authors equally contributed to writing the manuscript. ... Author Contributions: Anne-Christin Hauschild and Tobias Frisch implemented and tested the Carotta software. All authors contributed to developing the data processing schemes. ...
doi:10.3390/metabo5020344 pmid:26065494 pmcid:PMC4495376 fatcat:4hl6exclwrfbroexbgcgwrqmeq
The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and ... subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
doi:10.3390/cancers13133148 pmid:34202427 pmcid:PMC8269018 fatcat:te73wenebne4hkc5p77m2ry7em
As we saw at the 2013 Breath Analysis Summit, breath analysis is a rapidly evolving field. Increasingly sophisticated technology is producing huge amounts of complex data. A major barrier now faced by the breath research community is the analysis of these data. Emerging breath data require sophisticated, modern statistical methods to allow for a careful and robust deduction of real-world conclusions.
Keywords: breath analysis; exhaled nitric oxide; machine learning; robustness; statistics
The scientific program at the 2013 Breath Analysis Summit provided stimulating insights into the wealth of information that can be gleaned from air exhaled by humans. Since exhaled breath can be sampled continuously and non-invasively, there is great potential for breath analysis to lead to the development of biomarkers with widespread clinical and public health applications. Beyond breathalyzers in law enforcement, exhaled breath monitoring has become routine in clinical practice for monitoring patients undergoing anesthesia. The fractional concentration of exhaled nitric oxide (FeNO), a marker of aspects of airway inflammation, has been studied extensively in research settings and considered for clinical applications in asthma. The breadth of developmental applications discussed at this year's summit was remarkable, ranging from diagnosis of diseases (e.g., lung cancer, tuberculosis) to locating survivors trapped in rubble following natural disasters. As an emerging field, breath analysis is rapidly evolving. Increasingly sophisticated technology is producing huge amounts of increasingly complex data. Major data barriers now faced by the breath research community include standardizing sampling protocols,
doi:10.1088/1752-7155/8/1/012001 pmid:24565974 pmcid:PMC4014528 fatcat:xuf3oqugsjaztira7odyqowlqu
Since the outbreak in 2019, researchers have been trying to find effective drugs against the SARS-CoV-2 virus based on de novo drug design and drug repurposing. The former approach is very time consuming and needs extensive testing in humans, whereas drug repurposing is more promising, as the drugs have already been tested for side effects, etc. At present, there is no treatment for COVID-19 that is clinically effective, but there is a huge amount of data from studies that analyze potential drugs. We developed CORDITE to efficiently combine state-of-the-art knowledge on potential drugs and make it accessible to scientists and clinicians. The web interface also provides access to an easy-to-use API that allows wide use by other software and applications, e.g., for meta-analysis, design of new clinical studies, or simple literature search. CORDITE is currently empowering many scientists across all continents and accelerates research in the knowledge domains of virology and drug design.
doi:10.1016/j.isci.2020.101297 pmid:32619700 pmcid:PMC7305714 fatcat:ioiuygxny5cibhimfb5asb6ehi
Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs) with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, different steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, healthy or not, for instance. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that the classification accuracy is almost equally good for all methods, the manually created gold standard as well as the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using the PME as peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications.
doi:10.3390/metabo3020277 pmid:24957992 pmcid:PMC3901270 fatcat:z2afdejpandjfbm4kwyfusnncq
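To make the simplest of the four approaches concrete, the following is a minimal sketch of a local maxima search. It assumes the MCC/IMS measurement is available as a 2D intensity matrix and that a fixed intensity threshold separates peaks from noise; the function name `local_maxima`, the threshold, and the toy matrix are hypothetical and are not taken from the paper or from any of the tools it evaluates.

```python
def local_maxima(matrix, threshold):
    """Return (row, col) of cells at or above `threshold` that are
    strictly greater than every one of their (up to 8) neighbours."""
    peaks = []
    rows, cols = len(matrix), len(matrix[0])
    for r in range(rows):
        for c in range(cols):
            value = matrix[r][c]
            if value < threshold:
                continue  # too weak to be considered a peak
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Toy intensity matrix (illustrative values only, not real MCC/IMS data).
spectrum = [
    [0, 1, 0, 0],
    [1, 9, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 7],
]
peaks = local_maxima(spectrum, threshold=5)
print(peaks)
```

Real measurements would typically need baseline correction and smoothing before such a search; those steps are omitted here.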
MicroRNAs are important regulators of gene expression, achieved by binding to the gene to be regulated. Even with modern high-throughput technologies, it is laborious and expensive to detect all possible microRNA targets. For this reason, several computational microRNA-target prediction tools have been developed, each with its own strengths and limitations. Integration of different tools has been a successful approach to minimize the shortcomings of individual databases. Here, we present mirDIP v4.1, providing nearly 152 million human microRNA-target predictions, which were collected across 30 different resources. We also introduce an integrative score, which was statistically inferred from the obtained predictions, and was assigned to each unique microRNA-target interaction to provide a unified measure of confidence. We demonstrate that integrating predictions across multiple resources does not cumulate prediction bias toward biological processes or pathways. mirDIP v4.1 is freely available at
doi:10.1093/nar/gkx1144 pmid:29194489 pmcid:PMC5753284 fatcat:4avz62zhafdvjen3c3tsgwtr3q
Antimicrobial resistance (AMR) is one of the biggest global problems threatening human and animal health. Rapid and accurate AMR diagnostic methods are thus very urgently needed. However, traditional antimicrobial susceptibility testing (AST) is time-consuming, low throughput, and viable only for cultivable bacteria. Machine learning methods may pave the way for automated AMR prediction based on genomic data of the bacteria. However, comparing different machine learning methods for the prediction of AMR based on different encodings and whole-genome sequencing data without prior knowledge remains to be done. In the current study, we evaluated logistic regression (LR), support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) for the prediction of AMR for the antibiotics ciprofloxacin (CIP), cefotaxime (CTX), ceftazidime (CTZ), and gentamicin (GEN). We could demonstrate that these models can effectively predict AMR with label encoding, one-hot encoding, and frequency matrix chaos game representation (FCGR encoding) on whole-genome sequencing data. We trained these models on a large AMR dataset and evaluated them on an independent public data set. Generally, RFs and CNNs perform better than LR and SVM with AUCs up to 0.96. Furthermore, we were able to identify mutations that are associated with AMR for each antibiotic. Source code for data preparation and model training is provided on GitHub (https://github.com/YunxiaoRen/ML-iAMR). Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btab681 pmid:34613360 pmcid:PMC8722762 fatcat:hhs4fevh6vagtkeu2p3uq33cpu
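The three encodings named in the abstract can be illustrated on a short DNA string. This is a sketch under stated assumptions, not the code from the ML-iAMR repository: the integer mapping in `LABELS`, the corner assignment in `CORNERS`, and the cell-indexing convention of `fcgr` are choices made here for illustration, and the study's own encodings may differ in detail.

```python
from collections import Counter

# Assumed integer mapping; any base outside A/C/G/T is encoded as 0.
LABELS = {"A": 1, "C": 2, "G": 3, "T": 4}
# One common chaos game corner assignment (x, y); an assumption here.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def label_encode(seq):
    """Map each base to a single integer."""
    return [LABELS.get(base, 0) for base in seq]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector over A, C, G, T."""
    return [[1 if base == ref else 0 for ref in "ACGT"] for base in seq]


def fcgr(seq, k):
    """Count k-mers into a 2^k x 2^k grid (frequency chaos game representation).

    The last base of a k-mer selects the coarsest quadrant, mirroring the
    chaos game iteration in which the most recent base moves the point most.
    """
    size = 2 ** k
    matrix = [[0] * size for _ in range(size)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    for kmer, n in counts.items():
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases
        x = y = 0
        for base in reversed(kmer):
            cx, cy = CORNERS[base]
            x = 2 * x + cx
            y = 2 * y + cy
        matrix[y][x] += n
    return matrix


print(label_encode("ACGTN"))
print(one_hot_encode("AC"))
print(fcgr("ACGTAC", 2))
```

Label and one-hot encodings keep the positional order of the sequence, whereas the FCGR matrix has a fixed size regardless of sequence length, which is one reason it pairs naturally with convolutional models.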
Tremendous advances in next-generation sequencing technology have enabled the accumulation of large amounts of omics data in various research areas over the past decade. However, study limitations due to small sample sizes, especially in rare disease clinical research, technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning analysis. Here, we present a meta-transfer learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes. Few-shot learning algorithms integrate meta-learning to overcome data scarcity and data heterogeneity by transferring molecular pattern recognition models from datasets of unrelated domains. We explore few-shot learning models with the large-scale public datasets TCGA (The Cancer Genome Atlas) and GTEx, and demonstrate their potential as pre-training datasets in other molecular pattern recognition tasks. Our results show that meta-transfer learning is very effective for datasets with a limited sample size. Furthermore, we show that our approach can transfer knowledge across technological heterogeneity, for example, from bulk-cell to single-cell data. Our approach can overcome study size constraints, batch effects and technical limitations in analyzing single-cell data by leveraging existing bulk-cell sequencing data.
doi:10.1093/nargab/lqab104 pmid:34805988 pmcid:PMC8598306 fatcat:bxac6janrbg25nnj55ss7rd3mi
Abstract: Distinct bacteria are able to cope with highly diverse lifestyles; for instance, they can be free living or host-associated. Thus, these organisms must possess a large and varied genomic arsenal to withstand different environmental conditions. To facilitate the identification of genomic features that might influence bacterial adaptation to a specific niche, we introduce LifeStyle-Specific-Islands (LiSSI). LiSSI combines evolutionary sequence analysis with statistical learning (Random Forest with feature selection, model tuning and robustness analysis). In summary, our strategy aims to identify conserved consecutive homology sequences (islands) in genomes and to identify the most discriminant islands for each lifestyle.
doi:10.1515/jib-2017-0010 pmid:28678736 fatcat:t7y75gyly5fxfnhynueprrg24e
Selecting the most promising treatment strategy for breast cancer crucially depends on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases like TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes making a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to solve this so-called gene selection problem. However, more diverse data from other OMICS technologies are available, including methylation. We hypothesize that combining methylation and gene expression data could already lead to a largely improved classification model, since the resulting model will reflect differences not only on the transcriptomic, but also on an epigenetic level. We compared so-called random forest derived classification models based on gene expression and methylation data alone, to a model based on the combined features and to a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification error of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model, which was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model was able to identify unique features not considered as relevant by the gene expression model, which might provide deeper insights into breast cancer subtype differentiation on an epigenetic level.
doi:10.2390/biecoll-jib-2014-236 pmid:24953305 fatcat:mpuy5hmt4rg55n7hrl6oq7eoau
Both studies, Westhoff et al. 2011 as well as Hauschild et al. ... Another, more recent study by Hauschild et al. in 2012 focused on the classification and biomarker identification of COPD and bronchial carcinoma based on MCC/IMS data. ...
doi:10.3390/metabo2040733 pmid:24957760 pmcid:PMC3901238 fatcat:ylfovhqs6rcvvjnr2y3gscb5bq
Summary: Over the last decade the evaluation of odors and vapors in human breath has gained more and more attention, particularly in the diagnostics of pulmonary diseases. Ion mobility spectrometry coupled with multi-capillary columns (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs) in air. It is a comparatively inexpensive, non-invasive, high-throughput method, which is able to handle the moisture that comes with human exhaled air, and allows for ...g of VOCs in very low concentrations. To identify discriminating compounds as biomarkers, it is necessary to have a clear understanding of the detailed composition of human breath. Therefore, in addition to the clinical studies, there is a need for a flexible and comprehensive centralized data repository, which is capable of gathering all kinds of related information. Moreover, there is a demand for automated data integration and semi-automated data analysis, in particular with regard to the rapid data accumulation emerging from the high-throughput nature of the MCC/IMS technology. Here, we present a comprehensive database application and analysis platform, which combines metabolic maps with heterogeneous biomedical data in a well-structured manner. The design of the database is based on a hybrid of the entity-attribute-value (EAV) model and the EAV-CR, which incorporates the concepts of classes and relationships. Additionally, it offers an intuitive user interface that provides easy and quick access to the platform's functionality: automated data integration and integrity validation, versioning and roll-back strategy, data retrieval as well as semi-automatic data mining and machine learning capabilities. The platform will support MCC/IMS-based biomarker identification and validation. The software, schemata, data sets and further information are publicly available at http://imsdb.mpi-inf.mpg.de.
doi:10.1515/jib-2013-218 fatcat:3w46pwxcx5g4pnozfj5y77psuq
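For readers unfamiliar with the entity-attribute-value (EAV) model mentioned above, the following is a minimal sketch of the plain EAV idea using Python's standard `sqlite3` module. The table name `eav`, the column names, the helper `load_entity`, and all rows are hypothetical illustrations; they do not reproduce the platform's actual schema, and the class and relationship extensions of EAV-CR are not shown.

```python
import sqlite3

# In-memory database with one generic table: each row stores a single
# attribute of a single entity, so new attributes need no schema change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")

# Made-up example rows (not data from the study).
rows = [
    ("measurement_1", "patient_group", "COPD"),
    ("measurement_1", "peak_count", "42"),
    ("measurement_2", "patient_group", "healthy"),
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)


def load_entity(connection, entity):
    """Collect all attribute/value pairs of one entity into a dict."""
    cursor = connection.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", (entity,)
    )
    return dict(cursor.fetchall())


record = load_entity(conn, "measurement_1")
print(record)
```

The trade-off of this layout is flexibility for heterogeneous data at the cost of typed columns: every value is stored as text here and must be interpreted by the application.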
We also searched the Nephroseq v5 database (www.nephroseq.org, March 2020, University of Michigan, Ann Arbor, MI) for renal expression data of the urinary signature proteins. ...
doi:10.1371/journal.pone.0233639 pmid:32453760 fatcat:3kgm62wpera2bbtf473du5rrw4
Antidepressant outcomes in older adults with depression are poor, possibly because of comorbidities such as cerebrovascular disease. Therefore, we leveraged multiple genome-wide approaches to understand the genetic architecture of antidepressant response. Our sample included 307 older adults (≥60 years) with current major depression, treated with venlafaxine extended-release for 12 weeks. A standard genome-wide association study (GWAS) was conducted for post-treatment remission status, followed by in silico biological characterization of associated genes, as well as polygenic risk scoring for depression, neurodegenerative and cerebrovascular disease. The top-associated variants for remission status and percentage symptom improvement were PIEZO1 rs12597726 (OR = 0.33 [0.21, 0.51], p = 1.42 × 10⁻⁶) and intergenic rs6916777 (Beta = 14.03 [8.47, 19.59], p = 1.25 × 10⁻⁶), respectively. Pathway analysis revealed significant contributions from genes involved in the ubiquitin-proteasome system, which regulates intracellular protein degradation and has implications for inflammation, as well as atherosclerotic cardiovascular disease (n = 25 of 190 genes, p = 8.03 × 10⁻⁶, FDR-corrected p = 0.01). Given the polygenicity of complex outcomes such as antidepressant response, we also explored 11 polygenic risk scores associated with risk for Alzheimer's disease and stroke. Of the 11 scores, risk for cardioembolic stroke was the second-best predictor of non-remission, after being male (Accuracy = 0.70 [0.59, 0.79], Sensitivity = 0.72, Specificity = 0.67; p = 2.45 × 10⁻⁴). Although our findings did not reach genome-wide significance, they point to previously implicated mechanisms and provide support for the roles of vascular and inflammatory pathways in late-life depression (LLD). Overall, significant enrichment of genes involved in protein degradation pathways that may be impaired, as well as the predictive capacity of risk for cardioembolic stroke, support a link between late-life depression remission and risk for vascular dysfunction.
doi:10.1038/s41398-021-01248-3 pmid:33589590 fatcat:4mehnuc325eu5oo3mhbjukxjau
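The accuracy, sensitivity, and specificity reported above are standard functions of a binary confusion matrix. The sketch below shows how they are computed; the function name `classification_metrics` is hypothetical and the counts are invented for illustration only, so they do not correspond to the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion counts.

    tp/fn: positives predicted correctly/incorrectly
    tn/fp: negatives predicted correctly/incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Invented counts, not taken from the study.
metrics = classification_metrics(tp=8, fn=2, tn=6, fp=4)
print(metrics)
```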