
Employing conservation of co-expression to improve functional inference

Carsten O Daub, Erik LL Sonnhammer
2008 BMC Systems Biology  
Observing co-expression between genes suggests that they are functionally coupled. Co-expression of orthologous gene pairs across species may improve function prediction beyond the level achieved in a single species. Results: We used orthology between genes of the three different species S. cerevisiae, D. melanogaster, and C. elegans to combine co-expression across two species at a time. This led to increased function prediction accuracy when we incorporated expression data from either of the
... her two species and even further increased when conservation across both of the two other species was considered at the same time. Employing the conservation across species to incorporate abundant model organism data for the prediction of protein interactions in poorly characterized species constitutes a very powerful annotation method. Conclusion: To be able to employ the most suitable co-expression distance measure for our analysis, we evaluated the ability of four popular gene co-expression distance measures to detect biologically relevant interactions between pairs of genes. For the expression datasets employed in our co-expression conservation analysis above, we used the GO and the KEGG PATHWAY databases as gold standards. While the differences between distance measures were small, Spearman correlation gave the most robust results.
doi:10.1186/1752-0509-2-81 pmid:18808668 pmcid:PMC2561017 fatcat:fgucgete2reyzjrhk5uhtgvphq
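
The abstract above singles out Spearman correlation as a co-expression measure. As a rough illustration only, here is a minimal Python sketch of a rank-based co-expression score between two expression profiles. The function names, the toy profiles, and the choice of `1 - rho` as the distance are assumptions made for this example and are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_distance(x, y):
    """Hypothetical distance: 0 for identical ordering, 2 for reversed."""
    return 1.0 - spearman(x, y)


# Toy expression profiles across five conditions (invented values).
gene_a = [1.0, 2.0, 4.0, 8.0, 16.0]
gene_b = [0.1, 0.4, 0.9, 1.6, 2.5]   # rises with gene_a, non-linearly
gene_c = [9.0, 7.0, 5.0, 3.0, 1.0]   # falls as gene_a rises

print(spearman(gene_a, gene_b), spearman_distance(gene_a, gene_c))
```

Because only ranks enter the calculation, any monotone relationship between two profiles scores the same as a linear one. This is one common reason for preferring a rank-based measure on noisy expression data.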

Transcriptional features of genomic regulatory blocks

Altuna Akalin, David Fredman, Erik Arner, Xianjun Dong, Jan Bryne, Harukazu Suzuki, Carsten O Daub, Yoshihide Hayashizaki, Boris Lenhard
2009 Genome Biology  
doi:10.1186/gb-2009-10-4-r38 pmid:19374772 pmcid:PMC2688929 fatcat:yfxt5zxztnbrxm5jiy6sx4bekq

SDRF2GRAPH – a visualization tool of a spreadsheet-based description of experimental processes

Hideya Kawaji, Yoshihide Hayashizaki, Carsten O Daub
2009 BMC Bioinformatics  
As larger datasets are produced with the development of genome-scale experimental techniques, it has become essential to explicitly describe the metadata (information describing the data) generated by an experiment. The experimental process is a part of the metadata required to interpret the produced data, and SDRF (Sample and Data Relationship Format) supports its description in a spreadsheet or tab-delimited file. This format was primarily developed to describe microarray studies in
... and it is being applied in a broader context in ISA-tab. While the format provides an explicit framework to describe experiments, an increasing number of experimental steps makes the content of SDRF files harder to understand.
doi:10.1186/1471-2105-10-133 pmid:19422683 pmcid:PMC2689195 fatcat:6nm6zzryszc35bpsh77dvotqje

Ten simple rules for annotating sequencing experiments

Irene Stevens, Abdul Kadir Mukarram, Matthias Hörtenhuber, Terrence F. Meehan, Johan Rung, Carsten O. Daub, Scott Markel
2020 PLoS Computational Biology  
[22]) and workflow description standards (Common Workflow Language (CWL) [23] and Workflow Description Language (WDL) [21]).  ...  17. Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O,  ... 
doi:10.1371/journal.pcbi.1008260 pmid:33017400 fatcat:cibocj4fgzay5ogc6m6t5tpyay

Sequence determinants of human gene regulatory elements [article]

Biswajyoti Sahu, Tuomo Hartonen, Päivi Pihlajamaa, Bei Wei, Kashyap Dave, Fangjie Zhu, Eevi Kaasinen, Katja Lidschreiber, Michael Lidschreiber, Carsten O Daub, Patrick Cramer, Teemu Kivioja (+1 others)
2021 bioRxiv   pre-print
DNA determines where and when genes are expressed, but the full set of sequence determinants that control gene expression is not known. To obtain a global and unbiased view of the relative importance of different sequence determinants in gene expression, we measured transcriptional activity of DNA sequences that are in aggregate ~100 times longer than the human genome in three different cell types. We show that enhancers can be classified into three main types: classical enhancers, closed
... in enhancers and chromatin-dependent enhancers, which act via different mechanisms and differ in motif content. Transcription factors (TFs) act generally in an additive manner with weak grammar, with classical enhancers increasing expression from promoters by a mechanism that does not involve specific TF-TF interactions. Few TFs are strongly active in a cell, with most activities similar between cell types. Chromatin-dependent enhancers are enriched in forkhead motifs, whereas classical enhancers contain motifs for TFs with strong transactivator domains such as ETS and bZIP; these motifs are also found at transcription start site (TSS)-proximal positions. However, some TFs, such as NRF1, only activate transcription when placed close to the TSS, and others, such as YY1, display positional preference with respect to the TSS. TFs can thus be classified into four non-exclusive subtypes based on their transcriptional activity: chromatin opening, enhancing, promoting and TSS determining factors, consistent with the view that the binding motif is the only atomic unit of gene expression.
doi:10.1101/2021.03.18.435942 fatcat:tb4x2n46wrabpdsfwmu6bchjli

Identification and transfer of spatial transcriptomics signatures for cancer diagnosis

Niyaz Yoosuf, José Fernández Navarro, Fredrik Salmén, Patrik L. Ståhl, Carsten O. Daub
2020 Breast Cancer Research  
Distinguishing ductal carcinoma in situ (DCIS) from invasive ductal carcinoma (IDC) regions in clinical biopsies constitutes a diagnostic challenge. Spatial transcriptomics (ST) is an in situ capturing method, which allows quantification and visualization of transcriptomes in individual tissue sections. In the past, studies have shown that breast cancer samples can be used to study their transcriptomes with spatial resolution in individual tissue sections. Previously, supervised machine
... methods were used in clinical studies to predict the clinical outcomes for cancer types.
doi:10.1186/s13058-019-1242-9 pmid:31931856 pmcid:PMC6958738 fatcat:pe3bysiqf5f37pjcaleosnlbue

Hidden layers of human small RNAs

Hideya Kawaji, Mari Nakamura, Yukari Takahashi, Albin Sandelin, Shintaro Katayama, Shiro Fukuda, Carsten O Daub, Chikatoshi Kai, Jun Kawai, Jun Yasuda, Piero Carninci, Yoshihide Hayashizaki
2008 BMC Genomics  
C/D box type snoRNA and snRNA products: Small nucleolar RNAs (snoRNAs) are RNAs of 60-300 nt that contribute mainly to the 2'-O-methylation and pseudouridylation of rRNA, snRNA, and potentially  ... 
doi:10.1186/1471-2164-9-157 pmid:18402656 pmcid:PMC2359750 fatcat:3cdgijy76jhv7a62iksyo44swy

*-DCC: A platform to collect, annotate, and explore a large variety of sequencing experiments

Matthias Hörtenhuber, Abdul K Mukarram, Marcus H Stoiber, James B Brown, Carsten O Daub
2020 GigaScience  
Background Over the past few years the variety of experimental designs and protocols for sequencing experiments increased greatly. To ensure the wide usability of the produced data beyond an individual project, rich and systematic annotation of the underlying experiments is crucial. Findings We first developed an annotation structure that captures the overall experimental design as well as the relevant details of the steps from the biological sample to the library preparation, the sequencing
... cedure, and the sequencing and processed files. Through various design features, such as controlled vocabularies and different field requirements, we ensured a high annotation quality, comparability, and ease of annotation. The structure can be easily adapted to a large variety of species. We then implemented the annotation strategy in a user-hosted web platform with data import, query, and export functionality. Conclusions We present here an annotation structure and user-hosted platform for sequencing experiment data, suitable for lab-internal documentation, collaborations, and large-scale annotation efforts.
doi:10.1093/gigascience/giaa024 pmid:32170312 fatcat:xsyd7wkbozgvnkrlq4kjvobobm

Sequence determinants of human gene regulatory elements

Biswajyoti Sahu, Tuomo Hartonen, Päivi Pihlajamaa, Bei Wei, Kashyap Dave, Fangjie Zhu, Eevi Kaasinen, Katja Lidschreiber, Michael Lidschreiber, Carsten O. Daub, Patrick Cramer, Teemu Kivioja (+1 others)
2022 Nature Genetics  
O.  ...  Daub, Patrick Cramer, Teemu Kivioja and Jussi Taipale. DNA can determine where and when genes are expressed, but the full set of sequence determinants that control gene expression  ... 
doi:10.1038/s41588-021-01009-4 pmid:35190730 pmcid:PMC8920891 fatcat:qb2yyctegvbh3dcrepzhxl4say

Estimating mutual information using B-spline functions--an improved similarity measure for analysing gene expression data

Carsten O Daub, Ralf Steuer, Joachim Selbig, Sebastian Kloska
2004 BMC Bioinformatics  
$MI(X,Y) = -\frac{1}{2}\log\left(1 - R(X,Y)^2\right)$ Discussion and conclusion: After a brief introduction into the information theoretic concept of mutual information, we proposed a method for its estimation from  ... 
doi:10.1186/1471-2105-5-118 pmid:15339346 pmcid:PMC516800 fatcat:j33cqa7w2rfqbihm723mmst6yy
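
The snippet above relates mutual information MI(X,Y) to a correlation R(X,Y). For jointly Gaussian variables a standard closed form links the two: MI = -1/2 log(1 - R^2). The sketch below evaluates only that textbook relation and does not reproduce the B-spline estimator proposed in the paper. The function name is an assumption, and the result is in nats because the natural logarithm is used.

```python
import math


def gaussian_mutual_information(r):
    """Mutual information (in nats) of a bivariate Gaussian with correlation r.

    Uses the closed form MI = -1/2 * log(1 - r^2), which holds only under
    the Gaussian assumption; it is not a general-purpose MI estimator.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return -0.5 * math.log(1.0 - r * r)


# Uncorrelated variables carry no information about each other, and the
# value grows without bound as |r| approaches 1.
for r in (0.0, 0.5, 0.9, -0.9):
    print(r, gaussian_mutual_information(r))
```

The relation depends only on the magnitude of the correlation, so positive and negative correlations of equal strength give the same value. Dependencies that are not linear are invisible to R, which is one motivation for estimating mutual information directly from data.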

Prediction of Function Divergence in Protein Families Using the Substitution Rate Variation Parameter Alpha

Saraswathi Abhiman, Carsten O. Daub, Erik L. L. Sonnhammer
2006 Molecular biology and evolution  
Protein families typically embody a range of related functions and may thus be decomposed into subfamilies with, for example, distinct substrate specificities. Detection of functionally divergent subfamilies is possible by methods for recognizing branches of adaptive evolution in a gene tree. As the number of genome sequences is growing rapidly, it is highly desirable to automatically detect subfamily function divergence. To this end, we here introduce a method for large-scale prediction of
... tion divergence within protein families. It is called the alpha shift measure (ASM) as it is based on detecting a shift in the shape parameter (alpha [α]) of the substitution rate gamma distribution. Four different methods for estimating α were investigated. We benchmarked the accuracy of ASM using function annotation from Enzyme Commission numbers within Pfam protein families divided into subfamilies by the automatic tree-based method BETE. In a test using 563 subfamily pairs in 162 families, ASM outperformed functional site-based methods using rate or conservation shifting (rate shift measure [RSM] and conservation shift measure [CSM]). The best results were obtained using the "GZ-Gamma" method for estimating α. By combining ASM with RSM and CSM using linear discriminant analysis, the prediction accuracy was further improved.
doi:10.1093/molbev/msl002 pmid:16672285 fatcat:gvcz2aov3nbapjih3pgb74fueu

Glycyrrhiza uralensis Transcriptome Landscape and Study of Phytochemicals

Jordan A. Ramilowski, Satoru Sawai, Hikaru Seki, Keiichi Mochida, Takuhiro Yoshida, Tetsuya Sakurai, Toshiya Muranaka, Kazuki Saito, Carsten O. Daub
2013 Plant and Cell Physiology  
HID enzymes belong to the carboxylesterase family and convert 2,7-dihydroxy-4'-O-methoxyisoflavanone to 2,7-dihydroxy-4'-O-methoxyisoflavone in the isoflavonoid pathway.  ...  CHI, chalcone isomerase; CHR, chalcone reductase; CHS, chalcone synthase; DMID, 7,2'-dihydroxy-4'-O-methoxyisoflavanol dehydratase; HI4'OMT, 2,7,4'-trihydroxyisoflavanone 4'-O-methyltransferase  ... 
doi:10.1093/pcp/pct057 pmid:23589666 fatcat:ncstkckbsjavpitpteidwvhvjy

Optimization of turn-back primers in isothermal amplification

Yasumasa Kimura, Michiel J. L. de Hoon, Shintaro Aoki, Yuri Ishizu, Yuki Kawai, Yasushi Kogo, Carsten O. Daub, Alexander Lezhava, Erik Arner, Yoshihide Hayashizaki
2011 Nucleic Acids Research  
Since the amplification pathway of each strand is symmetric, here we show an example derived from one particular strand. i. First, a reverse turn-back primer (TPr1) hybridizes to the target DNA sequence (1), and the subsequent extension by DNA polymerase creates a DNA fragment flanked by TPr1 (2). Both ends of the newly synthesized strand can partly denature and form a stem-loop structure through thermal fluctuation. This condition allows another reverse TP (TPr2) to hybridize to the exposed
... get sequence and extend a new strand, which peels the existing strand off through the strand displacement activity of the DNA polymerase (3). ii. The released DNA strand becomes a template for the next step, as a forward TP (TPf1) can hybridize to it (4). Extension from TPf1 creates a DNA fragment flanked by TPf1 and the complement of TPr1 (cTPr1) (5), which can be peeled off by the same mechanism as described above for TPr1 (6). iii. We named this peeled fragment, flanked by TPf1 and cTPr1, the intermediate product (IM1) (7). The 3' end of IM1, cTPr1, is designed to form a stem-loop structure, and DNA extension can be initiated at the 3' end by a self-priming mechanism (8). iv. This event generates two different pathways in total (9). One is self-priming from the complement of TPf1 (cTPf1) after stem-loop formation through thermal fluctuation (10). The other occurs at the cTPr1 loop in the middle of IM1, where another reverse TP (TPr4) can hybridize (11). v. Products are continuously amplified by the same mechanism as described above.
doi:10.1093/nar/gkr041 pmid:21310714 pmcid:PMC3089485 fatcat:li4vvddyvzf43pjlq7ltdwc6pq

Integrative gene-metabolite network with implemented causality deciphers informational fluxes of sulphur stress response

Victoria J. Nikiforova, Carsten O. Daub, Holger Hesse, Lothar Willmitzer, Rainer Hoefgen
2005 Journal of Experimental Botany  
(Daub et al., 2004).  ...  For n values in a row an over-sampling rate of o exchanges n × o pairs of values.  ... 
doi:10.1093/jxb/eri179 pmid:15911562 fatcat:f7ub5kogozhvzjzlxw7zig3yjy

Chromatin states reveal functional associations for globally defined transcription start sites in four human cell lines

Morten Rye, Geir Sandve, Carsten O Daub, Hideya Kawaji, Piero Carninci, Alistair RR Forrest, Finn Drabløs
2014 BMC Genomics  
Deciphering the most common modes by which chromatin regulates transcription, and how this is related to cellular status and processes is an important task for improving our understanding of human cellular biology. The FANTOM5 and ENCODE projects represent two independent large scale efforts to map regulatory and transcriptional features to the human genome. Here we investigate chromatin features around a comprehensive set of transcription start sites in four cell lines by integrating data from
... these two projects. Results: Transcription start sites can be distinguished by chromatin states defined by specific combinations of both chromatin mark enrichment and the profile shapes of these chromatin marks. The observed patterns can be associated with cellular functions and processes, and they also show association with expression level, location relative to nearby genes, and CpG content. In particular we find a substantial number of repressed inter- and intra-genic transcription start sites enriched for active chromatin marks and Pol II, and these sites are strongly associated with immediate-early response processes and cell signaling. Associations between start sites with similar chromatin patterns are validated by significant correlations in their global expression profiles. Conclusions: The results confirm the link between chromatin state and cellular function for expressed transcripts, and also indicate that active chromatin states at repressed transcripts may poise transcripts for rapid activation during immune response.
doi:10.1186/1471-2164-15-120 pmid:24669905 pmcid:PMC3986914 fatcat:tfwnyhfy2bafjic2btyka3jtyu