Ten quick tips for biocuration

Y. Amy Tang, Klemens Pichler, Anja Füllgrabe, Jane Lomax, James Malone, Monica C. Munoz-Torres, Drashtti V. Vasant, Eleanor Williams, Melissa Haendel, Francis Ouellette
2019 PLoS Computational Biology  
doi:10.1371/journal.pcbi.1006906 pmid:31048830 pmcid:PMC6497217 fatcat:anuuznr4a5dzxdr6r6o4i6vcs4

From ArrayExpress to BioStudies

Ugis Sarkans, Anja Füllgrabe, Ahmed Ali, Awais Athar, Ehsan Behrangi, Nestor Diaz, Silvie Fexova, Nancy George, Haider Iqbal, Sandeep Kurri, Jhoan Munoz, Juan Rada (+2 others)
2020 Nucleic Acids Research  
ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI, established in 2002, initially as an archive for publication-related microarray data and was later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments involve multiple technologies assaying different biological modalities, such as epigenetics, and RNA and protein expression, and thus the BioStudies database
... was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as the European Nucleotide Archive (ENA), and hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.
doi:10.1093/nar/gkaa1062 pmid:33211879 fatcat:p2nyf6ayibhvhif7hcsye7fu2a

ArrayExpress update – from bulk to single-cell expression data

Awais Athar, Anja Füllgrabe, Nancy George, Haider Iqbal, Laura Huerta, Ahmed Ali, Catherine Snow, Nuno A Fonseca, Robert Petryszak, Irene Papatheodorou, Ugis Sarkans, Alvis Brazma
2018 Nucleic Acids Research  
ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data from a variety of technologies assaying functional modalities of a genome, such as gene expression or promoter occupancy. The number of experiments based on sequencing technologies, in particular RNA-seq experiments, has been increasing over the last few years and submissions of sequencing data have overtaken microarray experiments in the last 12 months. Additionally, there is a significant increase in
experiments investigating single cells, rather than bulk samples, known as single-cell RNA-seq. To accommodate these trends, we have substantially changed our submission tool Annotare which, along with raw and processed data, collects all metadata necessary to interpret these experiments. Selected datasets are re-processed and loaded into our sister resource, the value-added Expression Atlas (and its component Single Cell Expression Atlas), which not only enables users to interpret the data easily but also serves as a test for data quality. With an increasing number of studies that combine different assay modalities (multi-omics experiments), a new, more general archival resource, the BioStudies Database, has been developed, which will eventually supersede ArrayExpress. Data submissions will continue unchanged; all existing ArrayExpress data will be incorporated into BioStudies and the existing accession numbers and application programming interfaces will be maintained.
doi:10.1093/nar/gky964 pmid:30357387 pmcid:PMC6323929 fatcat:p2n5pu6dgffvtf3jp5ypftowxu

The RNASeq-er API—a gateway to systematically updated analysis of public RNA-seq data

Robert Petryszak, Nuno A Fonseca, Anja Füllgrabe, Laura Huerta, Maria Keays, Y Amy Tang, Alvis Brazma, Bonnie Berger
2017 Bioinformatics  
Motivation: The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and
...ally updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in European Nucleotide Archive, using Representational State Transfer. Results: The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 00 RNA-Seq runs in nearly 10 000 studies (1PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. Availability and Implementation: The RNASeq-er API can be accessed at
doi:10.1093/bioinformatics/btx143 pmid:28369191 pmcid:PMC5870697 fatcat:ild6vn5ll5hcfj4alj5kksj234

Guidelines for reporting single-cell RNA-Seq experiments [article]

Anja Füllgrabe, Nancy George, Matthew Green, Parisa Nejad, Bruce Aronow, Laura Clarke, Silvie Korena Fexova, Clay Fischer, Mallory Ann Freeberg, Laura Huerta, Norman Morrison, Richard H. Scheuermann (+6 others)
2019 arXiv   pre-print
Single-cell RNA-Sequencing (scRNA-Seq) has undergone major technological advances in recent years, enabling the conception of various organism-level cell atlassing projects. With increasing numbers of datasets being deposited in public archives, there is a need to address the challenges of enabling the reproducibility of such data sets. Here, we describe guidelines for a minimum set of metadata to sufficiently describe scRNA-Seq experiments, ensuring reproducibility of data analyses.
arXiv:1910.14623v1 fatcat:thmk5zfofrbkhfpv33znev2cym

A high-resolution mRNA expression time course of embryonic development in zebrafish [article]

Richard J White, John E Collins, Ian M Sealy, Neha Wali, Christopher M Dooley, Zsofia Digby, Derek L Stemple, Daniel N Murphy, Thibaut Hourlier, Anja Fullgrabe, Matthew P Davis, Anton J Enright (+1 others)
2017 bioRxiv   pre-print
We have produced an mRNA expression time course of zebrafish development across 18 time points from 1-cell to 5 days post-fertilisation sampling individual and pools of embryos. Using poly(A) pulldown stranded RNA-seq and a 3′ end transcript counting method we characterise the temporal expression profiles of 23,642 genes. We identify temporal and functional transcript co-variance that associates 5,024 unnamed genes with distinct developmental time points. Specifically, a class of over 100
previously uncharacterised zinc finger domain containing genes, located on the long arm of chromosome 4, is expressed in a sharp peak during zygotic genome activation. The data reveal complex and widespread differential use of exons and previously unidentified 3′ ends across development, new primary microRNA transcripts and temporal divergence of gene paralogues generated in the teleost genome duplication. To make this dataset a useful baseline reference, the data are accessible to browse and download at Expression Atlas and Ensembl.
doi:10.1101/107631 fatcat:crrp2axbqrcrvhjrh5qfssvn5e

Dynamics of Lgr6 + Progenitor Cells in the Hair Follicle, Sebaceous Gland, and Interfollicular Epidermis

Anja Füllgrabe, Simon Joost, Alexandra Are, Tina Jacob, Unnikrishnan Sivan, Andrea Haegebarth, Sten Linnarsson, Benjamin D. Simons, Hans Clevers, Rune Toftgård, Maria Kasper
2015 Stem Cell Reports  
The dynamics and interactions between stem cell pools in the hair follicle (HF), sebaceous gland (SG), and interfollicular epidermis (IFE) of murine skin are still poorly understood. In this study, we used multicolor lineage tracing to mark Lgr6-expressing basal cells in the HF isthmus, SG, and IFE. We show that these Lgr6 + cells constitute long-term self-renewing populations within each compartment in adult skin. Quantitative analysis of clonal dynamics revealed that the Lgr6 + progenitor
cells compete neutrally in the IFE, isthmus, and SG, indicating population asymmetry as the underlying mode of tissue renewal. Transcriptional profiling of Lgr6 + and Lgr6 − cells did not reveal a distinct Lgr6-associated gene expression signature, raising the question of whether Lgr6 expression requires extrinsic niche signals. Our results elucidate the interrelation and behavior of Lgr6 + populations in the IFE, HF, and SG and suggest population asymmetry as a common mechanism for homeostasis in several epithelial skin compartments.
doi:10.1016/j.stemcr.2015.09.013 pmid:26607954 pmcid:PMC4649262 fatcat:z2vmfktsujhvvbgugklub55yxy

A high-resolution mRNA expression time course of embryonic development in zebrafish

Richard J White, John E Collins, Ian M Sealy, Neha Wali, Christopher M Dooley, Zsofia Digby, Derek L Stemple, Daniel N Murphy, Konstantinos Billis, Thibaut Hourlier, Anja Füllgrabe, Matthew P Davis (+2 others)
2017 eLife  
We have produced an mRNA expression time course of zebrafish development across 18 time points from 1 cell to 5 days post-fertilisation sampling individual and pools of embryos. Using poly(A) pulldown stranded RNA-seq and a 3′ end transcript counting method we characterise temporal expression profiles of 23,642 genes. We identify temporal and functional transcript co-variance that associates 5024 unnamed genes with distinct developmental time points. Specifically, a class of over 100 previously
uncharacterised zinc finger domain containing genes, located on the long arm of chromosome 4, is expressed in a sharp peak during zygotic genome activation. In addition, the data reveal new genes and transcripts, differential use of exons and previously unidentified 3′ ends across development, new primary microRNAs and temporal divergence of gene paralogues generated in the teleost genome duplication. To make this dataset a useful baseline reference, the data can be browsed and downloaded at Expression Atlas and Ensembl.
doi:10.7554/elife.30860 pmid:29144233 pmcid:PMC5690287 fatcat:kfjyomsqkvdvzntjeuxyjvye2e

Online resources for PCAWG data exploration, visualization, and discovery [article]

Mary Goldman, Junjun Zhang, Nuno A. Fonseca, Qian Xiang, Brian Craft, Elena Piñeiro-Yáñez, Brian O'Connor, Wojciech Bazant, Elisabet Barrera, Alfonso Muñoz, Robert Petryszak, Anja Füllgrabe (+10 others)
2017 bioRxiv   pre-print
The Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort provides a large, uniformly-analyzed, whole-genome dataset. The PCAWG Landing Page (http://docs.icgc.org/pcawg) focuses on four biologist-friendly, publicly-available web tools for exploring this data: The ICGC Data Portal, UCSC Xena, Expression Atlas, and PCAWG-Scout. They enable researchers to dynamically query the complex genomics data, explore tumors' molecular landscapes, and include external information to facilitate interpretation.
doi:10.1101/163907 fatcat:g2epyq2jsfazhkofwpp7gapx3u

Expression Atlas: gene and protein expression across multiple studies and organisms

Irene Papatheodorou, Nuno A Fonseca, Maria Keays, Y Amy Tang, Elisabet Barrera, Wojciech Bazant, Melissa Burke, Anja Füllgrabe, Alfonso Muñoz-Pomer Fuentes, Nancy George, Laura Huerta, Satu Koskinen (+10 others)
2017 Nucleic Acids Research  
Expression Atlas (http://www.ebi.ac.uk/gxa) is an added value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies
across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as Rdata. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions.
doi:10.1093/nar/gkx1158 pmid:29165655 pmcid:PMC5753389 fatcat:duqk25kctbauzihb22ieukvofa

A user guide for the online exploration and visualization of PCAWG data

Mary J. Goldman, Junjun Zhang, Nuno A. Fonseca, Isidro Cortés-Ciriano, Qian Xiang, Brian Craft, Elena Piñeiro-Yáñez, Brian D. O'Connor, Wojciech Bazant, Elisabet Barrera, Alfonso Muñoz-Pomer, Robert Petryszak (+12 others)
2020 Nature Communications  
The Pan-Cancer Analysis of Whole Genomes (PCAWG) project generated a vast amount of whole-genome cancer sequencing resource data. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we provide a user's guide to the five publicly available online data exploration and visualization tools introduced in the PCAWG marker paper. These tools are ICGC Data Portal, UCSC Xena,
Chromothripsis Explorer, Expression Atlas, and PCAWG-Scout. We detail use cases and analyses for each tool, show how they incorporate outside resources from the larger genomics ecosystem, and demonstrate how the tools can be used together to understand the biology of cancers more deeply. Together, the tools enable researchers to query the complex genomic PCAWG data dynamically and integrate external information, enabling and enhancing interpretation.
doi:10.1038/s41467-020-16785-6 pmid:32636365 fatcat:t3fvqfoo65fqxnp3jvubovuvf4

Common and distinct transcriptional signatures of mammalian embryonic lethality

John E. Collins, Richard J. White, Nicole Staudt, Ian M. Sealy, Ian Packham, Neha Wali, Catherine Tudor, Cecilia Mazzeo, Angela Green, Emma Siragher, Edward Ryder, Jacqueline K. White (+14 others)
2019 Nature Communications  
The Deciphering the Mechanisms of Developmental Disorders programme has analysed the morphological and molecular phenotypes of embryonic and perinatal lethal mouse mutant lines in order to investigate the causes of embryonic lethality. Here we show that individual whole-embryo RNA-seq of 73 mouse mutant lines (>1000 transcriptomes) identifies transcriptional events underlying embryonic lethality and associates previously uncharacterised genes with specific pathways and tissues. For example, our
data suggest that Hmgxb3 is involved in DNA-damage repair and cell-cycle regulation. Further, we separate embryonic delay signatures from mutant line-specific transcriptional changes by developing a baseline mRNA expression catalogue of wild-type mice during early embryogenesis (4-36 somites). Analysis of transcription outside coding sequence identifies deregulation of repetitive elements in Morc2a mutants and a gene involved in gene-specific splicing. Collectively, this work provides a large scale resource to further our understanding of early embryonic developmental disorders.
doi:10.1038/s41467-019-10642-x pmid:31243271 pmcid:PMC6594971 fatcat:v627lgld6rhwlmsq3xm2p4kl74

Expression Atlas update—an integrated database of gene and protein expression in humans, animals and plants

Robert Petryszak, Maria Keays, Y. Amy Tang, Nuno A. Fonseca, Elisabet Barrera, Tony Burdett, Anja Füllgrabe, Alfonso Muñoz-Pomer Fuentes, Simon Jupp, Satu Koskinen, Oliver Mannion, Laura Huerta (+12 others)
2015 Nucleic Acids Research  
Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown sevenfold (1572
studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.
doi:10.1093/nar/gkv1045 pmid:26481351 pmcid:PMC4702781 fatcat:ejfwhmtlcvbz3guoi6pkyacyji

Expression Atlas update: gene and protein expression in multiple species

Pablo Moreno, Silvie Fexova, Nancy George, Jonathan R Manning, Zhichiao Miao, Suhaib Mohammed, Alfonso Muñoz-Pomer, Anja Fullgrabe, Yalan Bi, Natassja Bush, Haider Iqbal, Upendra Kumbham (+22 others)
2021 Nucleic Acids Research  
The EMBL-EBI Expression Atlas is an added value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy to visualise form, after
expert curation to accurately represent the intended experimental design, re-analysed via standardised pipelines that rely on open-source community developed tools. Each study's metadata are annotated using ontologies. The data are re-analyzed with the aim of reproducing the original conclusions of the underlying experiments. Expression Atlas is currently divided into Bulk Expression Atlas and Single Cell Expression Atlas. Expression Atlas contains data from differential studies (microarray and bulk RNA-Seq) and baseline studies (bulk RNA-Seq and proteomics), whereas Single Cell Expression Atlas is currently dedicated to Single Cell RNA-Sequencing (scRNA-Seq) studies. The resource has been in continuous development since 2009 and it is available at https://www.ebi.ac.uk/gxa.
doi:10.1093/nar/gkab1030 pmid:34850121 pmcid:PMC8728300 fatcat:w4ochry6hreqhjcqkaezuperpe

A proteomics sample metadata representation for multiomics integration, and big data analysis [article]

Chengxin Dai, Anja Fullgrabe, Julianus Pfeuffer, Elizaveta Solovyeva, Jingwen Deng, Pablo Moreno, Selvakumar Kamatchinathan, Deepti Jaiswal Kundu, Nancy George, Silvie Fexova, Bjorn Gruning, Melanie Christine Foll (+28 others)
2021 bioRxiv   pre-print
The amount of public proteomics data is increasing at an extraordinary rate. Hundreds of datasets are submitted each month to ProteomeXchange repositories, representing many types of proteomics studies, focusing on different aspects such as quantitative experiments, post-translational modifications, protein-protein interactions, or subcellular localization, among many others. For every proteomics dataset, two levels of data are captured: the dataset description, and the data files (encoded in
different file formats). Whereas the dataset description and data file formats are supported by all ProteomeXchange partner repositories, there is no standardized format to properly describe the sample metadata and their relationship with the dataset files in a way that fully allows their understanding or re-analysis. It is left to the user's choice whether or not to provide an ad hoc document containing this information. Therefore, in many cases, understanding the study design and data requires going back to the associated publication. This can be tedious and may be restricted in the case of non-open access publications. In many cases, this problem limits the generalization and reuse of public proteomics data. Here we present a standard representation for sample metadata tailored to proteomics datasets produced by the HUPO Proteomics Standards Initiative and supported by ProteomeXchange resources. We repurposed the existing data format MAGE-TAB used routinely in the transcriptomics field to represent and annotate proteomics datasets. MAGE-TAB-Proteomics defines a set of annotation rules that the datasets submitted to ProteomeXchange should follow, ranging from sample properties to data analysis protocols. We also introduce a crowdsourcing project that enabled the manual curation of over 200 public datasets using MAGE-TAB-Proteomics. In addition, we describe an ecosystem of tools and libraries that were developed to validate and submit sample metadata-related information to ProteomeXchange. We expect that these tools will improve the reproducibility of published results and facilitate the reanalysis and integration of public proteomics datasets.
doi:10.1101/2021.05.21.445143 fatcat:trr3b4gygjcppdawdz546rgp6q