
Online Resources for Genomic Structural Variation [chapter]

Tam P. Sneddon, Deanna M. Church
2011 Methods in Molecular Biology
Genomic structural variation (SV) can be thought of on a continuum from a single base pair insertion/deletion (INDEL) to large megabase-scale rearrangements involving insertions, deletions, duplications, inversions or translocations of whole chromosomes or chromosome arms. These variants can occur in coding or non-coding DNA, they can be inherited or arise sporadically in the germline or somatic cells. Many of these events are segregating in the population and can be considered common alleles
... ile others are new alleles and thus rare events. All species studied to date harbor structural variants and these may be benign, contributing to phenotypes such as sensory perception and immunity, or pathogenic, resulting in genomic disorders including DiGeorge/velocardiofacial, Smith-Magenis, Williams-Beuren and Prader-Willi syndromes. As structural variants are identified, validated and their significance, origin and prevalence elucidated, it is of critical importance that these data be collected and collated in a way that can be easily accessed and analyzed. This chapter will describe current structural variation online resources (see Figure 1 and Table 1), highlight the challenges in capturing, storing and displaying SV data, and discuss how dbVar and DGVa, the genomic structural variation databases developed at NCBI and EBI respectively, were designed to address these issues.
doi:10.1007/978-1-61779-507-7_13 pmid:22228017 pmcid:PMC3804003 fatcat:7wzdh7fknfa4jm5c6in326gbee

Spidey: A Tool for mRNA-to-Genomic Alignments

Sarah J. Wheelan, Deanna M. Church, James M. Ostell
2001 Genome Research  
We have developed a computer program that aligns spliced sequences to genomic sequences, using local alignment algorithms and heuristics to put together a global spliced alignment. Spidey can produce reliable alignments quickly, even when confronted with noise from alternative splicing, polymorphisms, sequencing errors, or evolutionary divergence. We show how Spidey was used to align reference sequences to known genomic sequences and then to the draft human genome, to align mRNAs to gene
... s, and to align mouse mRNAs to human genomic sequence. We compared Spidey to two other spliced alignment programs; Spidey generally performed quite well in a very reasonable amount of time.
doi:10.1101/gr.195301 pmid:11691860 pmcid:PMC311166 fatcat:ibsx3vkkc5g7bjdwp6ovrnysxq

Thousands of human sequences provide deep insight into single genomes

Deanna M. Church
2020 Nature  
doi:10.1038/d41586-020-01485-4 pmid:32461645 fatcat:jtendlcxcvhtfe4q7mux4rxgaq

Linked-Read sequencing resolves complex structural variants [article]

Sarah Garcia, Stephen Williams, Andrew Wei Xu, Jill Herschleb, Patrick Marks, David Stafford, Deanna M. Church
2017 bioRxiv   pre-print
Large genomic structural variants (>50bp) are important contributors to disease, yet they remain one of the most difficult types of variation to accurately ascertain, in part because they tend to cluster in duplicated and repetitive regions, but also because the various signals for these events can be challenging to detect with short reads. Clinically, aCGH and karyotype remain the most commonly used assays for genome-wide structural variant (SV) detection, though there is clear potential
... t to an NGS-based assay that accurately detects both SVs and single nucleotide variants. Linked-Read sequencing is a relatively simple, fast, and cost-effective method that is applicable to both genome and targeted assays. Linked-Reads are generated by performing haplotype-level dilution of long input DNA molecules into >1 million barcoded partitions, generating barcoded short reads within those partitions, and then performing short read sequencing in bulk. We performed 30x Linked-Read genome sequencing on a set of 23 samples with known balanced or unbalanced SVs. Twenty-seven of the 29 known events were detected and another event was called as a candidate. Sequence downsampling was performed on a subset to determine the lowest sequence depth required to detect variations. Copy-number variants can be called with as little as 1-2x sequencing depth (5-10Gb) while balanced events require on the order of 10x coverage for variant calls to be made, although specific signal is clearly present at 1-2x sequencing depth. In addition to detecting a full spectrum of variant types with a single test, Linked-Read sequencing provides base-level resolution of breakpoints, enabling complete resolution of even the most complex chromosomal rearrangements.
doi:10.1101/231662 fatcat:f5mkdqfv5ne4le6wo4n5bckqoy

Back to Bermuda: how is science best served?

Deanna M Church, LaDeana W Hillier
2009 Genome Biology  
Church DM, Hillier LW: Back to Bermuda: how is science best served? Genome Biol 2009, 10:105. ...
doi:10.1186/gb-2009-10-4-105 pmid:19435531 pmcid:PMC2688919 fatcat:35gferz7lrefjessmxwh3iv67q

The Role of Structure Versus Individual Agency in Churches' Responses to HIV/AIDS: A Case Study of Baltimore City Churches

Shayna D. Cunningham, Deanna L. Kerrigan, Clea A. McNeely, Jonathan M. Ellen
2009 Journal of religion and health  
agency versus institutional forces influence churches in this regard.  ...  Church leaders varied, however, in the extent to which they responded in accordance with or resisted these constraints, highlighting the importance of individual agency influencing churches' responses  ...  individual churches.  ... 
doi:10.1007/s10943-009-9281-7 pmid:19714469 pmcid:PMC4862003 fatcat:r6l6pwn7info7l2vohiohu46bq

Direct determination of diploid genome sequences

Neil I. Weisenfeld, Vijay Kumar, Preyas Shah, Deanna M. Church, David B. Jaffe
2017 Genome Research  
[Table excerpt; numeric columns not recoverable. Rows mention samples NA24385 (Ashkenazi, M; one 10x library), HGP (European, M), NA24143 (Ashkenazi, F; 2 PacBio libraries) and YH (Chinese, M; ~18,000 Fosmid pools and 6 fragment and jumping ...).]
doi:10.1101/gr.214874.116 pmid:28381613 pmcid:PMC5411770 fatcat:5th5tgrribhxxphtrvkfct3txy

Mouse segmental duplication and copy number variation

Xinwei She, Ze Cheng, Sebastian Zöllner, Deanna M Church, Evan E Eichler
2008 Nature Genetics  
Detailed analyses of the clone-based genome assembly reveal that the recent duplication content of mouse (4.94%) is now comparable to that of human (5.5%), in contrast to previous estimates from the whole-genome shotgun sequence assembly. The architectures of the mouse and human genomes differ dramatically; most mouse duplications are organized into discrete clusters of tandem duplications that are depleted for genes/transcripts and enriched for LINE and LTR retroposons. We assessed copy-number
... tion of the C57BL/6J duplicated regions within 15 mouse strains used for genetic association studies, sequencing, and the Mouse Phenome Project. We determined that over 60% of these base pairs are polymorphic between the strains (on average 20 Mbp of copy-number variable DNA between different mouse strains). Our data suggest that different mouse strains show comparable, if not greater, copy-number polymorphism when compared to human; however, such variation is more locally restricted. We show large and complex patterns of inter-strain copy-number variation restricted to large gene families associated with spermatogenesis, pregnancy, viviparity, pheromone signalling, and immune response.
doi:10.1038/ng.172 pmid:18500340 pmcid:PMC2574762 fatcat:hxy3janzbfbepgwyupc644rvzm

Alternate-locus aware variant calling in whole genome sequencing

Marten Jäger, Max Schubach, Tomasz Zemojtel, Knut Reinert, Deanna M. Church, Peter N. Robinson
2016 Genome Medicine  
For each match (M) block in the original alignment, we considered the sequence to be a seed sequence if the trimmed M block was longer than 50 nt (Additional file 1: Supplemental Figure S8). ... The alignments start with the Gap= flag, followed by several blocks consisting of a letter (M, I, or D) and a number (length), where (i) M indicates a matching region between ref and alt loci, potentially ...
doi:10.1186/s13073-016-0383-z pmid:27964746 pmcid:PMC5155401 fatcat:icnladc47jbepbeusxy7hu7edm
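
The block format described in the snippet above (a Gap= flag followed by letter/length blocks) can be illustrated with a minimal Python sketch. The function names `parse_gap` and `seed_blocks`, the space-separated example string, and the use of the untrimmed block length are assumptions made for illustration; the paper's trimming step is not reproduced here, and this is not the authors' implementation.

```python
import re
from typing import List, Tuple

# One block = a letter (M, I, or D) followed by a length.
_BLOCK = re.compile(r"([MID])(\d+)")


def parse_gap(attr: str) -> List[Tuple[str, int]]:
    """Split a string such as 'Gap=M120 I3 M40' into (letter, length) pairs."""
    if not attr.startswith("Gap="):
        raise ValueError("expected a string starting with 'Gap='")
    body = attr[len("Gap="):]
    return [(op, int(length)) for op, length in _BLOCK.findall(body)]


def seed_blocks(blocks: List[Tuple[str, int]], min_len: int = 50) -> List[Tuple[int, int]]:
    """Return (block_index, length) for M blocks longer than min_len.

    Assumption: the raw block length is compared to the threshold;
    the trimming mentioned in the source is omitted.
    """
    return [
        (index, length)
        for index, (op, length) in enumerate(blocks)
        if op == "M" and length > min_len
    ]


if __name__ == "__main__":
    example = "Gap=M120 I3 M40 D2 M75"  # hypothetical input
    parsed = parse_gap(example)
    print(parsed)
    print(seed_blocks(parsed))
```

Keeping the parser separate from the seed filter means the 50 nt threshold can be changed without touching the parsing logic.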

Extending reference assembly models

Deanna M Church, Valerie A Schneider, Karyn Steinberg, Michael C Schatz, Aaron R Quinlan, Chen-Shan Chin, Paul A Kitts, Bronwen Aken, Gabor T Marth, Michael M Hoffman, Javier Herrero, M Lisandra Mendoza (+2 others)
2015 Genome Biology  
Using SRPRISM [26] (with parameters [p] (force paired/unpaired search): false; [n] (maximum number of allowed errors): 6; [M] (maximum allowed memory usage): 2048.  ... 
doi:10.1186/s13059-015-0587-3 pmid:25651527 pmcid:PMC4305238 fatcat:252kfuuqpvbujibokrwuwznonu

A variant by any name: quantifying annotation discordance across tools and clinical databases [article]

Jennifer Yen, Sarah Garcia, Aldrin Montana, Jason Harris, Steven Chervitz, John West, Richard Chen, Deanna M Church
2016 bioRxiv   pre-print
Clinical genomic testing is dependent on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position to resources that rely on transcript- or protein-based descriptions. Methods: We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor and Variation Reporter) that generate transcript
... d protein-based variant nomenclature from genomic coordinates according to guidelines by the Human Genome Variation Society (HGVS). Our evaluation was based on comparisons to a manually-curated list of 127 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. We further evaluated the concordance between annotations generated by SnpEff and Variant Effect Predictor with those in major germline and cancer databases: ClinVar and COSMIC, respectively. Results: We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Accuracy based on our ground truth set was 80-90% for coding and 50-70% for protein variants, numbers that are not adequate for clinical reporting. Exact concordance for SNV syntax was over 99.5% between ClinVar and Variant Effect Predictor (VEP) and SnpEff, but less than 90% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88%, and less than 15% for insertions. Across the tools and datasets, there was a wide range of equivalent expressions describing protein variants. Conclusion: Our results reveal significant inconsistency in variant representation across tools and databases. These results highlight the urgent need for the adoption of and adherence to uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care.
doi:10.1101/054023 fatcat:z767thrcuzcethzmnbegswjrg4

Building and Improving Reference Genome Assemblies

Karyn Meltz Steinberg, Valerie A. Schneider, Can Alkan, Michael J. Montague, Wesley C. Warren, Deanna M. Church, Richard K. Wilson
2017 Proceedings of the IEEE  
symbols, where edges represent size k-1 overlaps between strings generated by an m-symbol alphabet. ... In the genome assembly problem, m = 4 (i.e., Σ = {A, C, G, T}) and k is the length of k-mers extracted from sequence reads. ...
doi:10.1109/jproc.2016.2645402 fatcat:6mm2qx33fjecxhcbrxnhaebfji
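
The snippet above appears to describe a de Bruijn-style graph in which k-mers are linked when they overlap by k-1 symbols. A minimal Python sketch, assuming nodes are k-mers and edges join consecutive k-mers within a read, is given below; the function names `kmers` and `de_bruijn` and the toy read are hypothetical, and real assemblers add error correction and graph simplification that are not shown.

```python
from collections import defaultdict
from typing import Dict, List


def kmers(read: str, k: int) -> List[str]:
    """Return every length-k substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Map each k-mer to the k-mers that follow it in some read.

    Consecutive k-mers in a read share k-1 symbols, so every edge
    u -> v satisfies u[1:] == v[:-1].
    """
    graph = defaultdict(set)
    for read in reads:
        parts = kmers(read, k)
        for u, v in zip(parts, parts[1:]):
            graph[u].add(v)
    return {u: sorted(vs) for u, vs in graph.items()}


if __name__ == "__main__":
    # Toy read over the alphabet {A, C, G, T}, so m = 4.
    for node, successors in de_bruijn(["ACGTAC"], k=3).items():
        print(node, "->", successors)
```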

ClinVar: public archive of relationships among sequence variation and human phenotype

Melissa J. Landrum, Jennifer M. Lee, George R. Riley, Wonhee Jang, Wendy S. Rubinstein, Deanna M. Church, Donna R. Maglott
2013 Nucleic Acids Research  
ClinVar provides a freely available archive of reports of relationships among medically important variants and phenotypes. ClinVar accessions submissions reporting human variation, interpretations of the relationship of that variation to human health and the evidence supporting each interpretation. The database is tightly coupled with dbSNP and dbVar, which maintain information about the location of variation on human assemblies. ClinVar is also based on
... e phenotypic descriptions maintained in MedGen. Each ClinVar record represents the submitter, the variation and the phenotype, i.e. the unit that is assigned an accession of the format SCV000000000.0. The submitter can update the submission at any time, in which case a new version is assigned. To facilitate evaluation of the medical importance of each variant, ClinVar aggregates submissions with the same variation/phenotype combination, adds value from other NCBI databases, assigns a distinct accession of the format RCV000000000.0 and reports if there are conflicting clinical interpretations. Data in ClinVar are available in multiple formats, including html, download as XML, VCF or tab-delimited subsets. Data from ClinVar are provided as annotation tracks on genomic RefSeqs and are used in tools such as Variation Reporter (http://www.ncbi.nlm.nih.gov/variation/tools/reporter), which reports what is known about variation based on user-supplied locations.
doi:10.1093/nar/gkt1113 pmid:24234437 pmcid:PMC3965032 fatcat:ons2vfjgnfdr3hjadgnu6rki7a
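
The two accession formats quoted in the abstract above (SCV000000000.0 for a submission, RCV000000000.0 for an aggregate) can be split into prefix, number and version with a short Python sketch. The names `Accession` and `parse_accession` are hypothetical, and the pattern assumes exactly nine digits before the version, as in the quoted formats; this is not an NCBI API.

```python
import re
from typing import NamedTuple


class Accession(NamedTuple):
    kind: str      # "SCV" (submission) or "RCV" (aggregate record)
    number: int    # numeric part, leading zeros dropped
    version: int   # increases when the record is updated


# Assumption: nine digits, a dot, then an integer version.
_ACCESSION = re.compile(r"^(SCV|RCV)(\d{9})\.(\d+)$")


def parse_accession(text: str) -> Accession:
    """Parse strings shaped like 'SCV000000000.0' or 'RCV000000000.0'."""
    match = _ACCESSION.match(text.strip())
    if match is None:
        raise ValueError(f"not an SCV/RCV accession: {text!r}")
    kind, number, version = match.groups()
    return Accession(kind, int(number), int(version))


if __name__ == "__main__":
    print(parse_accession("SCV000000123.2"))  # made-up accession
    print(parse_accession("RCV000000456.1"))  # made-up accession
```

Holding the version as an integer makes it straightforward to check whether one copy of a record is newer than another.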

Candidate Single Nucleotide Polymorphism Selection using Publicly Available Tools: A Guide for Epidemiologists

Parveen Bhatti, Deanna M. Church, Joni L. Rutter, Jeffery P. Struewing, Alice J. Sigurdson
2006 American Journal of Epidemiology  
Single nucleotide polymorphisms (SNPs) are the most common form of human genetic variation, with millions present in the human genome. Because only 1% might be expected to confer more than modest individual effects in association studies, the selection of predictive candidate variants for complex disease analyses is formidable. Technologic advances in SNP discovery and the ever-changing annotation of the genome have led to massive informational resources that can be difficult to master across
... sciplines. A simplified guide is needed. Although methods for evaluating nonsynonymous coding SNPs are known, several other publicly available computational tools can be utilized to assess polymorphic variants in noncoding regions. As an example, the authors applied multiple methods to select SNPs in DNA double-strand break repair genes. They chose to evaluate SNPs that occurred among a preexisting set of 57 validated assays and to justify new assay development for 83 potential SNPs in the DNA-dependent protein kinase catalytic subunit. Of the 140 SNPs, the authors eliminated 119 variants with low or neutral predictions. The existing computational methods they used and the semiquantitative relative ranking strategy they developed can be adapted to a priori SNP selection or post hoc evaluation of variants identified in whole genome scans or within haplotype blocks associated with disease. The authors show a "real world" application of some existing bioinformatics tools for use in large epidemiologic studies and genetic analyses. They also reviewed alternative approaches that provide related information.
Keywords: amino acid sequence; base sequence; epidemiologic methods; genetic predisposition to disease; polymorphism, single nucleotide
Abbreviations: DNAPKcs, catalytic subunit of the DNA protein kinase; ESE, exonic splicing enhancer; mRNA, messenger
doi:10.1093/aje/kwj269 pmid:16923772 fatcat:32oy7p6o7bg3fc77d2kzlkwwxy

Shotgun sequence assembly and recent segmental duplications within the human genome

Xinwei She, Zhaoshi Jiang, Royden A. Clark, Ge Liu, Ze Cheng, Eray Tuzun, Deanna M. Church, Granger Sutton, Aaron L. Halpern, Evan E. Eichler
2004 Nature  
Complex eukaryotic genomes are now being sequenced at an accelerated pace primarily using whole-genome shotgun (WGS) sequence assembly approaches. WGS assembly was initially criticized because of its perceived inability to resolve repeat structures within genomes. Here, we quantify the effect of WGS sequence assembly on large, highly similar repeats by comparison of the segmental duplication content of two different human genome assemblies. Our analysis shows that large (>15 kilobases) and
... y identical (>97%) duplications are not adequately resolved by WGS assembly. This leads to significant reduction in genome length and the loss of genes embedded within duplications. Comparable analyses of mouse genome assemblies confirm that strict WGS sequence assembly will oversimplify our understanding of mammalian genome structure and evolution; a hybrid strategy using a targeted clone-by-clone approach to resolve duplications is proposed.
doi:10.1038/nature03062 pmid:15496912 fatcat:moxrqgowenbnjnx5six5fw65iu