Hotspots of mammalian chromosomal evolution

Jeffrey A Bailey, Robert Baertsch, W James Kent, David Haussler, Evan E Eichler
2004 Genome Biology  
Chromosomal evolution is thought to occur through a random process of breakage and rearrangement that leads to karyotype differences and disruption of gene order. With the availability of both the human and mouse genomic sequences, detailed analysis of the sequence properties underlying these breakpoints is now possible. We report an abundance of primate-specific segmental duplications at the breakpoints of syntenic blocks in the human genome. Using conservative criteria, we find that 25%
... 61) of all breakpoints contain ≥ 10 kb of duplicated sequence. This association is highly significant (p < 0.0001) when compared to a simulated random-breakage model. The significance is robust under a variety of parameters, multiple sets of conserved synteny data, and for orthologous breakpoints between and within chromosomes. A comparison of mouse lineage-specific breakpoints since the divergence of rat and mouse showed a similar association with regions associated with segmental duplications in the primate genome. These results indicate that segmental duplications are associated with syntenic rearrangements, even when pericentromeric and subtelomeric regions are excluded. However, segmental duplications are not necessarily the cause of the rearrangements. Rather, our analysis supports a nonrandom model of chromosomal evolution that implicates specific regions within the mammalian genome as having been predisposed to both recurrent small-scale duplication and large-scale evolutionary rearrangements.
doi:10.1186/gb-2004-5-4-r23 pmid:15059256 pmcid:PMC395782 fatcat:s2khurcm35cmpnlngr2dri3pqq
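
The comparison against a simulated random-breakage model described in this abstract can be illustrated with a small sketch. This is not the authors' procedure: it assumes breakpoints are single positions on one linear genome, that breaks fall uniformly at random under the null, and that a "hit" means a breakpoint inside a duplication interval. The names `count_hits` and `random_breakage_p_value` are hypothetical.

```python
import bisect
import random


def count_hits(breakpoints, dup_intervals):
    """Count breakpoints that fall inside any duplication interval.

    dup_intervals must be sorted, non-overlapping (start, end) pairs,
    with end exclusive.
    """
    starts = [start for start, _ in dup_intervals]
    hits = 0
    for pos in breakpoints:
        # Index of the last interval whose start is <= pos, or -1 if none.
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos < dup_intervals[i][1]:
            hits += 1
    return hits


def random_breakage_p_value(observed_breakpoints, dup_intervals,
                            genome_length, n_sim=10_000, seed=0):
    """Empirical p-value for the observed number of hits under a
    uniform random-breakage null (an assumption of this sketch)."""
    rng = random.Random(seed)
    observed = count_hits(observed_breakpoints, dup_intervals)
    n = len(observed_breakpoints)
    at_least = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(genome_length) for _ in range(n)]
        if count_hits(simulated, dup_intervals) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_sim + 1)
```

With the add-one correction, the smallest p-value the sketch can report is 1 / (n_sim + 1), so resolving a threshold such as p < 0.0001 needs at least 10,000 simulations.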

Speciation network in Laurasiatheria: retrophylogenomic signals

Liliya Doronina, Gennady Churakov, Andrej Kuritzin, Jingjing Shi, Robert Baertsch, Hiram Clawson, Jürgen Schmitz
2017 Genome Research  
doi:10.1101/gr.210948.116 pmid:28298429 pmcid:PMC5453332 fatcat:3sbvzrey3fevnftz5ki6edudgq

Prototyping a precision oncology 3.0 rapid learning platform

Connor Sweetnam, Simone Mocellin, Michael Krauthammer, Nathaniel Knopf, Robert Baertsch, Jeff Shrager
2018 BMC Bioinformatics  
We describe a prototype implementation of a platform that could underlie a Precision Oncology Rapid Learning system. Results: We describe the prototype platform, and examine some important issues and details. In the Appendix we provide a complete walk-through of the prototype platform. Conclusions: The design choices made in this implementation rest upon ten constitutive hypotheses, which, taken together, define a particular view of how a rapid learning medical platform might be defined, organized, and implemented.
doi:10.1186/s12859-018-2374-0 fatcat:nujcim7ptne6rgwodrxwdiutxu

Retrocopy contributions to the evolution of the human genome

Robert Baertsch, Mark Diekhans, W James Kent, David Haussler, Jürgen Brosius
2008 BMC Genomics  
Evolution via point mutations is a relatively slow process and is unlikely to completely explain the differences between primates and other mammals. By contrast, 45% of the human genome is composed of retroposed elements, many of which were inserted in the primate lineage. A subset of retroposed mRNAs (retrocopies) shows strong evidence of expression in primates, often yielding functional retrogenes. Results: To identify and analyze the relatively recently evolved retrogenes, we carried out
... TZ alignments of all human mRNAs against the human genome and scored a set of features indicative of retroposition. Of over 12,000 putative retrocopy-derived genes that arose mainly in the primate lineage, 726 with strong evidence of transcript expression were examined in detail. These mRNA retroposition events fall into three categories: I) 34 retrocopies and antisense retrocopies that added potential protein coding space and UTRs to existing genes; II) 682 complete retrocopy duplications inserted into new loci; and III) an unexpected set of 13 retrocopies that contributed out-of-frame, or antisense sequences in combination with other types of transposed elements (SINEs, LINEs, LTRs), even unannotated sequence to form potentially novel genes with no homologs outside primates. In addition to their presence in human, several of the gene candidates also had potentially viable ORFs in chimpanzee, orangutan, and rhesus macaque, underscoring their potential of function. Conclusion: mRNA-derived retrocopies provide raw material for the evolution of genes in a wide variety of ways, duplicating and amending the protein coding region of existing genes as well as generating the potential for new protein coding space, or non-protein coding RNAs, by unexpected contributions out of frame, in reverse orientation, or from previously non-protein coding sequence.
doi:10.1186/1471-2164-9-466 pmid:18842134 pmcid:PMC2584115 fatcat:4ovjfjtefjh7ta2ieeccqf5cf4

Pathway-Based Genomics Prediction using Generalized Elastic Net

Artem Sokolov, Daniel E. Carlin, Evan O. Paull, Robert Baertsch, Joshua M. Stuart, Teresa M. Przytycka
2016 PLoS Computational Biology  
We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain,
... often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach.
Author Summary: The low costs of sequencing and other high-throughput technologies have made available large amounts of data to address molecular biology problems. However, often this means thousands of measurements, for example on gene expression, are assayed for a much smaller number of samples. The imbalance complicates both the identification of genes that generalize to new samples and the search for a collection of genes that suggests a theme for interpreting the data. Pathway and network-based approaches have proven their worth in these situations. They force solutions onto known biology and they produce more robust predictors. In this manuscript, we describe a new formulation of statistical learning approaches that naturally incorporates gene-gene relationships, like those found in gene network databases. The theory we present helps unify and codify an explicit formulation for gene pathway-informed machine-learning that should have wide reach.
Fig 5. Models for sensitivity of Gray cell lines to BIBW2992 learned by GELnets. We present the model weights for the top 30 genes (see text) as they change across the different values of the λ1 and λ2 meta-parameters. Higher positive (red) weights are associated with resistance, while higher negative (blue) weights are associated with sensitivity. The models are sorted by their % RMSE improvement over the corresponding Elastic Net models. The barplot on the left displays the average weight of each gene across all meta-parameter values where the weight is not zero.
doi:10.1371/journal.pcbi.1004790 pmid:26960204 pmcid:PMC4784899 fatcat:ifdxumnv2rhzhciisaf4xbm3re
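
The abstract does not state the regularizer itself. As a rough illustration of how pathway information could enter an elastic-net-style penalty, the sketch below assumes the form lam1 * sum_j d_j |w_j| + (lam2 / 2) * w^T P w, where P is a gene-gene penalty matrix such as a graph Laplacian of the pathway network. The functional form, the function names, and the toy network are all assumptions made for illustration, not details taken from this record.

```python
def graph_laplacian(n_genes, edges):
    """Unnormalized Laplacian (degree minus adjacency) of an undirected
    gene network given as a list of (i, j) index pairs."""
    lap = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def gelnet_style_penalty(w, d, P, lam1, lam2):
    """Assumed penalty: lam1 * sum_j d_j |w_j| + (lam2 / 2) * w^T P w.

    With d all ones and P the identity this reduces to the ordinary
    elastic net penalty.
    """
    l1_term = sum(dj * abs(wj) for dj, wj in zip(d, w))
    n = len(w)
    quad_term = sum(w[i] * P[i][j] * w[j] for i in range(n) for j in range(n))
    return lam1 * l1_term + 0.5 * lam2 * quad_term
```

With a Laplacian for P, the quadratic term equals the sum of squared weight differences across network edges, so linked genes with similar weights incur no quadratic cost. That is one way a penalty could steer solutions toward interlinked gene sets, as the abstract describes.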

Exploring Massive Incomplete Lineage Sorting in Arctoids (Laurasiatheria, Carnivora)

Liliya Doronina, Gennady Churakov, Jingjing Shi, Jürgen Brosius, Robert Baertsch, Hiram Clawson, Jürgen Schmitz
2015 Molecular Biology and Evolution
Freed from the competition of large raptors, Paleocene carnivores could expand their newly acquired habitats in search of prey. Such changing conditions might have led to their successful distribution and rapid radiation. Today, molecular evolutionary biologists are faced, however, with the consequences of such accelerated adaptive radiations, because they led to sequential speciation more rapidly than phylogenetic markers could be fixed. The repercussions being that current genealogies based
... such markers are incongruent with species trees. Our aim was to explore such conflicting phylogenetic zones of evolution during the early arctoid radiation, especially to distinguish diagnostic from misleading phylogenetic signals, and to examine other carnivore-related speciation events. We applied a combination of high-throughput computational strategies to screen carnivore and related genomes in silico for randomly inserted retroposed elements that we then used to identify inconsistent phylogenetic patterns in the Arctoidea group, which is well known for phylogenetic discordances. Our combined retrophylogenomic and in vitro wet lab approach detected hundreds of carnivore-specific insertions, many of them confirming well-established splits or identifying and solving conflicting species distributions. Our systematic genome-wide screens for Long INterspersed Elements detected homoplasy-free markers with insertion-specific truncation points that we used to distinguish phylogenetically informative markers from conflicting signals. The results were independently confirmed by phylogenetic diagnostic Short INterspersed Elements. As statistical analysis ruled out ancestral hybridization, these doubly verified but still conflicting patterns were statistically determined to be genomic remnants from a time of ancestral incomplete lineage sorting that especially accompanied large parts of Arctoidea evolution.
doi:10.1093/molbev/msv188 pmid:26337548 fatcat:lol37vmf7jbi3kpesbsnr24tpm

The Structure of a Rigorously Conserved RNA Element within the SARS Virus Genome

Michael P Robertson, Haller Igel, Robert Baertsch, David Haussler, Manuel Ares, William G Scott, Marv Wickens
2004 PLoS Biology  
Citation: Robertson MP, Igel H, Baertsch R, Haussler D, Ares M, et al. (2004) The structure of a rigorously conserved RNA element within the SARS virus genome. PLoS Biol 3(1): e5.  ... 
doi:10.1371/journal.pbio.0030005 pmid:15630477 pmcid:PMC539059 fatcat:f2qjggqbvbbddf2xti6cyyz6xq

GeneHub-GEPIS: digital expression profiling for normal and cancer tissues based on an integrated gene database

Yan Zhang, Shiuh-Ming Luoh, Lawrence S. Hon, Robert Baertsch, William I. Wood, Zemin Zhang
2007 Nucleic Acids Research  
GeneHub-GEPIS is a web application that performs digital expression analysis in human and mouse tissues based on an integrated gene database. Using aggregated expressed sequence tag (EST) library information and EST counts, the application calculates the normalized gene expression levels across a large panel of normal and tumor tissues, thus providing rapid expression profiling for a given gene. The backend GeneHub component of the application contains pre-defined gene structures derived from
... NA transcript sequences from major databases and includes extensive cross references for commonly used gene identifiers. ESTs are then linked to genes based on their precise genomic locations as determined by GMAP. This genome-based approach reduces incorrect matches between ESTs and genes, thus minimizing the noise seen with previous tools. In addition, the gene-centric design makes it possible to add several important features, including text searching capabilities, the ability to accept diverse input values, expression analysis for microRNAs, basic gene annotation, batch analysis, and linking between mouse and human genes. GeneHub-GEPIS is available at
doi:10.1093/nar/gkm381 pmid:17545196 pmcid:PMC1933245 fatcat:znnyvz2ez5aopb7aurvwkojygq

Using native and syntenically mapped cDNA alignments to improve de novo gene finding

Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler
2008 Bioinformatics
Motivation: Computational annotation of protein coding genes in genomic DNA is a widely used and essential tool for analyzing newly sequenced genomes. However, current methods suffer from inaccuracy and do poorly with certain types of genes. Including additional sources of evidence of the existence and structure of genes can improve the quality of gene predictions. For many eukaryotic genomes, expressed sequence tags (ESTs) are available as evidence for genes. Related genomes that have been
... enced, annotated, and aligned to the target genome provide evidence of existence and structure of genes. Results: We incorporate several different evidence sources into the gene finder AUGUSTUS. The sources of evidence are gene and transcript annotations from related species syntenically mapped to the target genome using TRANSMAP, evolutionary conservation of DNA, mRNA and ESTs of the target species, and retroposed genes. The predictions include alternative splice variants where evidence supports it. Using only ESTs we were able to predict at least one splice form exactly in 57% of human genes. Also using evidence from other species and human mRNAs, this number rises to 77%. Syntenic mapping is well-suited to annotate genomes closely related to genomes that are already annotated or for which extensive transcript evidence is available. Native cDNA evidence is most helpful when the alignments are used as compound information rather than independent positionwise information. Availability: AUGUSTUS is open source and available at http://augustus.gobics.de. The gene predictions for human can be browsed and downloaded at the UCSC Genome Browser (http://
doi:10.1093/bioinformatics/btn013 pmid:18218656 fatcat:b2yvaaeuj5dhpew4pjjr7o76dm

A Model for the Proteolipid Ring and Bafilomycin/Concanamycin-binding Site in the Vacuolar ATPase of Neurospora crassa

Barry J. Bowman, Mary E. McCall, Robert Baertsch, Emma Jean Bowman
2006 Journal of Biological Chemistry  
Robert Metzenberg (California State University, Northridge, CA) provided both guidance and the required strains and plasmids.  ... 
doi:10.1074/jbc.m605532200 pmid:16912037 fatcat:kkrr6xu53fgrnhypavlox3my7a

Forces Shaping the Fastest Evolving Regions in the Human Genome

Katherine S. Pollard, Sofie R. Salama, Bryan King, Andrew Kern, Tim Dreszer, Sol Katzman, Adam Siepel, Jakob Skou Pedersen, Gill Bejerano, Robert Baertsch, Kate R. Rosenbloom, Jim Kent (+1 others)
2005 PLoS Genetics  
We define a human diff as a base where (i) chimp, mouse, and rat have the same nucleotide, (ii) human has a different nucleotide, (iii) the ancestral consensus sequence is not in a CpG dinucleotide, and (iv) the chimp base is high quality (Phred score ≥ 30 and in an 11 base window with no indels, no more than 2 human/chimp differences, and all Phred scores ≥ 25). If the chimp base is not high quality, we refer to the base as a low quality human diff. The number of human diffs in an element can be ... d to the expected number if the element were evolving neutrally in the human lineage: 0.67 diffs per 100 bp, based on a genome-wide estimate of P(human≠chimp | chimp=mouse=rat). Table S3 contains the distribution of the number of human diffs in our set of 96% conserved regions. Nearly 24% of the regions have at least one human diff. In addition, they contain between 0 and 6 human indel events (mean=0.38), affecting between 0 and 28 bases (mean=0.52).
S1.2 Model for nucleotide evolution. The methods we use to detect substitution rate acceleration all make use of a fitted molecular evolutionary model. For this purpose, we use a general reversible single-nucleotide model (REV) [1] with parameters estimated from a genome-wide data set of evolutionarily conserved bases constituting approximately 3% of the human genome. These sites were identified in an independent analysis of a 17 species multiple alignment of human, chimp, macaque, mouse, rat, rabbit, cow, dog, armadillo, elephant, tenrec, opossum, chicken, frog, fugu, tetraodon, and zebrafish using the methods described in Siepel et al. [2]. Using the 17 species alignments and a topology for their phylogenetic tree, we globally estimate the free model parameters (rate matrix, branch lengths) using the phyloFit program [3]. Summaries of the estimated parameters are given in Figure S1. We call this model the CONS model. The REV model is the most general model for nucleotide substitution subject to the time-reversibility constraint. This model implies a particular parameterization of the rate matrix for nucleotide substitutions, which has four nucleotide frequencies and five rate parameters. Other parameterizations of the substitution rate matrix could be considered. However, we selected the REV model because of its general applicability and do not expect the results to depend heavily on this choice. Note that for simplicity we use a model that assumes independence between bases. Employing a context dependent model would allow us to easily model substitution patterns over two or three adjacent bases, alleviating the need to remove CpG dinucleotides from the data set. However, a context dependent model might present problems with estimation in this setting. It remains to be shown whether employing a more heavily parameterized model will in fact provide more power to detect human-specific changes. Our model for nucleotide evolution could be modified to include indels, allowing us to utilize rather than discard information from alignment columns with gaps. We explored the possibility of treating gaps as a fifth character and identified a number of genomic elements with more human-specific indels than expected (data not shown). One serious drawback to this approach, however, is that insertions and deletions often affect more than one adjacent base, violating the assumption of independence between bases. Consequently, large indels are assigned a higher probability than they deserve. A solution would be to model indel events, rather than indel bases (see ref. 3 for some work on this problem).
S1.3 Likelihood Ratio Test. LRT statistics. For each region, we compute the likelihood ratio test (LRT) statistic as follows. First, we fit two models to the multiple alignment data for the region. Both are scaled versions of the CONS model, a technique that avoids re-estimating all model parameters on a small amount of data. The null model has a single scale parameter representing a shortening (more conserved) or lengthening (less conserved) of all branches in the CONS tree. The alternative model has an additional parameter for the human branch, which is constrained to be ≥ 1. This extra parameter allows the human branch to be relatively longer (less conserved) than the branches in the rest of the tree. Both models are fit using the phyloFit function (phast library) with the --init-model and --scale-only options. The model with a human rate parameter is fit with the additional option --scale-subtree human:loss. The LRT statistic is the log ratio of the likelihood of the alternative model to that of the null model. Regions are ranked based on the magnitude of the LRT statistic, with larger values indicating more evidence for acceleration in human. The ranking on LRT statistics agrees well with other methods. LRT p-values. It is also of interest to assign a measure of statistical significance to each LRT statistic. We compute empirical p-values by simulation from the CONS model. One million simulated data sets are generated using the phyloBoot program (phast library). These are of variable lengths (median=140, as in the observed data). For each simulated data set, the LRT statistic is computed as above. The distributions of these statistics are similar for different length elements, so we pool all simulated LRT statistics to form a single null distribution. For each observed LRT statistic, the empirical p-value is the proportion of simulated data sets with a larger LRT statistic. Note that the smallest p-value that can be estimated by this method (1e-6 here) depends on the number of simulated data sets. For observed LRT statistics that exceed all simulated LRT statistics, we can only say p < 1e-6. Computational burden prevents more precise estimation.
S1.4 Multiple comparisons. Because the genomic regions we study in this paper are on different chromosomes or are separated by significant distance we can assume independence of their nucleotide substitution processes. This assumed independence allows us to employ a simple multiple testing correction throughout this study, the Benjamini & Hochberg False Discovery Rate (FDR) controlling procedure [5], which requires independence or weak dependence between tests. The smallest FDR adjusted p-value that we can compute in the LRT is 4.5e-4.
S1.5 Filtering. In order to illustrate the types of erroneous elements that would be found in the list of HARs if we did not perform filtering (Section 4.3), we describe the following elements that were removed from the analysis. One high probability element, hg17.chr13:22,408,812-22,408,911, has a paralog in chimp and human, but not the rodents. This element is eliminated because we could not determine conclusively (due to gaps in the chimp assembly) which chimp sequence should align to each human sequence. There are two relatively significant elements that contain multiple adjacent human changes: hg17.chr10:127,180,121-127,180,192 and hg17.chr11:118,305,215-118,305,352. In both cases, recomputing the element ranking without the adjacent changes seriously reduces the overall significance of the element (regardless of which specific base is retained). Hence, we eliminate these elements from further study. For two elements, hg17.chrX:95,820,769-95,820,993 and hg17.chrX:95,820,618-95,820,729, both of which fall in an intron of the DIAPH2 (O60879) gene, the chimpanzee reads in NCBI in fact agree with the human sequence. This suggests that the chimp whole genome assembly may be incorrect at this position and the substitutions are primate specific, but not human-specific. Furthermore, the macaque sequence in these two elements agrees with human, supporting the chimp reads and not the chimp assembly. We believe that the chimpanzee sequence in the corresponding Contig #300019 is an assembly error, potentially caused by mouse contamination of the chimpanzee library. One high scoring element, hg17.chr18:74,236,384-74,236,609, is removed based on contradictory findings in our resequencing data. A 4bp deletion in the human genome relative to the chimp and rodent genomes is not found in any of the humans in the PDR panel. Furthermore, none of the reads in the NCBI trace repository have the deletion. This suggests that it is either a rare mutation or an assembly or read error. Because the apparent human-specific changes in this element are explained by a shift in the alignment due to this questionable indel, we remove the element from the analysis.
S1.6 Background substitution rates based on ENCODE data. Background substitution rates are estimated using 4-fold degenerate (4d) sites in the ENCODE regions [6] (http://www.genome.gov/10005107), which cover 1% of the human genome. We fit a REV substitution model [1] to 4d sites from all ENCODE regions and from the five ENCODE regions that fall in the last (distal) band of their chromosomes. The rate matrix and GC content parameters were adjusted to correct for known bias in 4d sites using genome-wide estimates from ancestral repeats. These regions have a similar distribution of distances to the chromosome end as the HAR elements, making their 4d sites a suitable data set to estimate background substitution rates near chromosome ends. For each fitted model, we compute the posterior expected value of the number of substitutions on each lineage with the program phyloP with option --subtree (phast library). The background rates in the human-chimp tree are compared to the estimated chimp and human rates in the HAR elements.
doi:10.1371/journal.pgen.0020168.eor fatcat:ro4eg7e7r5bm5aza6psmvq35su
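
The ranking and significance steps described in S1.3 and S1.4 can be sketched in a few lines. The sketch assumes the per-region log-likelihoods of the null and alternative models have already been obtained elsewhere (the text uses phyloFit for this; no phast call is made here). The function names are hypothetical, and the Benjamini & Hochberg adjustment is the standard step-up form rather than code from the authors.

```python
import bisect


def lrt_statistic(loglik_alt, loglik_null):
    """Log ratio of the alternative-model likelihood to the null-model
    likelihood, computed as a difference of log-likelihoods."""
    return loglik_alt - loglik_null


def empirical_p_values(observed, simulated):
    """For each observed statistic, the proportion of simulated (null)
    statistics that are strictly larger.

    A result of 0.0 means the observed value exceeds every simulated
    value, so one can only say p < 1 / len(simulated).
    """
    null = sorted(simulated)
    n = len(null)
    return [(n - bisect.bisect_right(null, x)) / n for x in observed]


def benjamini_hochberg(p_values):
    """Benjamini & Hochberg FDR-adjusted p-values (step-up procedure),
    returned in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Because an empirical p-value of 0.0 really means "below the resolution of the simulation", a caller would typically replace it with the floor 1 / len(simulated) before applying the FDR adjustment.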

Forces Shaping the Fastest Evolving Regions in the Human Genome

Katherine S. Pollard, Sofie R. Salama, Bryan King, Andrew D. Kern, Tim Dreszer, Sol Katzman, Adam Siepel, Jakob S. Pedersen, Gill Bejerano, Robert Baertsch, Kate R. Rosenbloom, Jim Kent (+1 others)
2006 PLoS Genetics  
doi:10.1371/journal.pgen.0020168 pmid:17040131 pmcid:PMC1599772 fatcat:zkniycvchjaljctpro6sa3eo5q
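The empirical p-value and FDR steps described in the excerpt above can be sketched roughly as follows. This is a minimal illustration only, assuming the pooled simulated LRT statistics and the observed statistics are already available as arrays; the function names are hypothetical, NumPy stands in for whatever tooling the authors used, and the Benjamini & Hochberg adjustment is the textbook step-up form rather than the authors' own code.

```python
import numpy as np


def empirical_p_values(observed, simulated):
    """Proportion of simulated (null) statistics strictly larger than each observed one."""
    simulated = np.sort(np.asarray(simulated, dtype=float))
    observed = np.asarray(observed, dtype=float)
    # searchsorted(side="right") counts simulated values <= each observed value
    n_larger = simulated.size - np.searchsorted(simulated, observed, side="right")
    return n_larger / simulated.size


def floor_p_values(p, n_sim):
    """Replace p = 0 (observed exceeds every simulation) by the resolution limit 1/n_sim.

    The floored value is an upper bound: all that can be said is p < 1/n_sim.
    """
    return np.maximum(np.asarray(p, dtype=float), 1.0 / n_sim)


def benjamini_hochberg(p):
    """Benjamini & Hochberg FDR-adjusted p-values (step-up procedure)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted
```

With one million simulated statistics the floor is 1/1,000,000, which is why an observed statistic exceeding every simulated value can only be reported as an upper bound, and why the adjusted values have a nonzero minimum as well.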

Genome sequence of the basal haplorrhine primate Tarsius syrichta reveals unusual insertions

Jürgen Schmitz, Angela Noll, Carsten A. Raabe, Gennady Churakov, Reinhard Voss, Martin Kiefmann, Timofey Rozhdestvensky, Jürgen Brosius, Robert Baertsch, Hiram Clawson, Christian Roos, Aleksey Zimin (+4 others)
2016 Nature Communications  
Tarsiers are phylogenetically located between the most basal strepsirrhines and the most derived anthropoid primates. While they share morphological features with both groups, they also possess uncommon primate characteristics, rendering their evolutionary history somewhat obscure. To investigate the molecular basis of such attributes, we present here a new genome assembly of the
... tarsier (Tarsius syrichta), and provide extended analyses of the genome and detailed history of transposable element insertion events. We describe the silencing of Alu monomers on the lineage leading to anthropoids, and recognize an unexpected abundance of long terminal repeat-derived and LINE1-mobilized transposed elements (Tarsius interspersed elements; TINEs). For the first time in mammals, we identify a complete mitochondrial genome insertion within the nuclear genome, then reveal tarsier-specific, positive gene selection and posit population size changes over time. The genomic resources and analyses presented here will aid efforts to more fully understand the ancient characteristics of primate genomes.
doi:10.1038/ncomms12997 pmid:27708261 pmcid:PMC5059674 fatcat:mfojmj2xdnegll5ysfzglznoqi

TumorMap: Exploring the Molecular Similarities of Cancer Samples in an Interactive Portal

Yulia Newton, Adam M. Novak, Teresa Swatloski, Duncan C. McColl, Sahil Chopra, Kiley Graim, Alana S. Weinstein, Robert Baertsch, Sofie R. Salama, Kyle Ellrott, Manu Chopra, Theodore C. Goldstein (+3 others)
2017 Cancer Research  
Baertsch, S.R. Salama, T.C. Goldstein, D. Haussler, J.M. Stuart Development of methodology: Y. Newton, A.M. Novak, T. Swatloski, D.C. McColl, S. Chopra, K. Graim, T.C. Goldstein, D. Haussler, J.M.  ... 
doi:10.1158/0008-5472.can-17-0580 pmid:29092953 pmcid:PMC5751940 fatcat:tini6vfrnvah3ayszbv2dyhyby

A New Low-Cost Instream Antenna System for Tracking Passive Integrated Transponder (PIT)-Tagged Fish in Small Streams

Morgan H. Bond, Chad V. Hanson, Robert Baertsch, Sean A. Hayes, R. Bruce MacFarlane
2007 Transactions of the American Fisheries Society  
We present a new, low-cost, low-power, half/full-duplex passive integrated transponder (PIT) tag interrogation antenna for use in detecting fish movements in small streams. New technology by Allflex-USA allowed us to develop a reading system with an antenna 279.4 cm wide × 60.9 cm high that reads both common tag types used in fisheries today for about US$1,000. An instream antenna of this size and price makes high-resolution tracking of fish movement in small streams feasible where cost and
... -type restrictions were prohibitive. For evaluation, we placed the antenna upstream of a small estuary on the central California coast to observe the diel movements of juvenile steelhead Oncorhynchus mykiss between the estuary and upstream habitats in both spring and fall months.
doi:10.1577/t06-084.1 fatcat:sphfkgyazvffln773vhpbvhbjm