
Ancestry specific association mapping in admixed populations [article]

Line Skotte, Emil Joersboe, Thorfinn Sand S Korneliussen, Ida Moltke, Anders Albrechtsen
2015 bioRxiv   pre-print
During the last decade genome-wide association studies have proven to be a powerful approach to identifying disease-causing variants. However, for admixed populations, most current methods for performing association testing are based on the assumption that the effect of a genetic variant is the same regardless of its ancestry. This is a reasonable assumption for a causal variant, but may not hold for the genetic variants that are tested in genome-wide association studies, which are usually not causal. The effects of non-causal genetic variants depend on how strongly their presence correlates with the presence of the causal variant, which may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect size is allowed to depend on the ancestry of a given allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a dramatic increase in statistical power to detect associations. In addition, the method allows for testing for difference in effect size between ancestral populations, which can be used to help determine if a SNP is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.
doi:10.1101/014001 fatcat:w4prwt6cwjbjdeewd7it6ltm5u

ANGSD: Analysis of Next Generation Sequencing Data

Thorfinn Sand Korneliussen, Anders Albrechtsen, Rasmus Nielsen
2014 BMC Bioinformatics  
The first step in a bioinformatic pipeline for analyzing NGS data is usually to align the reads to a reference  ... 
doi:10.1186/s12859-014-0356-4 pmid:25420514 pmcid:PMC4248462 fatcat:xujrgymmmrg2vhehih35egbc4q

Estimating Individual Admixture Proportions from Next Generation Sequencing Data

Line Skotte, Thorfinn Sand Korneliussen, Anders Albrechtsen
2013 Genetics  
 ...  Scenario D simulations (variable depth between 0.5X and 6X and varying range of admixture proportions). Based on HGDP frequencies we simulated 340 samples for 100,000 SNP sites.  ... 
doi:10.1534/genetics.113.154138 pmid:24026093 pmcid:PMC3813857 fatcat:zb3dnisemrf2vevxvitfcwhu4q

NgsRelate: a software tool for estimating pairwise relatedness from next-generation sequencing data

Thorfinn Sand Korneliussen, Ida Moltke
2015 Bioinformatics  
$= \sum_{(G^i_l, G^j_l) \in \{0,1,2\}^2} P(D^i_l \mid G^i_l)\, P(D^j_l \mid G^j_l)\, P(G^i_l \mid f^A_l)\, P(G^j_l \mid f^A_l, X_l = m, G^i_l)$ Here $P(D^i_l \mid G^i_l)$ and $P(D^j_l \mid G^j_l)$ are GLs, which can be estimated using ANGSD (Korneliussen  ... 
doi:10.1093/bioinformatics/btv509 pmid:26323718 pmcid:PMC4673978 fatcat:bu6ombrxczgjrlu4sg273tigmm

The pangenome of the fungal pathogen Neonectria neomacrospora [article]

Knud Nor Nielsen, Kimmo Sirén, Bent Petersen, Thomas Sicheritz-Pontén, M. Thomas P. Gilbert, Thorfinn Sand Korneliussen, Ole Kim Hansen
2021 bioRxiv   pre-print
The fungal plant pathogen Neonectria neomacrospora (C. Booth & Samuels) Mantiri & Samuels (Ascomycota, Hypocreales) is a bark parasite causing twig blight, canker, and in severe cases, dieback in fir (Abies spp.). Although often described as a mild pathogen, forestry and phytosanitary agencies have expressed their concern for potential economic impact. Two epidemics caused by this species are known: one from eastern Canada and one current within Northern Europe. We present key genome features of N. neomacrospora, to facilitate the research into the biology of this pathogen. We present the first genome assembly of N. neomacrospora as well as the first pangenome within this genus. The reference genome for N. neomacrospora is a long-read sequenced Danish isolate, while the pangenome is pieced together using an additional 60 short-read sequenced strains covering the known geographical distribution of the species, including Europe, North America, and China. The gapless reference genome consists of twelve chromosomes sequenced telomere to telomere to a total length of 37.1 Mb. The mitochondrial genome was assembled and circularised with a length of 22 Kb. The gapless nuclear genome contains a total of 11,291 annotated genes, where 642 only have a hypothetical function, and a 4.3 % repeat content. Two minor chromosomes are enriched in transposable elements, AT content, and effector candidates. Chromosome 12 segregates within the population, indicating an accessory nature. The pangenome compiles 15,101 genes, 34% more genes than present in the single isolate reference genome of N. neomacrospora. These genes organise into 13,069 homologous clusters, of which 8,316 clusters are present in all analysed strains and 985 are private to single strains. The British Columbian population branched out before the other populations and is characterized by comparatively larger genomes. The increased genome size can be explained by an expansion of repetitive elements. The comparative analysis finds a higher number of genes with a signal peptide within N. neomacrospora and species within the genus compared to the closely related genera. A species-specific pattern is observed in the carbohydrate-active enzyme repertoire, with a reduced number of polysaccharide lyases, compared to other species within the genus. The CAZymes battery responsible for plant cell wall degradation is similar to that observed in necrotrophic and hemibiotrophic plant pathogenic fungi.
The genome size of N. neomacrospora is close to the median size for Ascomycota but is the smallest genome within the Neonectria genus. Comparative analysis revealed significant intraspecies genome size differences between populations explained by a difference in repeat content. Isolates with the smallest genomes formed a monophyletic group consisting of all strains from Europe and Quebec. Based on the field observations, we assume that N. neomacrospora is a hemibiotroph. Our analysis revealed a secretome consistent with a hemibiotrophic lifestyle.
doi:10.1101/2021.03.11.434922 fatcat:gxcmlhwwfbgwjlbqxolbmyxxby

Quantifying Population Genetic Differentiation from Next-Generation Sequencing Data

Matteo Fumagalli, Filipe G. Vieira, Thorfinn Sand Korneliussen, Tyler Linderoth, Emilia Huerta-Sánchez, Anders Albrechtsen, Rasmus Nielsen
2013 Genetics  
Over the last few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data. In particular, the use of naïve methods to identify polymorphic sites and infer genotypes can inflate downstream analyses. Recently, explicit modelling of genotype probability distributions has been proposed as a method for taking genotype call uncertainty into account. Based on this idea, we propose a novel method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy to investigate population structure via Principal Components Analysis. Through extensive simulations, we compare the new method herein proposed to approaches based on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled individuals, suggesting that employing this new method is useful for investigating the genetic relationships of populations sampled at low coverage.
doi:10.1534/genetics.113.154740 pmid:23979584 pmcid:PMC3813878 fatcat:3wnor3s5b5gtnb2exhaaa6dnnq

Population genomics of the emerging forest pathogen Neonectria neomacrospora [article]

Knud Nor Nielsen, Shyam Gopalakrishnan, Thorfinn Sand Korneliussen, Mikkel Skovrind, Kimmo Sirén, Bent Petersen, Thomas Sicheritz-Pontén, Iben M Thomsen, Ole K Hansen
2020 bioRxiv   pre-print
., 2009) [--dedup] option was used to remove duplicated reads, and angsd v.0.929 (Korneliussen, Albrechtsen and Nielsen, 2014) [--doFasta2 -setMinDepth 20] called the most common base for generating  ... 
doi:10.1101/2020.12.07.407155 fatcat:xhmrxdtcijh4lc2zlrkx47ufky

Joint Estimates of Heterozygosity and Runs of Homozygosity for Modern and Ancient Samples

Gabriel Renaud, Kristian Hanghøj, Thorfinn Sand Korneliussen, Eske Willerslev, Ludovic Orlando
2019 Genetics  
Several tools have been released to infer the number of heterozygous sites at equilibrium, also referred to as the heterozygosity, from either raw sequence alignment files (Haubold et al. 2010; Korneliussen  ...  Inferring heterozygosity on the basis of low sequence coverage data is difficult but several methods have been proposed to do so (Bryc et al. 2013; Korneliussen et al. 2014; Kousathanas et al. 2017)  ... 
doi:10.1534/genetics.119.302057 pmid:31088861 pmcid:PMC6614887 fatcat:hgmx32uzjfdd3owz2okfobu7ai

On Detecting Incomplete Soft or Hard Selective Sweeps Using Haplotype Structure

Anna Ferrer-Admetlla, Mason Liang, Thorfinn Korneliussen, Rasmus Nielsen
2014 Molecular biology and evolution  
doi:10.1093/molbev/msu077 pmid:24554778 pmcid:PMC3995338 fatcat:glxc6xos75gjphksnjyujmvgae

SNP Calling, Genotype Calling, and Sample Allele Frequency Estimation from New-Generation Sequencing Data

Rasmus Nielsen, Thorfinn Korneliussen, Anders Albrechtsen, Yingrui Li, Jun Wang, Philip Awadalla
2012 PLoS ONE  
We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
doi:10.1371/journal.pone.0037558 pmid:22911679 pmcid:PMC3404070 fatcat:mub24jd725c5disdaxk5qcatke
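
The abstract above mentions that the likelihood is computed with a dynamic programming algorithm. The following is a minimal sketch of one such recursion, not code taken from the paper. It assumes diploid individuals and that each individual contributes three genotype likelihoods P(D | G = 0, 1, 2); the function name `sample_allele_freq_likelihoods` is hypothetical.

```python
from math import comb


def sample_allele_freq_likelihoods(gls):
    """Per-site likelihoods of the sample allele count.

    gls: list of (P(D|G=0), P(D|G=1), P(D|G=2)) tuples, one per diploid
         individual (assumed input layout, not prescribed by the source).
    Returns a list h of length 2n+1 where h[j] is proportional to
    P(D | j derived alleles among the 2n sampled chromosomes).
    """
    h = [1.0]  # zero individuals processed: only allele count 0 is possible
    for g0, g1, g2 in gls:
        new = [0.0] * (len(h) + 2)
        for j, hj in enumerate(h):
            new[j] += hj * g0            # individual adds 0 derived alleles
            new[j + 1] += hj * 2.0 * g1  # heterozygote: two chromosome orderings
            new[j + 2] += hj * g2        # individual adds 2 derived alleles
        h = new
    total = len(h) - 1  # 2n chromosomes
    # Divide by the number of ways to place j derived alleles on 2n chromosomes.
    return [hj / comb(total, j) for j, hj in enumerate(h)]
```

Each individual is folded in with a constant amount of work per allele count, so the whole vector costs on the order of n^2 operations for n individuals. With completely uninformative genotype likelihoods (all equal to 1), every entry of the result is 1, which is a convenient sanity check.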

Fast and accurate relatedness estimation from high-throughput sequencing data in the presence of inbreeding

Kristian Hanghøj, Ida Moltke, Philip Alstrup Andersen, Andrea Manica, Thorfinn Sand Korneliussen
2019 GigaScience  
In the absence of inbreeding our model reduces to the work in Korneliussen and Moltke [11].  ...  The 2D-SFS obtained in ngsRelateV2 follows the methodology from Korneliussen et al. [18] that is based on genotype likelihoods and therefore does not require called genotypes.  ...  The optimization follows the approach described in Korneliussen and Moltke [11].  ... 
doi:10.1093/gigascience/giz034 pmid:31042285 pmcid:PMC6488770 fatcat:odw6hrzswfdlxoc7sumqkanwoy

Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data

Alex Mas-Sandoval, Nathaniel S Pope, Knud Nor Nielsen, Isin Altinkaya, Matteo Fumagalli, Thorfinn Sand Korneliussen
2022 GigaScience  
[5] and implemented by Korneliussen et al. [21] computes the entire vector efficiently in $O(n^2)$.  ... 
doi:10.1093/gigascience/giac032 pmid:35579549 pmcid:PMC9112775 fatcat:6mwmsmosyngjre33ysa54q63qa

Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data

Thorfinn Sand Korneliussen, Ida Moltke, Anders Albrechtsen, Rasmus Nielsen
2013 BMC Bioinformatics  
A number of different statistics are used for detecting natural selection using DNA sequencing data, including statistics that are summaries of the frequency spectrum, such as Tajima's D. These statistics are now often being applied in the analysis of Next Generation Sequencing (NGS) data. However, estimates of frequency spectra from NGS data are strongly affected by low sequencing coverage; the inherent technology-dependent variation in sequencing depth causes systematic differences in the value of the statistic among genomic regions. Results: We have developed an approach that accommodates the uncertainty of the data when calculating site frequency based neutrality test statistics. A salient feature of this approach is that it implicitly solves the problems of varying sequencing depth and missing data, and avoids the need to infer variable sites for the analysis, thereby avoiding ascertainment problems introduced by a SNP discovery process. Conclusion: Using an empirical Bayes approach for fast computations, we show that this method produces results for low-coverage NGS data comparable to those achieved when the genotypes are known without uncertainty. We also validate the method in an analysis of data from the 1000 genomes project. The method is implemented in a fast framework which enables researchers to perform these neutrality tests on a genome-wide scale.
doi:10.1186/1471-2105-14-289 pmid:24088262 pmcid:PMC4015034 fatcat:oo6nkdzrjfb27dkklewntkivne
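
To illustrate what a site frequency based neutrality statistic looks like, here is a minimal sketch of the classical Tajima's D computed from an unfolded site frequency spectrum. It is not the paper's empirical Bayes method: it assumes the spectrum is already available as (possibly expected, fractional) counts of sites with i derived alleles for i = 1..n-1, and the function name `tajimas_d` is hypothetical.

```python
from math import sqrt


def tajimas_d(sfs):
    """Classical Tajima's D from an unfolded SFS.

    sfs[i-1] is the (expected) number of segregating sites carrying i derived
    alleles, for i = 1..n-1, where n is the number of sampled chromosomes.
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))

    # Watterson's estimator and the pairwise-difference estimator of theta.
    theta_w = s / a1
    pairs = n * (n - 1) / 2.0
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs

    # Variance constants from Tajima (1989).
    b1 = (n + 1.0) / (3.0 * (n - 1.0))
    b2 = 2.0 * (n * n + n + 3.0) / (9.0 * n * (n - 1.0))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2.0) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    variance = e1 * s + e2 * s * (s - 1.0)
    return (theta_pi - theta_w) / sqrt(variance)
```

Under the standard neutral expectation, where the count of sites with i derived alleles is proportional to 1/i, the two estimators coincide and D is zero; an excess of rare variants gives a negative value. For very small fractional values of `s` the variance term can become non-positive, so a real implementation would need to guard against that case.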

Estimation of site frequency spectra from low-coverage sequencing data using stochastic EM reduces overfitting, runtime, and memory usage [article]

Malthe Sebro Rasmussen, Genís Garcia-Erill, Thorfinn Sand Korneliussen, Carsten Wiuf, Anders Albrechtsen
2022 bioRxiv   pre-print
The site frequency spectrum (SFS) is an important summary statistic in population genetics used for inference on demographic history and selection. However, estimation of the SFS from called genotypes introduces bias when working with low-coverage sequencing data. Methods exist for addressing this issue, but suffer from two problems. First, they have very high computational demands, to the point that it may not be possible to run estimation for genome-scale data. Second, existing methods are prone to overfitting, especially for multi-dimensional SFS estimation. In this article, we present a stochastic expectation-maximisation algorithm for inferring the SFS from NGS data that addresses these challenges. We show that this algorithm greatly reduces runtime and enables estimation with constant, trivial RAM usage. Further, the algorithm reduces overfitting and thereby improves downstream inference. An implementation is available at
doi:10.1101/2022.05.24.493190 fatcat:zkjnus7brnetrlmxl45khohk2y
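
As a rough illustration of the idea of updating an SFS estimate from subsets of sites, the sketch below implements a standard EM update for a one-dimensional SFS and a simplified damped mini-batch variant. This is not the authors' algorithm: the block scheme, the `step` damping parameter, and the function names `em_step` and `stochastic_em` are all assumptions made for illustration. Input is assumed to be one row of sample allele frequency likelihoods per site.

```python
import random


def em_step(sfs, site_liks):
    """One standard EM update of the SFS from the given sites.

    sfs: current estimate, a list of probabilities summing to 1.
    site_liks: non-empty list of rows; row[j] is proportional to
               P(data at site | j derived alleles).
    """
    post_sum = [0.0] * len(sfs)
    for lik in site_liks:
        joint = [p * l for p, l in zip(sfs, lik)]
        norm = sum(joint)
        for j, x in enumerate(joint):
            post_sum[j] += x / norm  # posterior over allele counts at this site
    return [x / len(site_liks) for x in post_sum]


def stochastic_em(site_liks, n_blocks=4, epochs=20, step=0.5, seed=0):
    """Damped mini-batch EM (illustrative simplification, see lead-in)."""
    rng = random.Random(seed)
    sites = list(site_liks)
    rng.shuffle(sites)
    # Split the shuffled sites into blocks and drop any empty ones.
    blocks = [sites[b::n_blocks] for b in range(n_blocks)]
    blocks = [b for b in blocks if b]
    k = len(sites[0])
    sfs = [1.0 / k] * k  # flat starting point
    for _ in range(epochs):
        for block in blocks:
            update = em_step(sfs, block)
            # Move only part of the way towards the block-specific estimate.
            sfs = [(1.0 - step) * s + step * u for s, u in zip(sfs, update)]
    return sfs
```

With `n_blocks=1` the procedure reduces to a damped version of ordinary EM over all sites. Because each update reads only one block, the blocks could in principle be streamed from disk rather than held in memory, although this sketch keeps everything in memory for simplicity.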

Identifying a living great-grandson of the Lakota Sioux leader Tatanka Iyotake (Sitting Bull)

Ida Moltke, Thorfinn Sand Korneliussen, Andaine Seguin-Orlando, J. Víctor Moreno-Mayar, Ernie LaPointe, William Billeck, Eske Willerslev
2021 Science Advances  
[Figure: see text].
doi:10.1126/sciadv.abh2013 pmid:34705496 pmcid:PMC8550246 fatcat:uteo3am3tfdc5hxwzj46lpvn34