
AKT: Ancestry and Kinship Toolkit [article]

Rudy Arthur, Ole Schulz-Trieglaff, Anthony J Cox, Jared Michael O'Connell
2016 bioRxiv   pre-print
Ancestry and Kinship Toolkit (AKT) is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It can rapidly detect related samples, characterise sample ancestry, calculate correlation between variants, check Mendel consistency and perform data clustering. AKT brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface. We believe it will be an invaluable tool for the curation of large WGS data-sets.
doi:10.1101/047829 fatcat:nr32ggmksnbalhuxzwjda3jvny

Stochastic Petri Nets in Systems Biology

Ole Schulz-Trieglaff
2005 BMC Bioinformatics  
doi:10.1186/1471-2105-6-s3-p25 fatcat:hff5qbhqybfxnkm2ojkywoogri

AKT: ancestry and kinship toolkit

Rudy Arthur, Ole Schulz-Trieglaff, Anthony J. Cox, Jared O'Connell
2016 Bioinformatics  
Motivation: Ancestry and Kinship Toolkit (AKT) is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It can rapidly detect related samples, characterise sample ancestry, calculate correlation between variants, check Mendel consistency and perform data clustering. AKT brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface. We believe it will be an invaluable tool for the curation of large WGS data-sets. Availability: The source code is available at https://illumina.
doi:10.1093/bioinformatics/btw576 pmid:27634946 fatcat:glajxbvc6nezroyxkwadkzztty
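The relatedness detection this entry describes can be illustrated with the textbook GRM (genetic relationship matrix) estimator. This is a hedged NumPy sketch under the assumption of a 0/1/2 genotype-count matrix; it is not necessarily the estimator AKT uses internally.

```python
import numpy as np

# Hedged sketch of kinship/relatedness estimation from genotypes using
# the standard GRM estimator; illustrative only, not AKT's implementation.

def kinship_matrix(G):
    """G: (n_samples, n_variants) array of 0/1/2 alternate-allele counts.
    Returns K with K[j, k] = sum_i (g_ij - 2p_i)(g_ik - 2p_i),
    normalised by 2 * sum_i p_i (1 - p_i)."""
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0             # per-variant allele frequency
    keep = (p > 0.0) & (p < 1.0)         # drop monomorphic variants
    Z = G[:, keep] - 2.0 * p[keep]       # centre genotypes
    denom = 2.0 * np.sum(p[keep] * (1.0 - p[keep]))
    return (Z @ Z.T) / denom

# Toy cohort: sample 1 duplicates sample 0, sample 2 is dissimilar.
G = np.array([[0, 1, 2, 1, 0, 2],
              [0, 1, 2, 1, 0, 2],
              [2, 1, 0, 1, 2, 0]])
K = kinship_matrix(G)
print(K[0, 1], K[0, 2])  # duplicate pair scores high, dissimilar pair low
```

Under this centring, a duplicated sample scores the same as a sample against itself, while systematically opposite genotypes fall below zero.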

BEETL-fastq: a searchable compressed archive for DNA reads [article]

Lilian Janin and Ole Schulz-Trieglaff and Anthony J. Cox
2014 arXiv   pre-print
Motivation: FASTQ is a standard file format for DNA sequencing data which stores both nucleotides and quality scores. A typical sequencing study can easily generate hundreds of gigabytes of FASTQ files, while public archives such as ENA and NCBI and large international collaborations such as the Cancer Genome Atlas can accumulate many terabytes of data in this format. Text compression tools such as gzip are often employed to reduce the storage burden, but have the disadvantage that the data must be decompressed before it can be used. Here we present BEETL-fastq, a tool that not only compresses FASTQ-formatted DNA reads more compactly than gzip, but also permits rapid search for k-mer queries within the archived sequences. Importantly, the full FASTQ record of each matching read or read pair is returned, allowing the search results to be piped directly to any of the many standard tools that accept FASTQ data as input. Results: We show that 6.6 terabytes of human reads in FASTQ format can be transformed into 1.7 terabytes of indexed files, from which we can search for 1, 10, 100, 1000 or a million 30-mers in 3, 8, 14, 45 and 567 seconds respectively, plus 20 ms per output read. Useful applications of the search capability are highlighted, including the genotyping of structural variant breakpoints and "in silico pull-down" experiments in which only the reads that cover a region of interest are selectively extracted for the purposes of variant calling or visualization. Availability: BEETL-fastq is part of the BEETL library, available as a github repository at git@github.com:BEETL/BEETL.git.
arXiv:1406.4376v1 fatcat:k2n33cm34be4jpfknal5ca7eum
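The k-mer lookup described above rests on backward search over a Burrows-Wheeler transform. The toy sketch below is not BEETL's implementation (which operates on disk-based indexes of billions of reads); it only demonstrates the backward-search recurrence on a single short string, with brute-force rank counting standing in for a real occurrence table.

```python
# Toy illustration of BWT backward search, the core idea behind k-mer
# lookup in BWT-indexed read archives. NOT the BEETL implementation.

def bwt(text):
    """Build the Burrows-Wheeler transform of text (with '$' sentinel)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def backward_search(bwt_str, pattern):
    """Count occurrences of pattern via backward search on the BWT."""
    alphabet = sorted(set(bwt_str))
    # C[c] = number of characters in the text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # occurrences of c in bwt_str[:i]; brute force, O(n) per call
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0  # pattern absent
    return hi - lo    # number of matches

index = bwt("ACGTACGA")
print(backward_search(index, "ACG"))  # "ACG" occurs twice
```

A production index replaces the `occ` scan with precomputed rank structures so each query character costs O(1).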

NxTrim: optimized trimming of Illumina mate pair reads [article]

Jared O'Connell, Ole Schulz-Trieglaff, Emma Carlson, Matthew M Hims, Niall A Gormley, Anthony J Cox
2014 bioRxiv   pre-print
Motivation: Mate pair protocols add to the utility of paired-end sequencing by boosting the genomic distance spanned by each pair of reads, potentially allowing larger repeats to be bridged and resolved. The Illumina Nextera Mate Pair (NMP) protocol employs a circularisation-based strategy that leaves behind 38 bp adapter sequences which must be computationally removed from the data. While "adapter trimming" is a well-studied area of bioinformatics, existing tools do not fully exploit the particular properties of NMP data and discard more data than is necessary. Results: We present NxTrim, a tool that strives to discard as little sequence as possible from NMP reads. The sequence either side of the adapter site is triaged into "virtual libraries" of mate pairs, paired-end reads and single-ended reads. When combined, these data boost coverage and can substantially improve the de novo assembly of bacterial genomes.
doi:10.1101/007666 fatcat:clvwz3di3bhcrb3cqemrtp6x2y

NxRepair: error correction in de novo sequence assembly using Nextera mate pairs

Rebecca R. Murphy, Jared O'Connell, Anthony J. Cox, Ole Schulz-Trieglaff
2015 PeerJ  
Cox and Ole Schulz-Trieglaff are permanent employees of Illumina Inc., a public company that develops and markets systems for genomic analysis. They receive shares as part of their compensation.  ...  Murphy, Jared O'Connell and Ole Schulz-Trieglaff conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared  ... 
doi:10.7717/peerj.996 pmid:26056623 pmcid:PMC4458127 fatcat:hmtzadvoifgwppno2idsjx6egy

Rapid Genotype Refinement for Whole-Genome Sequencing Data using Multi-Variate Normal Distributions [article]

Rudy Arthur, Jared O'Connell, Ole Schulz-Trieglaff, Anthony J Cox
2015 bioRxiv   pre-print
Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD) based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals. Most genotype refinement methods are based on hidden Markov models, which are accurate but computationally expensive. We introduce an algorithm that models LD using a simple multivariate Gaussian distribution. The key feature of our algorithm is its speed: it is hundreds of times faster than other methods on the same data set, and its scaling behaviour is linear in the number of samples. We demonstrate the performance of the method on both low-coverage and high-coverage samples.
doi:10.1101/031484 fatcat:cijympiozvc6nbzuh3rvmwfody
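The speed advantage described above comes from the closed-form conditioning of a multivariate normal. As a hedged sketch (the dosage encoding, toy covariance and absence of read likelihoods are simplifications of the paper's actual algorithm), refining one site's genotype dosage from its LD partners reduces to the standard MVN conditional mean:

```python
import numpy as np

# Hedged sketch: model genotype dosages at linked sites as multivariate
# normal and refine one site from the others via the closed-form
# conditional mean. Illustrative only; the published method also
# incorporates genotype likelihoods from the sequencing reads.

def conditional_mean(mu, sigma, target, observed_idx, observed_vals):
    """E[x_target | x_observed] for x ~ N(mu, sigma)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s_oo = sigma[np.ix_(observed_idx, observed_idx)]   # obs-obs block
    s_to = sigma[target, observed_idx]                 # target-obs block
    resid = np.asarray(observed_vals, dtype=float) - mu[observed_idx]
    return mu[target] + s_to @ np.linalg.solve(s_oo, resid)

# Toy example: three sites in LD; refine site 0 from sites 1 and 2.
mu = [1.0, 1.0, 1.0]            # mean dosages
sigma = [[0.5, 0.4, 0.3],       # covariance capturing LD
         [0.4, 0.5, 0.4],
         [0.3, 0.4, 0.5]]
refined = conditional_mean(mu, sigma, target=0,
                           observed_idx=[1, 2], observed_vals=[2.0, 2.0])
print(refined)
```

Because the conditioning is a single solve against a small covariance block, the per-sample cost stays constant, which is consistent with the linear scaling the abstract claims.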

LC-MSsim - a simulation software for Liquid Chromatography Mass Spectrometry data

Ole Schulz-Trieglaff, Nico Pfeifer, Clemens Gropl, Oliver Kohlbacher, Knut Reinert
2008 BMC Bioinformatics  
Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists only for peptide identification algorithms but no data that represents a ground truth for the evaluation of feature detection, alignment and filtering algorithms. Results: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants, to change the background noise level and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. Conclusion: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.
doi:10.1186/1471-2105-9-423 pmid:18842122 pmcid:PMC2577660 fatcat:jqubcnbko5fczchifarnx3ccm4
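One ingredient of this kind of simulator is a model of a peptide's isotopic peak intensities. A common shortcut, sketched below, is a Poisson approximation in which the expected number of heavy isotopes grows with peptide mass; the rate constant here (roughly one heavy isotope per ~1800 Da) is an assumption for illustration, not LC-MSsim's actual model.

```python
import math

# Hedged sketch: approximate a peptide's isotope pattern with a Poisson
# distribution whose rate grows with mass. The rate constant is an
# illustrative assumption, not the model used by LC-MSsim.

def isotope_pattern(mass_da, n_peaks=5, lam_per_da=1.0 / 1800.0):
    """Return normalised intensities of the first n_peaks isotope peaks."""
    lam = mass_da * lam_per_da
    raw = [math.exp(-lam) * lam**k / math.factorial(k)
           for k in range(n_peaks)]
    total = sum(raw)
    return [p / total for p in raw]

# For a small peptide the monoisotopic peak dominates; for a heavier
# one the maximum shifts to a later isotope peak.
print(isotope_pattern(1500.0))
print(isotope_pattern(4000.0))
```

This captures the qualitative shift of the intensity maximum with mass that any peak-shape model must reproduce; a real simulator would derive the pattern from elemental composition instead.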

Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments

Ole Schulz-Trieglaff, Egidijus Machtejevas, Knut Reinert, Hartmut Schlüter, Joachim Thiemann, Klaus Unger
2009 BioData Mining  
Quality assessment methods that are commonplace in engineering and industrial production are not widely spread in large-scale proteomics experiments. But modern technologies such as Multi-Dimensional Liquid Chromatography coupled to Mass Spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems such that an automatic quality assessment and control become increasingly important. Results: We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis. Conclusion: We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
doi:10.1186/1756-0381-2-4 pmid:19351414 pmcid:PMC2678124 fatcat:2i6eozvkznc2xcul2ve3cz6xgy
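The Mahalanobis-distance flagging step can be sketched in a few lines. This hedged example uses a plain sample covariance purely for illustration; the paper pairs the distance with a robust PCA, and the descriptor names and threshold here are assumptions.

```python
import numpy as np

# Hedged sketch: flag runs whose quality descriptors lie far from the
# bulk, measured by squared Mahalanobis distance. A plain covariance is
# used here; the published method uses a robust PCA instead.

def mahalanobis_outliers(X, threshold):
    """Return indices of rows of X whose squared Mahalanobis distance
    from the column means exceeds threshold."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)  # per-row distance
    return [i for i, v in enumerate(d2) if v > threshold]

# Toy descriptors for six LC-MS runs (e.g. total ion current, peak
# count); the last run has collapsed signal.
runs = [[100, 50], [102, 48], [98, 51], [101, 49], [99, 50], [60, 20]]
print(mahalanobis_outliers(runs, threshold=3.0))  # flags run index 5
```

In practice the threshold would come from a chi-square quantile matched to the number of descriptors rather than a fixed constant.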

NxTrim: optimized trimming of Illumina mate pair reads: Table 1

Jared O'Connell, Ole Schulz-Trieglaff, Emma Carlson, Matthew M. Hims, Niall A. Gormley, Anthony J. Cox
2015 Bioinformatics  
Motivation: Mate pair protocols add to the utility of paired-end sequencing by boosting the genomic distance spanned by each pair of reads, potentially allowing larger repeats to be bridged and resolved. The Illumina Nextera Mate Pair (NMP) protocol uses a circularization-based strategy that leaves behind 38-bp adapter sequences, which must be computationally removed from the data. While 'adapter trimming' is a well-studied area of bioinformatics, existing tools do not fully exploit the particular properties of NMP data and discard more data than is necessary. Results: We present NxTrim, a tool that strives to discard as little sequence as possible from NMP reads. NxTrim makes full use of the sequence on both sides of the adapter site to build 'virtual libraries' of mate pairs, paired-end reads and single-ended reads. For bacterial data, we show that aggregating these datasets allows a single NMP library to yield an assembly whose quality compares favourably to that obtained from regular paired-end reads. Availability and implementation: The source code is available at https://github.com/sequencing/NxTrim
doi:10.1093/bioinformatics/btv057 pmid:25661542 fatcat:y3pfkah4ujh27k2egwaubfgoji
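The "virtual library" triage idea can be sketched as routing the flanks around the adapter by length. Everything below is a simplification for illustration: the adapter string is a placeholder, the thresholds are invented, and NxTrim's real logic operates on read pairs rather than single reads.

```python
# Hedged sketch of adapter-site triage into virtual libraries. The
# category mapping, thresholds and single-read logic are illustrative
# simplifications, not NxTrim's actual algorithm.

def triage(read, adapter, min_len=20):
    """Classify one read by where the junction adapter falls inside it."""
    pos = read.find(adapter)
    if pos == -1:
        return "unknown"   # adapter not read through: keep whole read
    left = read[:pos]
    right = read[pos + len(adapter):]
    if len(left) >= min_len and len(right) >= min_len:
        return "pe"        # both flanks usable: paired-end style reads
    if len(left) >= min_len or len(right) >= min_len:
        return "se"        # one usable flank: single-ended read
    return "discard"       # both flanks too short to keep

adapter = "CTGTCT"  # placeholder adapter sequence for illustration
print(triage("A" * 40 + adapter + "C" * 40, adapter))
```

The point of the triage is visible even in this toy: a read is only discarded when neither flank clears the length threshold, so far less sequence is thrown away than with blanket adapter clipping.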

metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences

Christina Ander, Ole B Schulz-Trieglaff, Jens Stoye, Anthony J Cox
2013 BMC Bioinformatics  
Environmental shotgun sequencing (ESS) has potential to give greater insight into microbial communities than targeted sequencing of 16S regions, but requires much higher sequence coverage. The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate ESS data on a large scale, but computationally efficient methods for analysing such data sets are needed. Here we present metaBEETL, a fast taxonomic classifier for environmental shotgun sequences. It uses a Burrows-Wheeler Transform (BWT) index of the sequencing reads and an indexed database of microbial reference sequences. Unlike other BWT-based tools, our method has no upper limit on the number or the total size of the reference sequences in its database. By capturing sequence relationships between strains, our reference index also allows us to classify reads which are not unique to an individual strain but are nevertheless specific to some higher phylogenetic order. Tested on datasets with known taxonomic composition, metaBEETL gave results that are competitive with existing similarity-based tools: due to normalization steps which other classifiers lack, the taxonomic profile computed by metaBEETL closely matched the true environmental profile. At the same time, its moderate running time and low memory footprint allow metaBEETL to scale well to large data sets. Code to construct the BWT indexed database and for the taxonomic classification is part of the BEETL library, available as a github repository at git@github.com:BEETL/BEETL.git.
doi:10.1186/1471-2105-14-s5-s2 pmid:23734710 pmcid:PMC3622627 fatcat:bakccnlzlzglpkosw6c6x3whgq

Manta: Rapid detection of structural variants and indels for clinical sequencing applications [article]

Xiaoyu Chen, Ole Schulz-Trieglaff, Richard Shaw, Bret Barnes, Felix Schlesinger, Anthony J. Cox, Semyon Kruglyak, Christopher T. Saunders
2015 bioRxiv   pre-print
Summary: We describe Manta, a method to discover structural variants and indels from next generation sequencing data. Manta is optimized for rapid clinical analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these variant types: for example NA12878 at 50x genomic coverage is analyzed in less than 20 minutes. Manta can discover and score variants based on supporting paired and split-read evidence, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to basepair resolution, allowing for improved downstream annotation and analysis of clinical significance. We provide Manta as a community resource to facilitate practical and routine structural variant analysis in clinical and research sequencing scenarios. Availability: Manta source code and Linux binaries are available from http://github.com/Illumina/manta. Contact: csaunders@illumina.com
doi:10.1101/024232 fatcat:mphs2xtgrbdffdekoe7lcsiily

Computational Quantification of Peptides from LC-MS Data

Ole Schulz-Trieglaff, Rene Hussong, Clemens Gröpl, Andreas Leinenbach, Andreas Hildebrandt, Christian Huber, Knut Reinert
2008 Journal of Computational Biology  
Ole Schulz-Trieglaff is supported by the Max Planck Research School for Computational Biology and Scientific Computing in Berlin.  ...  In Schulz-Trieglaff et al. (2007) , we have shown that we can accurately and quickly quantify even low abundance peptides using a mother wavelet that mimics the distribution of isotopic peak intensities  ... 
doi:10.1089/cmb.2007.0117 pmid:18707556 fatcat:vmu57z3b3vc7xgicgfdr6ezrxu

OpenMS – An open-source software framework for mass spectrometry

Marc Sturm, Andreas Bertsch, Clemens Gröpl, Andreas Hildebrandt, Rene Hussong, Eva Lange, Nico Pfeifer, Ole Schulz-Trieglaff, Alexandra Zerck, Knut Reinert, Oliver Kohlbacher
2008 BMC Bioinformatics  
Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow. Results: We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy-to-use and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies.
doi:10.1186/1471-2105-9-163 pmid:18366760 pmcid:PMC2311306 fatcat:ydzwiuwdhfff5k2xj4iaglslde

A geometric approach for the alignment of liquid chromatography-mass spectrometry data

Eva Lange, Clemens Gröpl, Ole Schulz-Trieglaff, Andreas Leinenbach, Christian Huber, Knut Reinert
2007 Computer applications in the biosciences : CABIOS  
Several algorithms exist for this task, such as Li et al. (2003) , Schulz-Trieglaff et al. (2007) and Wang et al. (2003) .  ...  The charge of each feature is determined by fitting a theoretical isotope model based on the average composition of a peptide for a given mass as proposed earlier (Schulz-Trieglaff et al., 2007) .  ... 
doi:10.1093/bioinformatics/btm209 pmid:17646306 fatcat:5hcddxjpbzafbf77fhpeu7r6n4