Engineering Algorithms for Personal Genome Pipelines

by Manuel Holtgrewe, Universitätsbibliothek der FU Berlin

Published by Freie Universität Berlin.

2015  

Abstract

Recent technical advances in high-throughput sequencing and the commercial availability of these technologies at low cost have opened up revolutionary opportunities in the life sciences. One milestone was reaching the $1000 genome, which makes it possible to determine the genetic makeup of hundreds of human individuals within a week for less than $1000 each. This ongoing revolution in the life sciences creates new challenges for the software and algorithms that process these data. In my thesis, I consider a typical software pipeline for determining the genome of a human individual. For the preprocessing step, I describe a method for error correction and consider how such methods can be compared. For the read mapping step, I provide a formal definition of read mapping and present a software package that implements a read mapping benchmark based on this definition. I then describe the implementation, parallelisation, and engineering of a fully sensitive read mapper and evaluate its performance. For the variant calling step, I present a method for the prediction of insertion breakpoints and the assembly of large insertions. Such a pipeline is not limited to human data; it is also applicable to data from other mammals and from organisms with smaller and less complex genomes. The presented work is available as an efficient open source C++ implementation, either as part of the SeqAn library or as programs using SeqAn.

Archived Files and Locations

application/pdf   2.9 MB
refubium.fu-berlin.de (publisher)
web.archive.org (webarchive)
Type: article
Stage: published
Date: 2015-11-11