Analyzing Genomic Data Using Tensor-Based Orthogonal Polynomials [article]

Saba Nafees, Sean Rice, Catherine Wakeman
2020 bioRxiv   pre-print
Due to increasing computational power and experimental sophistication, extensive collection and analysis of genomic data are now possible. This has spurred the search for better algorithms and computational methods to investigate the underlying patterns that connect genotypic and phenotypic data. We propose a multivariate tensor-based orthogonal polynomial approach to characterize nucleotides or amino acids in a given DNA/RNA or protein sequence. Given quantifiable phenotype data that corresponds to a biological sequence, we can construct orthogonal polynomials using sequence information and subsequently map phenotypes onto the space of the polynomials. With enough computational power, this approach provides information about higher order interactions between different parts of a sequence in a dataset and ultimately illuminates the relationship between sequence structure and the resulting phenotype. We have applied this method to a previously published case of small transcription activating RNAs (STARs), quantifying higher order relationships between parts of the sequence and how these give rise to the distinct phenotypes.
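As a rough illustration of the lowest-order step described above, the sketch below one-hot encodes each site of a set of aligned RNA sequences, centers those vectors so they are orthogonal to the constant term, and projects a phenotype onto them. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`one_hot`, `first_order_projection`), the `ACGU` ordering, and the four toy sequences and phenotype values are all hypothetical and chosen only for illustration. Higher order terms, which the abstract emphasizes, would require orthogonalizing outer products of these site vectors against the lower orders and are not shown.

```python
import numpy as np

ALPHABET = "ACGU"  # assumed nucleotide ordering for the one-hot encoding


def one_hot(seqs):
    """Encode equal-length sequences as an array of shape (n_seqs, n_sites, 4)."""
    index = {nt: i for i, nt in enumerate(ALPHABET)}
    X = np.zeros((len(seqs), len(seqs[0]), len(ALPHABET)))
    for s, seq in enumerate(seqs):
        for j, nt in enumerate(seq):
            X[s, j, index[nt]] = 1.0
    return X


def first_order_projection(seqs, phenotypes):
    """Return centered site vectors and their covariance with the phenotype."""
    X = one_hot(seqs)
    # Subtracting the per-site mean makes each indicator orthogonal
    # to the constant (zeroth-order) term across the dataset.
    centered = X - X.mean(axis=0)
    y = np.asarray(phenotypes, dtype=float)
    y_centered = y - y.mean()
    # Covariance of the phenotype with every (site, nucleotide) indicator.
    cov = np.einsum("s,sjk->jk", y_centered, centered) / len(seqs)
    return centered, cov


# Hypothetical toy data: the phenotype tracks the nucleotide at site 0 only.
seqs = ["ACGU", "ACGA", "UCGA", "UCGU"]
phen = [1.0, 1.0, 3.0, 3.0]
centered, cov = first_order_projection(seqs, phen)
print(cov)
```

In this toy dataset only the first site carries signal: the entries of `cov` for that site are nonzero, while invariant sites and the uninformative last site project to zero.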
doi:10.1101/2020.04.24.059279 fatcat:ubpaebl6k5cpdijgnlk5jbld44