KAOS: a new automated computational method for the identification of overexpressed genes
Kinase over-expression and activation as a consequence of gene amplification or gene fusion events is a well-known mechanism of tumorigenesis. The search for novel rearrangements of kinases or other druggable genes may contribute to understanding the biology of cancerogenesis, as well as lead to the identification of new candidate targets for drug discovery. However this requires the ability to query large datasets to identify rare events occurring in very small fractions (1-3 %) of different
... mor subtypes. This task is different from what is normally done by conventional tools that are able to find genes differentially expressed between two experimental conditions. Results: We propose a computational method aimed at the automatic identification of genes which are selectively over-expressed in a very small fraction of samples within a specific tissue. The method does not require a healthy counterpart or a reference sample for the analysis and can be therefore applied also to transcriptional data generated from cell lines. In our implementation the tool can use gene-expression data from microarray experiments, as well as data generated by RNASeq technologies. Conclusions: The method was implemented as a publicly available, user-friendly tool called KAOS (Kinase Automatic Outliers Search). The tool enables the automatic execution of iterative searches for the identification of extreme outliers and for the graphical visualization of the results. Filters can be applied to select the most significant outliers. The performance of the tool was evaluated using a synthetic dataset and compared to state-of-the-art tools. KAOS performs particularly well in detecting genes that are overexpressed in few samples or when an extreme outlier stands out on a high variable expression background. To validate the method on real case studies, we used publicly available tumor cell line microarray data, and we were able to identify genes which are known to be overexpressed in specific samples, as well as novel ones.