Implementazione ed ottimizzazione di algoritmi per l'analisi di Biomedical Big Data (Implementation and optimization of algorithms for the analysis of Biomedical Big Data)

Nico Curti
Big Data Analytics poses many challenges to the research community, which must handle several computational problems related to the vast amount of data. Interest in biomedical data is growing, with the aim of achieving so-called personalized medicine, where therapy plans are designed around the specific genotype and phenotype of an individual patient; algorithm optimization plays a key role in this goal. In this work we discuss several topics related to Biomedical Big Data Analytics, with special attention to the numerical issues they raise and the algorithmic solutions to them. We introduce a novel feature selection algorithm tailored to omics datasets and demonstrate its efficiency on synthetic and real high-throughput genomic datasets. The proposed algorithm is a supervised signature identification method based on a bottom-up combinatorial approach that exploits the discriminant power of all variable pairs. We tested our algorithm against other state-of-the-art methods, obtaining better or comparable results. We also implemented and optimized different types of deep learning models, testing their efficiency on biomedical image processing tasks. Three novel frameworks for developing deep learning neural network models are discussed and used to describe the numerical improvements proposed on various topics. In the first application we optimize two Super Resolution models, showing their results on NMR images and demonstrating their efficiency in generalization tasks without retraining. The second optimization involves a state-of-the-art Object Detection neural network architecture, for which we obtain a significant speedup in computational performance. We also highlight how Super Resolution models can overcome object detection issues and therefore increase detection performance. In the third application we discuss the femur head segmentation problem on CT images: a semi-automatic pipeline for image annotation is proposed, and a deep learning neural network model is trained on the resulting images. The last section of this work covers the implementation of a novel biomedical database, obtained by harmonizing multiple data sources, that provides network-like relationships between biomedical entities. Data on diseases, symptoms, and other related biological entities were mined using web-scraping methods, and a novel natural language processing pipeline was designed to maximize the overlap between the different data sources involved in this project. We describe the key steps that led us to this network-of-networks database and discuss its potential applications in biomedical research.
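The pairwise idea behind the feature selection method mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the nearest-centroid classifier, the leave-one-out validation, the function names (`pair_score`, `rank_pairs`), and the toy data are all assumptions made for illustration. The abstract states only that the discriminant power of all variable pairs is exploited, so the sketch covers pair scoring and ranking and leaves out how a signature is then assembled.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def pair_score(X, y, i, j):
    """Leave-one-out accuracy of a nearest-centroid classifier on features (i, j).

    Assumption: this classifier and validation scheme are illustrative choices,
    not taken from the source.
    """
    hits = 0
    for k in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        by_class = {}
        for m, (row, label) in enumerate(zip(X, y)):
            if m != k:
                by_class.setdefault(label, []).append((row[i], row[j]))
        cents = {label: centroid(pts) for label, pts in by_class.items()}
        point = (X[k][i], X[k][j])
        pred = min(cents, key=lambda label: sq_dist(point, cents[label]))
        hits += pred == y[k]
    return hits / len(X)


def rank_pairs(X, y):
    """Score every pair of variables and sort from most to least discriminant."""
    n_features = len(X[0])
    scores = {
        (i, j): pair_score(X, y, i, j)
        for i, j in combinations(range(n_features), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: features 0 and 1 separate the classes, feature 2 is noise.
X = [
    [0.0, 0.1, 5.0],
    [0.2, 0.0, 1.0],
    [0.1, 0.2, 3.0],
    [1.0, 1.1, 1.2],
    [1.2, 0.9, 4.8],
    [0.9, 1.0, 3.1],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_pairs(X, y)
```

On this toy data the pair `(0, 1)` ranks first because it is the only pair that does not include the noisy third feature. Scoring all pairs costs a number of classifier evaluations that grows quadratically with the number of variables, which is why optimization matters for omics-scale datasets.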
doi:10.6092/unibo/amsdottorato/9371 fatcat:kug2a6abozbababdluixxfmwhe