Interactive Network Exploration in the KDD Process, Contributions in the Study of Population Variability of a Corn Fijivirus
Journal of Data Mining in Genomics & Proteomics
The genetic variability of individuals of the same species can be studied through networks that represent the genetic distances between them. We studied the case of Mal de Rio Cuarto virus (MRCV), defining distance measures between the genome profiles of different individuals and creating a network of haplotypes. Topological properties of the network were analyzed, and the network was examined along two dimensions, space and time, forming space-time environments. The examination led to the observation that, in the first crop years tested, the number of haplotypes and the distance between them were greater than in subsequent crops. A variability indicator was calculated for each environment and compared with its expected value, confirming the observation made during the examination and leading to the conclusion that virus variability decreased after an epidemic that occurred during the 1996-97 crop year. An analysis of the variability of MRCV through haplotype networks is presented. We propose the use of this tool, which is unusual in KDD processes, as a new approach that bears on the concepts of knowledge representation, structured data modeling, visualization, exploration and interactive discovery. The main contribution of this case to the KDD process is the proposal of interactive exploration of networks, which turned out to be intuitive and easy to apply for analysis.
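As a rough illustration of the pipeline outlined above (profiles, pairwise distances, a haplotype network, and a per-environment variability indicator), the following minimal sketch may help. It rests on several assumptions that do not come from the abstract: genome profiles are encoded as equal-length binary strings, the distance measure is the Hamming distance, haplotypes are linked when their distance is at most a hypothetical `max_distance` threshold, and the variability indicator is the mean pairwise distance among the distinct haplotypes of an environment. The sample profiles and environment labels are invented for demonstration only and are not MRCV data.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_haplotype_network(samples, max_distance=1):
    """Build a simple haplotype network.

    samples: list of (profile, environment) pairs.
    Returns (nodes, edges), where nodes maps each distinct profile
    (haplotype) to the set of environments in which it was observed,
    and edges maps haplotype pairs to their distance, keeping only
    pairs whose distance is <= max_distance (an assumed threshold).
    """
    nodes = {}
    for profile, env in samples:
        nodes.setdefault(profile, set()).add(env)
    edges = {}
    for h1, h2 in combinations(sorted(nodes), 2):
        d = hamming(h1, h2)
        if d <= max_distance:
            edges[(h1, h2)] = d
    return nodes, edges


def mean_pairwise_distance(samples, env):
    """Assumed variability indicator: mean pairwise Hamming distance
    among the distinct haplotypes observed in one environment."""
    haplotypes = sorted({p for p, e in samples if e == env})
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: (binary profile, (location, crop year)).
samples = [
    ("10110", ("site_A", "year_1")),
    ("10010", ("site_A", "year_1")),
    ("01101", ("site_A", "year_1")),
    ("10110", ("site_A", "year_2")),
    ("10111", ("site_A", "year_2")),
]

nodes, edges = build_haplotype_network(samples, max_distance=1)
early = mean_pairwise_distance(samples, ("site_A", "year_1"))
late = mean_pairwise_distance(samples, ("site_A", "year_2"))
print(len(nodes), "haplotypes,", len(edges), "edges")
print("variability:", round(early, 3), "->", round(late, 3))
```

In this toy input the indicator is higher for the first environment than for the second, which mirrors the kind of comparison the abstract describes. The comparison with an expected value mentioned in the abstract is not modeled here.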