Mining Interesting Clinico-Genomic Associations: The HealthObs Approach
[chapter]
George Potamias, Lefteris Koumakis, Alexandros Kanterakis, Vassilis Moustakis, Dimitris Kafetzopoulos, Manolis Tsiknakis
IFIP The International Federation for Information Processing
HealthObs is an integrated (Java-based) environment targeting the seamless integration and intelligent processing of distributed and heterogeneous clinical and genomic data. Via the appropriate customization of standard medical and genomic data models, HealthObs achieves the semantic homogenization of remote clinical and gene-expression records and their uniform XML-based representation. The system utilizes data-mining techniques (association rules mining) that operate on top of query-specific
... ML documents. Application of HealthObs to a real-world breast-cancer clinico-genomic study demonstrates the utility and efficiency of the approach.
The vision is to combat major diseases, such as cancer, in an individualized diagnostic, prognostic and treatment manner. This requires not only an understanding of the genetic basis of the disease (acquired, for example, from patients' gene-expression profiling studies [4, 13, 21]) but also the correlation of these data with the knowledge normally processed in the clinical setting. Coupling the knowledge gained from genomics with that gained from clinical practice is of crucial importance and presents a major challenge for ongoing and future clinico-genomic trials [15]. Such evidential knowledge will enhance health-care professionals' decision-making capabilities and help meet the rising demand for evidence-based medicine.
Recently, in the context of three research projects, PrognoChip (http://www.ics.forth.gr/~analyti/PrognoChip/isl_site/index.html, [6]), INFOBIOMED (www.infobiomed.net, [10]) and ACGT (http://www.eu-acgt.org, [15]), we have designed and implemented an integrated clinico-genomic environment [7]. The environment is enhanced by a Mediation infrastructure that links and integrates patients' clinical and genomic (e.g., microarray gene-expression) data [2]. The clinical information systems being utilized are components of an integrated clinical-systems infrastructure built in the region of Crete, Greece [16].
These systems are: (a) the Onco-Surgery information system, which manages information related to patient identification and demographics, medical history, patient risk factors, family history of malignancy, clinical examinations and findings, results of laboratory exams (mammography, ultrasound, hematological, etc.), pre-surgical and post-surgical therapies, as well as therapy effectiveness and follow-up; and (b) the Histo-Pathology information system, which manages information related to the histopathologic evaluation of patients' samples and to TNM staging (tumor size, lymph-node involvement and metastatic spread). The engaged clinical information systems (CISs) comply with relevant medical information and data models, such as SNOMED CT® (http://www.snomed.org/), ICD (http://www.cdc.gov/nchs/icd9.htm) and LOINC® (http://www.regenstrief.org/loinc/). Data and information exchange between the two CISs is based on the HL7 (Health Level 7) messaging standard (http://www.hl7.org). The experimental study presented in this paper (section 4) uses the two CISs to store and manage patients' clinico-histopathology information and data drawn from an anonymized public-domain clinico-genomic study [13]. We are therefore not confronted with ethical, legal and security issues (even though the whole infrastructure provides high-level security services).
With the help of the Mediator, the biomedical investigator can form clinico-genomic queries through the Mediator's web-based graphical user interface. The Mediator translates each query into an equivalent set of local sub-queries, which are executed directly against the constituent databases (i.e., the clinical and genomic/microarray information systems). The partial results are then combined and presented to the user and/or forwarded for further analysis. Accessing distributed and heterogeneous data sources and collecting the respective data items are not an end in themselves. What is desirable is the exploitation of the data, that is, the possibility of deriving useful and comprehensible conclusions.
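The Mediator's query flow splits one clinico-genomic query into local sub-queries, runs each against its own source and then combines the partial results. That flow can be sketched in Python. This is a minimal, hypothetical illustration, not the HealthObs or Mediator code, under the following assumptions:

- Two in-memory SQLite databases stand in for a CIS and a microarray database.
- The table and column names (`patient`, `expression`, `lymph_nodes`, `metastasis`, `level`) and the function `mediate` are invented.
- The values are toy data.
- The decomposition into sub-queries is hard-coded rather than derived from a GUI query.

```python
import sqlite3

# HYPOTHETICAL toy stand-ins for the two autonomous sources. Schemas,
# identifiers and values are invented and only illustrate the query flow.


def make_clinical_db():
    """In-memory 'clinical information system' with TNM-style attributes."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE patient (pid TEXT PRIMARY KEY, "
               "lymph_nodes TEXT, metastasis TEXT)")
    db.executemany("INSERT INTO patient VALUES (?, ?, ?)", [
        ("P1", "negative", "no"),
        ("P2", "positive", "yes"),
        ("P3", "negative", "no"),
    ])
    return db


def make_genomic_db():
    """In-memory 'microarray database' holding expression values per gene."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE expression (pid TEXT, gene TEXT, level REAL)")
    db.executemany("INSERT INTO expression VALUES (?, ?, ?)", [
        ("P1", "GENE_A", 2.1),
        ("P2", "GENE_A", -0.7),
        ("P3", "GENE_A", 1.4),
        ("P1", "GENE_B", 0.3),
    ])
    return db


def mediate(clinical_db, genomic_db, lymph_nodes, gene):
    """Return (patient id, metastasis status, expression level of `gene`) for
    patients with the given lymph-node status, by sending one sub-query to
    each source and joining the partial results on the patient identifier."""
    # Local sub-query executed only against the clinical source.
    clinical_rows = clinical_db.execute(
        "SELECT pid, metastasis FROM patient WHERE lymph_nodes = ? ORDER BY pid",
        (lymph_nodes,)).fetchall()
    # Local sub-query executed only against the genomic source.
    genomic_rows = genomic_db.execute(
        "SELECT pid, level FROM expression WHERE gene = ?", (gene,)).fetchall()
    # Combine: keep only patients that appear in both partial results.
    level_by_pid = dict(genomic_rows)
    return [(pid, metastasis, level_by_pid[pid])
            for pid, metastasis in clinical_rows if pid in level_by_pid]


if __name__ == "__main__":
    clinical, genomic = make_clinical_db(), make_genomic_db()
    for row in mediate(clinical, genomic, "negative", "GENE_A"):
        print(row)
```

Each source answers only the part of the query it can evaluate, and the join on the shared (anonymized) patient identifier happens in the mediating layer. A realistic mediator would also need schema mapping and vocabulary alignment across the sources, which this sketch omits.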
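According to the abstract, HealthObs derives conclusions by applying association rules mining on top of query-specific XML-based records. The sketch below illustrates the general idea only, under these stated assumptions:

- The XML layout is hypothetical: one `<patient>` element per record, with already discretized clinical and gene-expression values.
- The tag names, the toy data and the helpers `transactions_from_xml` and `mine_rules` are also hypothetical.
- The miner enumerates itemsets by brute force. It is not Apriori, and not necessarily the algorithm HealthObs implements.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# HYPOTHETICAL query-specific XML document: one <patient> element per record,
# holding a clinical attribute and an already discretized expression level.
XML_DOC = """
<patients>
  <patient id="P1"><metastasis>no</metastasis><GENE_A>high</GENE_A></patient>
  <patient id="P2"><metastasis>yes</metastasis><GENE_A>low</GENE_A></patient>
  <patient id="P3"><metastasis>no</metastasis><GENE_A>high</GENE_A></patient>
  <patient id="P4"><metastasis>yes</metastasis><GENE_A>high</GENE_A></patient>
</patients>
"""


def transactions_from_xml(doc):
    """Turn every <patient> element into a set of 'attribute=value' items."""
    root = ET.fromstring(doc.strip())
    return [frozenset(f"{child.tag}={child.text}" for child in patient)
            for patient in root.findall("patient")]


def mine_rules(transactions, min_support=0.5, min_confidence=0.6, max_len=2):
    """Brute-force association-rule mining (no Apriori pruning).

    Counts every itemset of up to max_len items, keeps the frequent ones and
    emits rules antecedent -> consequent (single-item consequent) meeting both
    thresholds, as (antecedent, consequent, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    support = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count / n >= min_support:
                support[itemset] = count / n
    rules = []
    for itemset, sup in support.items():
        if len(itemset) < 2:
            continue
        for consequent in sorted(itemset):
            antecedent = itemset - {consequent}
            # Every subset of a frequent itemset is frequent, so it is in `support`.
            confidence = sup / support[antecedent]
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), consequent, sup, confidence))
    return rules


if __name__ == "__main__":
    for antecedent, consequent, sup, conf in mine_rules(transactions_from_xml(XML_DOC)):
        print(f"{' & '.join(antecedent)} -> {consequent} "
              f"(support={sup:.2f}, confidence={conf:.2f})")
```

On the toy document, the default thresholds yield two rules:

- `metastasis=no -> GENE_A=high`, with support 0.5 and confidence 1.0.
- `GENE_A=high -> metastasis=no`, with support 0.5 and confidence of about 0.67.

Brute-force enumeration grows exponentially with the number of items. For real gene-expression data, Apriori-style pruning of infrequent itemsets is the usual remedy.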
In this context, we have designed and developed an integrated clinico-genomic knowledge discovery scenario enabled by a multi-strategy data-mining approach. The scenario is realized by the smooth integration of three data-mining techniques: clustering, association rules mining and feature selection [3, 14]. In this scenario, clustering is performed on
doi:10.1007/978-0-387-74161-1_15
dblp:conf/ifip12/PotamiasKKMKT07
fatcat:fbkrkwhbrbhlpcjcacranmietu