SEACOIN2.0: an interactive mining and visualization tool for information retrieval, summarization, and knowledge discovery [article]

Karan Uppal, Eva K. Lee
2017 bioRxiv   pre-print
The rapidly increasing size of biomedical databases such as Medline requires intelligent data mining methods for information extraction and summarization. Existing biomedical text-mining tools have limited capabilities for incorporating citation information during document ranking and for inferring topological and network relationships between biomedical terms. Summarization often returns too much material, leading to information overload. Furthermore, literature-based discoveries could be hard to interpret if the network is too complex. SEACOIN2.0 can incorporate citation information during document ranking and uses a unique association rule mining algorithm to generate multi-level k-ary trees. The multi-level trees facilitate efficient information retrieval, visual data exploration, summarization, and hypothesis generation. The system presents graphical summarization via multiple dynamic visualization panels and an interactive word cloud. The LexRank algorithm is used to identify salient sentences in the top abstracts related to the query. An average F-measure of 94% was achieved for document retrieval, and an average precision of 88% was obtained for identification of top co-occurrence terms. SEACOIN2.0 was also used to replicate previously published findings using the literature-based discovery and EMR-based PheWAS approaches. We present herein SEACOIN2.0, an interactive visual mining tool for improved information retrieval, automated multi-level summarization of Medline abstracts, and literature-based discovery. SEACOIN2.0 addresses the problem of information overload and allows clinicians and biomedical researchers to meet their information needs.
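The abstract names LexRank as the method for picking salient sentences but gives no implementation details. The sketch below illustrates the general idea only and is not SEACOIN2.0's code. It assumes raw term-frequency vectors rather than the TF-IDF weighting of the original LexRank formulation, and the function names, similarity threshold, damping factor, and iteration count are all illustrative choices. Sentences become nodes in a graph, pairs whose cosine similarity meets the threshold are linked, and a PageRank-style power iteration scores each sentence by how central it is.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Return one centrality score per sentence (higher = more salient).

    Assumption: simplified thresholded LexRank on raw term counts,
    with hypothetical default parameters.
    """
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    # Unweighted similarity graph: link i and j when they are similar enough.
    adj = [
        [
            1.0 if i != j and cosine(vectors[i], vectors[j]) >= threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    # Power iteration: each sentence receives score from its neighbours.
    for _ in range(iterations):
        updated = []
        for i in range(n):
            received = sum(
                adj[j][i] * scores[j] / degree[j] for j in range(n) if degree[j]
            )
            updated.append((1.0 - damping) / n + damping * received)
        scores = updated
    return scores


if __name__ == "__main__":
    # Made-up example sentences, not taken from Medline.
    abstract_sentences = [
        "Statins lower cholesterol levels in patients.",
        "Cholesterol levels fall when patients take statins.",
        "Statins reduce cholesterol and cardiovascular risk in patients.",
        "The weather was sunny yesterday.",
    ]
    ranked = sorted(
        zip(lexrank(abstract_sentences), abstract_sentences), reverse=True
    )
    for score, sentence in ranked:
        print(f"{score:.4f}  {sentence}")
```

In this toy input, the three sentences about statins link to one another and score higher than the unrelated sentence, which has no neighbours in the graph and keeps only the baseline score.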
doi:10.1101/206193 fatcat:3inzdscf6nff7g6xtwzmudgcvu