A K-Means Based Multi-level Text Clustering Algorithm for Retrieval of Research Information

Damaris Ndinda Waema, Petronilla Muriithi, George Okeyo
2019 International Journal of Computer Applications Technology and Research  
Academic researchers in institutions of higher learning and research institutes use research outputs and metadata throughout their research work, both to identify potential collaborators and to learn about existing research. Research outputs include academic theses, journal and conference articles, books and book chapters, and datasets, while research metadata includes authors, affiliations, research areas, and projects, among others. However, access to and retrieval of relevant research outputs and metadata remain a major challenge. As a result, research is duplicated, opportunities for networking are reduced, and scientific fraud is difficult to detect. Efforts are needed to make academic research outputs and metadata readily available and easy to retrieve. The main purpose of this work is to develop a tailor-made information retrieval approach for research information and its related metadata. To this end, the paper presents a multi-level text clustering algorithm for retrieving scholarly research outputs and metadata from a central repository through a web-based interface. At the first level, the algorithm clusters SQL data records that represent metadata; at the second level, it retrieves and clusters the text documents that represent research outputs. The algorithm was tested on retrieving information in the areas of text clustering, cloud computing, banking, HIV/AIDS, food security, and cancer. The results show that it enables researchers to retrieve relevant information according to their information needs. To enable further enhancements and improvements, the algorithm will be released to the public domain for use in similar application domains or for extension by other researchers.
doi:10.7753/ijcatr0803.1003
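The abstract describes the two-level procedure only at a high level. It does not specify the feature weighting, the number of clusters, or the repository schema. The Python sketch below illustrates one way such a scheme could work, under stated assumptions: metadata rows are available as dicts (as if already fetched from the SQL repository), texts are represented as L2-normalized term-frequency vectors, and clusters are formed with a small spherical k-means that uses a deterministic initialization. At each level, only the cluster whose centroid is most similar to the query is kept. All names (`multilevel_search`, `fetch_document`, the `id`/`title`/`area` fields, `k1`, `k2`) are hypothetical. This is not the authors' implementation.

```python
import math
import re

# Illustrative sketch only; NOT the algorithm from the paper.
# Assumption: texts become L2-normalized term-frequency vectors stored as dicts.

def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def vectorize(text):
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        counts[token] = counts.get(token, 0) + 1
    return normalize(counts)

def cosine(a, b):
    # Inputs are unit-length, so the dot product equals cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def add_all(vectors):
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return total

def kmeans(vectors, k, iters=20):
    """Spherical k-means with a deterministic init (the first k vectors)."""
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(iters):
        # Assign each vector to its most similar centroid (ties go to the lowest index).
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = normalize(add_all(members))
    return labels, centroids

def best_cluster(items, texts, query_vec, k):
    """Cluster `texts`, then keep the items in the non-empty cluster closest to the query."""
    if not items:
        return []
    labels, centroids = kmeans([vectorize(t) for t in texts], k)
    best = max(sorted(set(labels)), key=lambda c: cosine(query_vec, centroids[c]))
    return [item for item, lab in zip(items, labels) if lab == best]

def multilevel_search(query, records, fetch_document, k1=2, k2=2):
    """Hypothetical two-level retrieval: metadata clusters first, then document clusters."""
    q = vectorize(query)
    # Level 1: cluster metadata records (e.g. rows fetched from a SQL table).
    meta_texts = [" ".join(str(v) for v in r.values()) for r in records]
    hits = best_cluster(records, meta_texts, q, k1)
    # Level 2: fetch the full text of the surviving records and cluster it.
    docs = [fetch_document(r["id"]) for r in hits]
    return best_cluster(hits, docs, q, k2)

# Toy data (invented for illustration, not from the paper's evaluation).
records = [
    {"id": 1, "title": "Text clustering with k-means", "area": "text mining"},
    {"id": 2, "title": "Document clustering for retrieval", "area": "text mining"},
    {"id": 3, "title": "Cloud computing adoption in banking", "area": "cloud computing"},
    {"id": 4, "title": "Mobile banking security", "area": "banking"},
]
documents = {
    1: "k-means clustering of text documents",
    2: "clustering documents improves retrieval",
    3: "cloud services for banks",
    4: "security of mobile banking apps",
}
result = multilevel_search("text clustering", records, lambda rid: documents[rid])
print([r["id"] for r in result])  # prints [1]
```

In this toy run, level 1 separates the two text-mining records from the cloud and banking ones, and level 2 keeps only the record whose full text mentions both query terms. The deterministic initialization keeps the example reproducible. A production system would more plausibly use TF-IDF weighting, k-means++ initialization, and a data-driven choice of `k1` and `k2`.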