Natural Language Processing Using Kepler Workflow System: First Steps

Ankit Goyal, Alok Singh, Shitij Bhargava, Daniel Crawl, Ilkay Altintas, Chun-Nan Hsu
2016 Procedia Computer Science  
Scientific community across many disciplines is exploring new ways to extract knowledge from all available sources. Historically, written manuscripts have been the media of choice for recording experimental findings. Many disciplines such as social science, medical science are exploring ways to automate knowledge discovery from a vast repository of published scientific work. This work attempts to accelerate the process of information extraction by extending Kepler, a graphical workflow
more » ... t tool. Kepler provides a simple way of designing and executing complex workflows in the form of directed graphs. This work presents a scalable approach to convert published research as PDF documents into indexable XML documents using Kepler. This conversion is a critical step in the Natural Language Processing pipeline. Kepler's distributed data processing capability enables scientists to scale this critical computation by simply adding more computing resources over the cloud.
doi:10.1016/j.procs.2016.05.358 fatcat:yuzo6vecajc2bj77hl45mlvdca