TEXT MINING TRANSPORTATION RESEARCH GRANT BIG DATA: KNOWLEDGE EXTRACTION AND PREDICTIVE MODELING USING FAST NEURAL NETS
International Journal for transport and traffic engineering
Research grant databases offer a wealth of information for studying research trends, research collaboration networks, and patterns of funding over time. Natural Language Processing (NLP) and Text Mining (TM), in combination with Machine Learning (ML), are well-suited data science tools for collecting, analyzing, and unearthing interesting findings from huge text corpora such as these databases. At a time when transportation agencies across the globe are facing budgetary constraints and are asked "to do more with less", extracting information from such databases to build predictive models that guide researchers and agencies has become very important. Understanding past patterns of funding and interest in various subject areas is also useful for PhD researchers formulating their research and for academic researchers seeking funding in general. We present a comprehensive study of the Transportation Research Board's (TRB's) Research in Progress (RIP) "big data", which contains information on more than 14,000 current or recently completed projects funded in the past 25 years, mainly by the U.S. Department of Transportation (DOT) and State DOTs. We perform longitudinal studies using text mining pipelines to discover interesting patterns and anomalies in the data. Finally, we develop a predictive model that uses the text-mined information to predict the most appropriate funding agency for a researcher to target across various research areas.
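To make the prediction task concrete, the following is a minimal sketch of text-based funding-agency classification. It is not the model used in this study: it swaps in a multinomial Naive Bayes classifier with add-one smoothing as a simple stand-in for a fast neural text classifier. The class name `NaiveBayesAgencyClassifier`, the helper `tokenize`, the two agency labels, and the four toy project descriptions are all hypothetical and are not drawn from the RIP database.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesAgencyClassifier:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, texts, agencies):
        self.word_counts = defaultdict(Counter)  # agency -> token counts
        self.doc_counts = Counter(agencies)      # agency -> number of projects
        self.vocab = set()
        for text, agency in zip(texts, agencies):
            tokens = tokenize(text)
            self.word_counts[agency].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_agency, best_score = None, -math.inf
        for agency, n_docs in self.doc_counts.items():
            counts = self.word_counts[agency]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_agency, best_score = agency, score
        return best_agency


# Hypothetical toy records for illustration only.
train_texts = [
    "pavement asphalt rutting performance on state highways",
    "bridge deck corrosion inspection and maintenance",
    "connected vehicle safety pilot national deployment",
    "national freight data policy and automated vehicle research",
]
train_agencies = ["State DOT", "State DOT", "U.S. DOT", "U.S. DOT"]

model = NaiveBayesAgencyClassifier().fit(train_texts, train_agencies)
prediction = model.predict("asphalt pavement maintenance study")
```

On this toy data, `prediction` is "State DOT", because the query shares vocabulary only with the State DOT descriptions. A realistic pipeline would train on the full set of project titles and abstracts and would evaluate on held-out projects.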