Text-Mining: Application Development Challenges [chapter]

Sundar Varadarajan, Kas Kasravi, Ronen Feldman
2003 Applications and Innovations in Intelligent Systems X  
This paper reviews best practices and challenges for project managers and developers involved in implementing text-mining applications. With a focus on rule-based information extraction and references to actual cases, the authors share their experiences from developing several text-mining applications in diverse industries. First, project management issues are discussed, including a process for capturing business requirements and mapping them into features and linguistic patterns,
... of linguistic rules, rule development standards, performance metrics, and an evaluation methodology. Linguistic representations such as sub-syntactic, syntactic, semantic, and application-specific rules are identified. Special emphasis is placed on post-information extraction processing, such as improving the relevance of the extracted information, summarization models, techniques for handling typographical errors, resolution of temporal information, resolution of uniqueness of features and events, anaphora resolution, and a discussion on shallow vs. full parsing. Lastly, the paper discusses various utilities to help with the development of a text-mining application, such as feature analysis, visualization, database connectivity, source document pre-processing, and rule authoring tools.
doi:10.1007/978-1-4471-0649-4_17 fatcat:ruq2vvv5xzbxtp5xl3bpuncjr4