Integrated NLP evaluation system for pluggable evaluation metrics with extensive interoperable toolkit

Yoshinobu Kano, Luke McCrohon, Sophia Ananiadou, Jun'ichi Tsujii
2009. Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP '09)
To understand the key characteristics of NLP tools, evaluation and comparison across different tools are important. Because NLP applications tend to consist of multiple semi-independent sub-components, it is not always enough to evaluate complete systems; a fine-grained evaluation of the underlying components is also often worthwhile. Standardization of NLP components and resources is significant not only for reusability, but also because it allows individual components to be compared in terms of reliability and robustness across a wider range of target domains. However, since many evaluation metrics exist even within a single domain, any system seeking to aid inter-domain evaluation needs not just predefined metrics, but must also support pluggable user-defined metrics. Such a system would of course need to be based on an open standard to allow a large number of components to be compared, and would ideally include visualization of the differences between components. We have developed a pluggable evaluation system based on the UIMA framework, which provides visualization useful for error analysis. It is a single integrated system which includes a large, ready-to-use, fully interoperable library of NLP tools.
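The notion of a pluggable user-defined metric can be illustrated with a minimal sketch. This is a hypothetical interface written for illustration only, not the actual API of the system described in the paper: a metric plug-in implements a single `evaluate` method over gold and predicted annotation sets, and the `F1Metric` class below is one such plug-in.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical plug-in contract: any user-defined metric implements this
// interface and can then be swapped into the evaluation pipeline.
interface EvaluationMetric {
    double evaluate(Set<String> gold, Set<String> predicted);
}

// Example plug-in: F1 score over annotation spans (here encoded as strings).
class F1Metric implements EvaluationMetric {
    @Override
    public double evaluate(Set<String> gold, Set<String> predicted) {
        // True positives: predicted spans that also occur in the gold set.
        Set<String> tp = new HashSet<>(predicted);
        tp.retainAll(gold);
        if (tp.isEmpty() || gold.isEmpty() || predicted.isEmpty()) {
            return 0.0;
        }
        double precision = (double) tp.size() / predicted.size();
        double recall = (double) tp.size() / gold.size();
        return 2 * precision * recall / (precision + recall);
    }
}

public class MetricDemo {
    public static void main(String[] args) {
        // "start-end" span offsets, purely illustrative.
        Set<String> gold = Set.of("0-4", "5-9", "10-14");
        Set<String> pred = Set.of("0-4", "5-9", "15-19");
        EvaluationMetric metric = new F1Metric();
        System.out.printf("F1 = %.3f%n", metric.evaluate(gold, pred));
    }
}
```

With two of three spans matched, precision and recall are both 2/3, giving F1 ≈ 0.667. The design point is that the evaluation harness depends only on the `EvaluationMetric` interface, so user-defined metrics can be registered without modifying the system itself.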
doi:10.3115/1621947.1621951