A Readability Checker with Supervised Learning Using Deep Indicators

Tim Vor Der Brück, Sven Hartrumpf, Hermann Helbig
2008 Informatica   unpublished
Checking the readability or simplicity of texts is important for many institutional and individual users. Formulas for approximately measuring text readability have a long tradition. Usually, they exploit surface-oriented indicators such as sentence length, word length, and word frequency. In many cases, however, this information is not adequate to realistically approximate the cognitive difficulties a person can have in understanding a text. Therefore, we use deep syntactic and semantic indicators in addition. The syntactic information is represented by a dependency tree, the semantic information by a semantic network. Both representations are generated automatically by a deep syntactico-semantic analysis. A global readability score is determined by applying a nearest neighbor algorithm to 3,000 ratings from 300 test persons. The evaluation showed that the deep syntactic and semantic indicators lead to promising results comparable to those of the best surface-based indicators. Combining deep and shallow indicators improves on shallow indicators alone. Finally, a graphical user interface was developed that highlights difficult passages, depending on the individual indicator values, and displays a global readability score. Summary (Povzetek): Machine learning with dependency trees is used to determine the readability of texts.
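The nearest neighbor scoring mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two surface indicators, the Euclidean distance, the value of `k`, the function names, and the toy ratings are all assumptions made for illustration, and the deep syntactic and semantic indicators are omitted because they require the parser described in the paper.

```python
import math
import re


def surface_indicators(text):
    """Return (average sentence length in words, average word length in characters).

    Hypothetical feature set: only two shallow indicators, chosen for illustration.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        return (0.0, 0.0)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return (avg_sentence_len, avg_word_len)


def knn_score(query, rated_examples, k=3):
    """Average the ratings of the k rated examples closest to the query vector.

    rated_examples is a list of (indicator_vector, rating) pairs.
    Euclidean distance and unweighted averaging are assumptions.
    """
    if not rated_examples:
        raise ValueError("need at least one rated example")
    by_distance = sorted(rated_examples, key=lambda ex: math.dist(query, ex[0]))
    nearest = by_distance[:k]
    return sum(rating for _, rating in nearest) / len(nearest)


# Toy data, invented for illustration: (indicator vector, human rating).
toy_ratings = [
    ((5.0, 3.5), 1.0),
    ((12.0, 4.5), 3.0),
    ((30.0, 7.0), 6.0),
]

query = surface_indicators("The cat sat on the mat. It was warm.")
score = knn_score(query, toy_ratings, k=2)
```

In a sketch like this, indicators with large numeric ranges dominate the distance, so a real system would likely normalize each indicator before comparing vectors.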
fatcat:lnthnwiqgzfwrmvoxjgsbia3ka