Identification of effective predictive variables for document qualities

Kwong Bor Ng, Rong Tang, Sharon Small, Tomek Strzalkowski, Paul Kantor, Robert Rittman, Peng Song, Ying Sun, Nina Wacholder
2005 Proceedings of the American Society for Information Science and Technology  
We analyzed textual properties of documents, using statistical and linguistic methods, to identify variables that predict various document qualities. We created a collection of 1000 documents, each of which was judged on nine document qualities (accuracy, reliability, objectivity, depth, author/producer credibility, readability, verbosity and conciseness, grammatical correctness, and one-sided or multiview). Employing statistical analyses, we considered a kind of linear ... ination, asking (1) whether textual features could be combined linearly to predict document qualities; (2) which textual features had good predictive power; and (3) which textual features were minimally required for prediction with a detection rate much better than the false alarm rate. We present several promising results, indicating that with a small number of textual features, we can predict various document qualities much better than chance. ASIST 2003
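To make the abstract's criterion concrete, the following is a minimal Python sketch of scoring documents with a linear combination of textual features and then comparing the detection rate with the false alarm rate at a threshold. It is an illustration only, not the authors' procedure: the two features, the weights, the threshold, and the toy labels are all hypothetical.

```python
from typing import Sequence, Tuple


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Combine textual features linearly into a single score."""
    return bias + sum(w * x for w, x in zip(weights, features))


def detection_and_false_alarm(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate) for 'score >= threshold'.

    Detection rate: fraction of documents that have the quality and are flagged.
    False alarm rate: fraction of documents that lack the quality but are flagged.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    hits = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    false_alarms = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return hits / positives, false_alarms / negatives


# Hypothetical data: two made-up features per document, with a made-up
# judgment of whether the document has some quality (for example, "depth").
toy_features = [[1.0, 1.0], [0.5, 0.5], [0.2, 0.1], [1.0, 0.6]]
toy_labels = [True, True, False, False]
toy_weights = [1.0, 2.0]  # hypothetical weights, not fitted to real data

toy_scores = [linear_score(f, toy_weights) for f in toy_features]
detection_rate, false_alarm_rate = detection_and_false_alarm(toy_scores, toy_labels, threshold=1.0)
print(detection_rate, false_alarm_rate)
```

A predictor is useful in the abstract's sense when the first number is much larger than the second; a predictor no better than chance would give roughly equal rates.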
doi:10.1002/meet.1450400128 fatcat:jnbcirak3namzfv3t6d7bysz3e