A Novel Statistical Feature Selection Approach for Text Categorization

Mohamed Abdel Fattah
2017 Journal of Information Processing Systems  
In text categorization, selecting distinctive text features is important because of the high dimensionality of the feature space. Reducing the feature space dimension is important for decreasing processing time and increasing accuracy. In the current study, we introduce a novel statistical feature selection approach for text categorization. This approach measures the term distribution in all collection documents, the term distribution in a certain category, and the term distribution in a certain class relative to other classes. The results show that the proposed method is superior to traditional feature selection methods.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Manuscript received November 11, 2016; first revision February 21, 2017; second revision May 22, 2017; accepted May 29, 2017. Corresponding Author: Mohamed Abdel Fattah (mohafi2003@helwan.edu.eg)
doi:10.3745/jips.02.0076 fatcat:ky4iji77hzedpkypekyq7v5ixm