FEATURE ENGINEERING WITH SENTENCE SIMILARITY USING THE LONGEST COMMON SUBSEQUENCE FOR EMAIL CLASSIFICATION

by Aruna Kumara B, Mallikarjun M Kodabagi

Published in Malaysian Journal of Computer Science by Univ. of Malaya.

2022, pp. 65-78

Abstract

Feature selection plays a prominent role in email classification, since selecting the most relevant features improves the accuracy and performance of the learning classifier. With email usage growing exponentially, classifying emails has become a significant problem that calls for a proper classification system. Such a system requires an efficient feature selection method that identifies the most relevant features for accurate classification. This paper proposes a novel feature selection method for email classification based on sentence similarity computed with the longest common subsequence. The proposed method works in two main phases. First, it builds a longest common subsequence vector of features by comparing each email with all other emails in the dataset. Second, it constructs a template for each class from the closest features of the emails in that class. Unseen emails are then classified using these templates. The performance of the proposed method is compared with traditional feature selection methods such as TF-IDF, Information Gain, Chi-square, and a semantic approach. The experimental results show that the proposed method achieves 96.61% accuracy.
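
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify tokenization, how the "closest features" are chosen, or how an unseen email is scored against a template. The sketch therefore assumes whitespace tokenization, restricts the pairwise comparison to emails of the same class, keeps the most frequent longest-common-subsequence tokens as the class template, and scores by simple token overlap. The function names `lcs_tokens`, `build_templates`, and `classify`, and the parameter `top_k`, are hypothetical.

```python
from collections import Counter


def lcs_tokens(a, b):
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover the tokens themselves.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def build_templates(emails, labels, top_k=5):
    """Phase 1 and 2 (assumed form): collect LCS tokens between emails
    of the same class and keep the top_k most frequent ones per class."""
    tokens = [e.lower().split() for e in emails]
    counts = {}
    for i in range(len(tokens)):
        for j in range(len(tokens)):
            if i != j and labels[i] == labels[j]:
                shared = lcs_tokens(tokens[i], tokens[j])
                counts.setdefault(labels[i], Counter()).update(shared)
    return {
        label: {tok for tok, _ in counter.most_common(top_k)}
        for label, counter in counts.items()
    }


def classify(email, templates):
    """Assign the class whose template shares the most tokens with the
    email (an assumed scoring rule, not taken from the paper)."""
    words = set(email.lower().split())
    return max(templates, key=lambda label: len(words & templates[label]))
```

Calling `build_templates(emails, labels)` on a labeled list of email texts returns one token set per class, and `classify(text, templates)` then assigns an unseen email to the class with the largest overlap. The pairwise comparison is quadratic in the number of emails, so a real dataset would likely need a more economical comparison strategy.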

Type: article-journal
Stage: published
Date: 2022-12-06
ISSN-L:  0127-9084