Evolving Lucene search queries for text classification

Laurence Hirsch, Robin Hirsch, Masoud Saeedi
2007 Proceedings of the 9th annual conference on Genetic and evolutionary computation - GECCO '07  
We describe a method for generating accurate, compact, human-understandable text classifiers. Text datasets are indexed using Apache Lucene, and genetic programs are used to construct Lucene search queries. Genetic programs acquire fitness by producing queries that are effective binary classifiers for a particular category when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from classification tasks.
doi:10.1145/1276958.1277279 dblp:conf/gecco/HirschHS07 fatcat:7lj3f5i4zvevtgsh2oyrj7nvme
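
The fitness idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not call Lucene: documents are plain token sets held in memory, a candidate query is a small nested tuple, and fitness is taken to be the F1 score of the query as a binary classifier for one category. The choice of F1, the operator set (TERM, AND, OR, ANDNOT), the function names, and the toy data are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Set, Tuple

# Hypothetical query representation (an assumption, not from the paper):
#   ("TERM", word)                      leaf / terminal
#   ("AND" | "OR" | "ANDNOT", q1, q2)   internal node / function


def matches(query: tuple, tokens: Set[str]) -> bool:
    """Return True if the document's token set satisfies the query tree."""
    op = query[0]
    if op == "TERM":
        return query[1] in tokens
    left = matches(query[1], tokens)
    right = matches(query[2], tokens)
    if op == "AND":
        return left and right
    if op == "OR":
        return left or right
    if op == "ANDNOT":
        return left and not right
    raise ValueError(f"unknown operator: {op}")


def to_query_string(query: tuple) -> str:
    """Render the tree as a human-readable boolean query string."""
    op = query[0]
    if op == "TERM":
        return query[1]
    left = to_query_string(query[1])
    right = to_query_string(query[2])
    if op == "ANDNOT":
        return f"({left} AND NOT {right})"
    return f"({left} {op} {right})"


def f1_fitness(query: tuple, docs: List[Tuple[Set[str], bool]]) -> float:
    """Score a query as a binary classifier: a hit means 'in category'."""
    tp = fp = fn = 0
    for tokens, in_category in docs:
        hit = matches(query, tokens)
        if hit and in_category:
            tp += 1
        elif hit:
            fp += 1
        elif in_category:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy training set (invented): token set, and whether the document
# belongs to the target category.
TRAIN = [
    ({"wheat", "harvest", "export"}, True),
    ({"wheat", "price", "tonnes"}, True),
    ({"grain", "crop", "harvest"}, True),
    ({"oil", "price", "barrel"}, False),
    ({"bank", "rate", "price"}, False),
]

good = ("OR", ("TERM", "wheat"), ("TERM", "grain"))
weak = ("TERM", "price")

if __name__ == "__main__":
    for q in (good, weak):
        print(to_query_string(q), round(f1_fitness(q, TRAIN), 3))
```

In an evolutionary loop, a score like this would rank candidate query trees for selection, and the surviving tree can be printed as a readable query string, which is what makes the resulting classifier compact and inspectable.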