Mining citizen science data to predict orevalence of wild bird species

Rich Caruana, Mohamed Elhawary, Art Munson, Mirek Riedewald, Daria Sorokina, Daniel Fink, Wesley M. Hochachka, Steve Kelling
2006 Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06  
The Cornell Laboratory of Ornithology's mission is to interpret and conserve the earth's biological diversity through research, education, and citizen science focused on birds. Over the years, the Lab has accumulated one of the largest and longest-running collections of environmental data sets in existence. The data sets are not only large, but also have many attributes, contain many missing values, and potentially are very noisy. The ecologists are interested in identifying which features have
more » ... which features have the strongest effect on the distribution and abundance of bird species as well as describing the forms of these relationships. We show how data mining can be successfully applied, enabling the ecologists to discover unanticipated relationships. We compare a variety of methods for measuring attribute importance with respect to the probability of a bird being observed at a feeder and present initial results for the impact of important attributes on bird prevalence.
doi:10.1145/1150402.1150527 dblp:conf/kdd/CaruanaEMRSFHK06 fatcat:uch3bjx6z5by5a7hge7hsjrqv4