Kodak's consumer video benchmark data set

Alexander Loui, Jiebo Luo, Shih-Fu Chang, Dan Ellis, Wei Jiang, Lyndon Kennedy, Keansub Lee, Akira Yanagawa
2007 Proceedings of the international workshop on multimedia information retrieval - MIR '07
Semantic indexing of images and videos in the consumer domain has become a very important issue for both research and actual application. In this work we developed Kodak's consumer video benchmark data set, which includes (1) a significant number of videos from actual users, (2) a rich lexicon that accommodates consumers' needs, and (3) the annotation of a subset of concepts over the entire video data set. To the best of our knowledge, this is the first systematic work in the consumer domain aimed at the definition of a large lexicon, construction of a large benchmark data set, and annotation of videos in a rigorous fashion. Such effort will have significant impact by providing a sound foundation for developing and evaluating large-scale learning-based semantic indexing/annotation techniques in the consumer domain. This report includes information about the concept definitions, the annotation process, video collection process, and the data structures used in the release file. The released dataset includes the annotations, extracted visual features (for videos from Kodak), and URLs of videos from YouTube. The Appendix section also includes the full list of concepts (more than 100 concepts in 7 categories) that have been defined in the consumer video domain.
doi:10.1145/1290082.1290117 dblp:conf/mir/LouiLCEJKLY07 fatcat:r7ft2vjpovgtnpstet5osdgm4m