A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2010; you can also visit <a rel="external noopener" href="http://www.columbia.edu/~kl2074/publications/ChangEtc07-consumer.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="ACM Press">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/bj3ypeajjfeb7pqvvlh5snru4a" style="color: black;">Proceedings of the international workshop on Workshop on multimedia information retrieval - MIR '07</a>
In this paper we present a systematic study of automatic classification of consumer videos into a large set of diverse semantic concept classes, which have been carefully selected based on user studies and extensively annotated over 1300+ videos from real users. Our goals are to assess the state of the art of multimedia analytics (including both audio and visual analysis) in consumer video classification and to discover new research opportunities. We investigated several statistical approaches built upon global/local visual features, audio features, and audio-visual combinations. Three multi-modal fusion frameworks (ensemble, context fusion, and joint boosting) are also evaluated. Experiment results show that visual and audio models perform best for different sets of concepts. Both provide significant contributions to multimodal fusion, via expansion of the classifier pool for context fusion and the feature bases for feature sharing. The fused multimodal models are shown to significantly reduce the detection errors (compared to single modality models), resulting in a promising accuracy of 83% over diverse concepts. To the best of our knowledge, this is the first work on systematic investigation of multimodal classification using a large-scale ontology and realistic video corpus.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1290082.1290118">doi:10.1145/1290082.1290118</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/mir/ChangEJLYLL07.html">dblp:conf/mir/ChangEJLYLL07</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/dx5ro37fofgppdlvnxeijygmse">fatcat:dx5ro37fofgppdlvnxeijygmse</a> </span>
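The three fusion frameworks are only named in the abstract. As a rough sketch of the simplest one, ensemble (late) fusion, the code below trains one classifier per modality and averages their posterior scores per concept. The SVM classifiers, feature dimensions, synthetic data, and equal fusion weights are all illustrative assumptions, not the authors' reported configuration.

<pre><code>
# Minimal sketch of ensemble (late) fusion for per-concept video
# classification, assuming precomputed audio and visual feature vectors.
# Feature dimensions, synthetic data, SVM classifiers, and equal-weight
# score averaging are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_videos = 200
visual = rng.normal(size=(n_videos, 128))   # e.g. global color/texture features
audio = rng.normal(size=(n_videos, 40))     # e.g. MFCC-derived features
labels = rng.integers(0, 2, size=n_videos)  # 1 if the concept is present

# One classifier per modality, with probability outputs for score-level fusion.
visual_clf = SVC(kernel="rbf", probability=True).fit(visual[:150], labels[:150])
audio_clf = SVC(kernel="rbf", probability=True).fit(audio[:150], labels[:150])

# Late fusion: average the two classifiers' posterior scores per video.
p_visual = visual_clf.predict_proba(visual[150:])[:, 1]
p_audio = audio_clf.predict_proba(audio[150:])[:, 1]
fused = 0.5 * p_visual + 0.5 * p_audio

predictions = (fused >= 0.5).astype(int)
print("fused accuracy:", (predictions == labels[150:]).mean())
</code></pre>

In practice one such fused detector would be trained per semantic concept; the paper's other two frameworks (context fusion and joint boosting) go further by exploiting inter-concept relationships and shared features, which this sketch does not attempt.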
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20100624073857/http://www.columbia.edu/~kl2074/publications/ChangEtc07-consumer.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/80/fb/80fb56e64e6e97d4fc3540caea6ccc7eb0e9fb42.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1290082.1290118"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>