YouCat: Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments using WordNet & Wikipedia

Subhabrata Mukherjee, Pushpak Bhattacharyya
2012 International Conference on Computational Linguistics  
In this paper, we propose a weakly supervised system, YouCat, for categorizing YouTube videos into genres such as Comedy, Horror, Romance, Sports and Technology. The system takes a YouTube video URL as input and assigns it a belongingness score for each genre. The key aspects of this work can be summarized as follows: (1) Unlike other genre identification works, which are mostly supervised, this system is mostly unsupervised and requires no labeled data for training. (2) The system can easily ...ate new genres without requiring labeled data for the genres. (3) YouCat extracts information from the video title, meta description and user comments (which together form the video descriptor). (4) It uses Wikipedia and WordNet for concept expansion. (5) The proposed algorithm, with a time complexity of O(|W|), where |W| is the number of words in the video descriptor, is efficient enough to be deployed on the web for real-time video categorization. Experiments were performed on real-world YouTube videos, where YouCat achieves an F-score of 80.9% without using any labeled training set, compared to the supervised multiclass SVM F-score of 84.36% for single-genre prediction. YouCat performs better for multi-genre prediction, with an F-score of 90.48%. The weak supervision in the system arises from the use of the manually constructed WordNet and from describing each genre by a few root words.
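To make the scoring idea concrete, here is a minimal Python sketch of how a per-genre belongingness score could be computed in a single pass over the video descriptor. It is an illustration under stated assumptions, not the authors' implementation: the names `GENRE_ROOTS`, `WORD_TO_GENRES`, `tokenize` and `genre_scores` are hypothetical, the root words are invented, and the WordNet and Wikipedia concept expansion described in the abstract is replaced by small hand-written word sets.

```python
import re
from collections import Counter

# Hypothetical root words per genre (assumption). In the abstract, such
# root words are expanded with WordNet and Wikipedia; here the expanded
# sets are simply written by hand.
GENRE_ROOTS = {
    "Comedy": {"funny", "laugh", "joke", "comedy", "hilarious"},
    "Horror": {"scary", "ghost", "horror", "fear", "creepy"},
    "Sports": {"match", "goal", "team", "sports", "score"},
}

# Inverted index from word to genres, so that each descriptor word costs
# one dictionary lookup and the whole pass is linear in |W|.
WORD_TO_GENRES = {}
for genre, words in GENRE_ROOTS.items():
    for word in words:
        WORD_TO_GENRES.setdefault(word, []).append(genre)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def genre_scores(title, description, comments):
    """Return a normalized score per genre for one video descriptor."""
    # The descriptor is the title, meta description and user comments.
    descriptor = " ".join([title, description, *comments])
    counts = Counter()
    for word in tokenize(descriptor):
        for genre in WORD_TO_GENRES.get(word, ()):
            counts[genre] += 1
    total = sum(counts.values())
    # Normalize so the scores sum to 1 when at least one word matched.
    return {g: (counts[g] / total if total else 0.0) for g in GENRE_ROOTS}


if __name__ == "__main__":
    scores = genre_scores(
        "Funny cat joke compilation",
        "Try not to laugh",
        ["so hilarious", "great goal by the team"],
    )
    print(scores)
```

Adding a genre in this sketch only requires a new entry in `GENRE_ROOTS`, with no labeled videos, which mirrors the property claimed in point (2). Simple match counting is a stand-in chosen for brevity; the actual scoring used by YouCat may differ.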
dblp:conf/coling/MukherjeeB12a fatcat:axfz373xbzgtplpn6gowmdousu