BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment [chapter]

Hung-sik Kim, Hau-Wen Chang, Jeongkyu Lee, Dongwon Lee
2010 Lecture Notes in Computer Science  
In the era of dominant social networks, vast amounts of information are created and shared across the world each day. The uniqueness and prevalence of this user-generated content present both challenges and opportunities. In this thesis, we study several tasks in mining user-generated content, covering both textual and link-based content. First, we study home location estimation for Twitter users from their shared textual content. We employ a Gaussian Mixture Model to ...ensate the drawback in the Maximum Likelihood Estimation. We propose two unsupervised feature selection methods, based on the notions of Non-Localness and Geometric-Localness, to prune noisy data in the content. Second, we study the item recommendation problem with a broader view of a social network system. By taking various relationships into consideration, the data sparseness problem common in recommendation tasks is alleviated. Based on the same characteristics principle, we propose a matrix co-factorization framework with a shared latent space to optimize the recommendations collectively. Several algorithms are proposed under the framework to exploit intricate relationships in a social network system. Finally, we investigate the effectiveness of classification with imperfect textual content extracted from videos, where often very limited information is available. By means of automatic recognition techniques, some link-based content is enriched, at the cost of some incorrectness. We also propose a heuristics-based method to extract n-gram keyphrases from noisy textual content by taking the importance of sub-term keywords into consideration.
doi:10.1007/978-3-642-12275-0_22 fatcat:ou4wo4a6efdabkipzbkaxd5cyi