18 Hits in 0.19 sec

TITGT at TRECVID 2009 Workshop

Nakamasa Inoue, Shanshan Hao, Tatsuhiko Saito, Koichi Shinoda, Ilseo Kim, Chin-Hui Lee
2009 TREC Video Retrieval Evaluation  
We propose a statistical framework for high-level feature (HLF) extraction, which employs scale-invariant feature transform Gaussian mixture models (SIFT GMMs), acoustic features, and maximal figure-of-merit (MFoM) learning. The MeanInfAP of our best run was 0.1679; it ranked 11th among all runs, and our team ranked 4th among all participating teams. Notably, the InfAPs for "Singing" and "People-dancing" were 0.229 and 0.319, respectively, the top scores among all runs.
dblp:conf/trecvid/InoueHSSKL09 fatcat:nc3q4nusnjgmbllq4vpia3ckde

TT+GT at TRECVID 2010 Workshop

Nakamasa Inoue, Toshiya Wada, Yusuke Kamishima, Koichi Shinoda, Ilseo Kim, Byungki Byun, Chin-Hui Lee
2010 TREC Video Retrieval Evaluation  
In this paper, we present our systems for semantic indexing and surveillance event detection. Semantic Indexing (Tokyo Tech's system): this section describes the GMM supervector kernels with MFCC and SIFT features. Feature extraction: we extract three types of visual and audio features, as follows. SIFT features with the Harris affine detector: the SIFT feature proposed by Lowe [1] is invariant to image scaling and changes in illumination, so it is widely used for object detection and recognition. Moreover, the Harris affine region detector [2], which is an extension of the Harris corner detector, provides affine-invariant regions. We use 32-dimensional SIFT features, with the dimension reduced by applying principal component analysis (PCA). The SIFT features are extracted not only from keyframes but also from half of all the image frames in a shot.
dblp:conf/trecvid/InoueWKSKBL10 fatcat:wcfbktyz5fb6xo5kz5kjwo25ku
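The PCA step in the abstract above (reducing SIFT descriptors to 32 dimensions) can be sketched as follows; the synthetic descriptors and the eigendecomposition-based PCA are illustrative stand-ins, not the authors' code:

```python
import numpy as np

def pca_reduce(descriptors, n_components=32):
    """Project descriptors onto the top principal axes of their covariance.
    Used here to mimic the 128-d SIFT -> 32-d reduction described above."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    cov = centered.T @ centered / (len(descriptors) - 1)
    _, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    axes = eigvecs[:, ::-1][:, :n_components]  # top-n principal axes
    return centered @ axes, mean, axes

rng = np.random.default_rng(0)
sift = rng.random((500, 128))                  # stand-in for real SIFT descriptors
reduced, mean, axes = pca_reduce(sift, n_components=32)
```

At test time, new descriptors would be centered with the stored `mean` and projected with the same `axes`.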

Segmental multi-way local pooling for video recognition

Ilseo Kim, Sangmin Oh, Arash Vahdat, Kevin Cannons, A.G. Amitha Perera, Greg Mori
2013 Proceedings of the 21st ACM international conference on Multimedia - MM '13  
In this work, we address the problem of complex event detection in unconstrained videos. We introduce a novel multi-way feature pooling approach which leverages segment-level information. The approach is simple and widely applicable to diverse audio-visual features. Our approach uses a set of clusters discovered via unsupervised clustering of segment-level features. Depending on feature characteristics, not only scene-based clusters but also motion/audio-based clusters can be incorporated. Then, every video is represented with multiple descriptors, where each descriptor is designed to relate to one of the pre-built clusters. For classification, intersection kernel SVMs are used, where the kernel is obtained by combining multiple kernels computed from corresponding per-cluster descriptor pairs. Evaluation on the TRECVID '11 MED dataset shows a significant improvement by the proposed approach beyond the state of the art.
doi:10.1145/2502081.2502167 dblp:conf/mm/KimOVCPM13 fatcat:tz3uc73ttffnblkstouulzzimy
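The classification step described above, intersection kernels computed per cluster and combined into one kernel, can be sketched as follows. The uniform 0.5/0.5 combination and the toy descriptors are assumptions for illustration:

```python
import numpy as np

def intersection_kernel(X, Y):
    """Histogram intersection kernel: K[i, j] = sum_d min(X[i, d], Y[j, d])."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

rng = np.random.default_rng(0)
desc_scene  = rng.random((4, 8))   # per-cluster descriptor: 4 videos x 8 bins
desc_motion = rng.random((4, 8))   # a second cluster's descriptor

# Combine the per-cluster kernels (here uniformly); the combined kernel
# would then be passed to an SVM trained with a precomputed kernel.
K = 0.5 * intersection_kernel(desc_scene, desc_scene) \
  + 0.5 * intersection_kernel(desc_motion, desc_motion)
```

The combined `K` stays a valid kernel because a nonnegative weighted sum of kernels is itself a kernel.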

Explicit Performance Metric Optimization for Fusion-Based Video Retrieval [chapter]

Ilseo Kim, Sangmin Oh, Byungki Byun, A. G. Amitha Perera, Chin-Hui Lee
2012 Lecture Notes in Computer Science  
We present a learning framework for fusion-based video retrieval systems, which explicitly optimizes given performance metrics. Real-world computer vision systems serve sophisticated user needs, and domain-specific performance metrics are used to monitor their success. However, the conventional approach for learning under such circumstances is to blindly minimize standard error rates and hope the targeted performance metrics improve, which is clearly suboptimal. In this work, a novel scheme to directly optimize such targeted performance metrics during learning is developed and presented. Our experimental results on two large consumer video archives are promising and showcase the benefits of the proposed approach.
doi:10.1007/978-3-642-33885-4_40 fatcat:clqjkkhvfnhhzfzof2matngcoq

Per-Exemplar Fusion Learning for Video Retrieval and Recounting

Ilseo Kim, Sangmin Oh, A.G. Amitha Perera, Chin-Hui Lee
2012 2012 IEEE International Conference on Multimedia and Expo  
We propose a novel video retrieval framework based on an extension of per-exemplar learning [7]. Each training sample with multiple types of features (e.g., audio and visual) is regarded as an exemplar. For each exemplar, a localized per-exemplar distance function is learned and used to measure the similarity between the exemplar and new test samples. Exemplars associate only with sufficiently similar test data, and these associations accumulate to identify the data to be retrieved. In particular, for every exemplar, the relevance of each feature type is discriminatively analyzed, and the effect of less informative features is minimized during the fusion-based associations. In addition, we show that our framework enables a rich set of recounting capabilities, where the rationale for each retrieval result can be automatically described to users to aid their interaction with the system. We show that our system provides competitive retrieval accuracy against strong baseline methods, while adding the benefits of recounting.
doi:10.1109/icme.2012.150 dblp:conf/icmcs/KimOPL12 fatcat:inctd6tcqnbxbk5vfx3dyda2xy
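The per-exemplar association step can be sketched as below: each exemplar carries its own feature-type weights, so a feature found uninformative for that exemplar contributes little to the distance. The feature names, weights, and threshold are hypothetical, not the paper's learned values:

```python
import numpy as np

def exemplar_distance(exemplar_feats, test_feats, weights):
    """Weighted per-exemplar distance over multiple feature types.
    `weights` down-weight feature types judged less informative for
    this particular exemplar."""
    total = 0.0
    for name, w in weights.items():
        total += w * np.linalg.norm(exemplar_feats[name] - test_feats[name])
    return total

exemplar = {"visual": np.array([1.0, 0.0]), "audio": np.array([0.2, 0.2])}
test     = {"visual": np.array([0.9, 0.1]), "audio": np.array([5.0, 5.0])}

# This exemplar trusts its visual feature; its noisy audio is down-weighted,
# so the large audio mismatch barely affects the distance.
d = exemplar_distance(exemplar, test, {"visual": 0.9, "audio": 0.1})
associated = d < 1.5   # exemplar-specific threshold (illustrative)
```

The per-feature weights also support recounting: the dominant terms in the sum indicate which evidence drove the association.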

A detection-based approach to broadcast news video story segmentation

Chengyuan Ma, Byungki Byun, Ilseo Kim, Chin-Hui Lee
2009 2009 IEEE International Conference on Acoustics, Speech and Signal Processing  
A detection-based paradigm decomposes a complex system into small pieces, solves each subproblem one by one, and combines the collected evidence to obtain a final solution. In this study of video story segmentation, a set of key events is first detected from heterogeneous multimedia signal sources, including a large-scale concept ontology for images, text generated by automatic speech recognition systems, features extracted from the audio track, and high-level video transcriptions. Then a discriminative evidence fusion scheme is investigated. We use the maximum figure-of-merit learning approach to directly optimize the performance metrics used in system evaluation, such as precision, recall, and the F1 measure. Experimental evaluations conducted on the TRECVID 2003 dataset demonstrate the effectiveness of the proposed detection-based paradigm. The proposed framework facilitates flexible combination and extension of event detector design and evidence fusion to enable other related video applications.
doi:10.1109/icassp.2009.4959994 dblp:conf/icassp/MaBKL09 fatcat:wjayvk3t25b5jersaaamh4a2xy
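As a minimal illustration of optimizing the evaluation metric directly rather than a generic error rate, the sketch below picks a detection threshold by maximizing F1 on held-out scores. (The paper's MFoM approach instead optimizes a smooth approximation of such metrics during training; this discrete sweep is only a stand-in.)

```python
def best_f1_threshold(scores, labels):
    """Sweep candidate thresholds and return the one maximizing F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

scores = [0.9, 0.8, 0.4, 0.3, 0.1]   # detector scores (illustrative)
labels = [1,   1,   0,   1,   0]     # ground-truth story boundaries
t, f1 = best_f1_threshold(scores, labels)
```

Here the best threshold accepts the 0.3-scored positive, trading one false positive for full recall, which a plain accuracy objective would not necessarily prefer.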

Personalized Economy of Images in Social Forums: An Analysis on Supply, Consumption, and Saliency

Sangmin Oh, Megha Pandey, Ilseo Kim, Anthony Hoogs, Jeff Baumes
2014 2014 22nd International Conference on Pattern Recognition  
In this work, we focus on the novel problem of analyzing individual users' behavioral patterns regarding images shared on social forums. In particular, we view diverse user activities on social multimedia services as an economy, where the first activity mode of sharing or posting is interpreted as supply, and another mode of activity, such as commenting on images, is interpreted as consumption. To characterize user profiles in these two behavioral modes, we propose an approach that characterizes users' supply and consumption profiles based on the image content types with which they engage. We then present various statistical analyses, which confirm that there is an unexpected, significant difference between these two behavioral modes. In addition, we introduce a statistical approach to identify users with salient profiles, which can be useful for social multimedia services for blocking users with undesirable behavior or for viral content promotion. We showcase the benefits of the proposed saliency detection approach and its extension to detect significant key images from a complex dataset, which exhibits the inherently multi-modal nature of the user bases of social multimedia services.
doi:10.1109/icpr.2014.351 dblp:conf/icpr/OhPKHB14 fatcat:7nwkm2kf4vbuxme642pre5fnmm
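One simple way to quantify the gap between a user's supply and consumption profiles, as studied above, is a divergence between the two content-type distributions. The Jensen-Shannon divergence below is an illustrative choice, not necessarily the paper's statistic:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.
    0 means identical profiles; 1 is the maximum."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

supply      = [0.7, 0.2, 0.1]   # shares of posted-image content types
consumption = [0.1, 0.2, 0.7]   # shares of commented-on content types
d = js_divergence(supply, consumption)
```

Users whose `d` is an outlier relative to the population would be candidates for the "salient profile" detection described above.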

Image-oriented economic perspective on user behavior in multimedia social forums: An analysis on supply, consumption, and saliency

Sangmin Oh, Megha Pandey, Ilseo Kim, Anthony Hoogs
2016 Pattern Recognition Letters  
This work addresses the novel problem of analyzing individual users' behavioral patterns regarding images shared on social forums. In particular, we present an image-oriented economic perspective: the first activity mode of sharing or posting on social forums is interpreted as supply, and another mode of activity, such as commenting on images, is interpreted as consumption. First, we show that, despite their significant diversity, images in social forums can be clustered into semantically coherent groups using modern computer vision techniques. Then, users' supply and consumption profiles are characterized based on the distribution of images with which they engage. We then present various statistical analyses on real-world data, which show that there is a significant difference between the images users supply and those they consume. This finding suggests that the flow of images on social networks should be modeled as a bi-directional graph. In addition, we introduce a statistical approach to identify users with salient profiles. This approach can be useful for social multimedia services to block users with undesirable behavior or to identify viral content and promote it.
doi:10.1016/j.patrec.2015.08.022 fatcat:fd2twyvz4vayzgno7qd5kj7asq

Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach

Arash Vahdat, Kevin Cannons, Greg Mori, Sangmin Oh, Ilseo Kim
2013 2013 IEEE International Conference on Computer Vision  
We present a compositional model for video event detection. A video is modeled using a collection of both global and segment-level features, and kernel functions are employed for similarity comparisons. The locations of salient, discriminative video segments are treated as a latent variable, allowing the model to explicitly ignore portions of the video that are unimportant for classification. A novel multiple kernel learning (MKL) latent support vector machine (SVM) is defined, which is used to combine and re-weight multiple feature types in a principled fashion while simultaneously operating within the latent variable framework. The compositional nature of the proposed model allows it to respond directly to the challenges of temporal clutter and intra-class variation, which are prevalent in unconstrained internet videos. Experimental results on the TRECVID Multimedia Event Detection 2011 (MED11) dataset demonstrate the efficacy of the method.
doi:10.1109/iccv.2013.463 dblp:conf/iccv/VahdatCMOK13 fatcat:rgxl3u4kdnh5bmox2xdzxhxflq
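The latent-variable inference described above, treating the location of the discriminative segment as latent, reduces at test time to a max over segment scores. A minimal sketch with linear scores only (the paper additionally learns MKL kernel weights, which are omitted here):

```python
import numpy as np

def latent_score(segment_feats, w, b=0.0):
    """Latent SVM inference: the video score is the max over per-segment
    scores, and the argmax identifies the salient segment."""
    scores = segment_feats @ w + b
    best = int(np.argmax(scores))
    return scores[best], best

rng = np.random.default_rng(0)
feats = rng.random((6, 4))           # 6 segments, 4-d features (synthetic)
w = np.array([1.0, -0.5, 0.0, 0.2])  # illustrative weight vector
score, segment = latent_score(feats, w)
```

Because only the best segment contributes, cluttered or irrelevant portions of the video are effectively ignored during classification.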

Multimedia event detection with multimodal feature fusion and temporal concept localization

Sangmin Oh, Scott McCloskey, Ilseo Kim, Arash Vahdat, Kevin J. Cannons, Hossein Hajimirsadeghi, Greg Mori, A. G. Amitha Perera, Megha Pandey, Jason J. Corso
2013 Machine Vision and Applications  
We present a system for multimedia event detection. The developed system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building, often in an unsupervised manner, mid-level and high-level features upon low-level features to enable semantic understanding. Second, we show a novel latent SVM model which learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval through its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology for improving fusion learning under limited training data conditions. Thorough evaluation on the large TRECVID MED 2011 dataset showcases the benefits of the presented system.
doi:10.1007/s00138-013-0525-x fatcat:m5grko5ls5denhtst2btnwdmmy
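The fusion stage described above can be sketched, at its simplest, as weighted late fusion of per-feature classifier scores. The feature names and weights here are illustrative; in the paper the fusion weights are learned:

```python
def late_fusion(scores, weights):
    """Weighted late fusion: combine per-feature classifier scores into a
    single detection score. Weights are assumed to sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[name] * s for name, s in scores.items())

scores  = {"hog3d": 0.8, "mfcc": 0.4, "objectbank": 0.6}  # per-feature scores
weights = {"hog3d": 0.5, "mfcc": 0.2, "objectbank": 0.3}  # illustrative weights
fused = late_fusion(scores, weights)
```

With limited training data, learning only these few fusion weights (rather than a full joint model) is one way to keep the fusion stage from overfitting.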

TRECVID 2013 GENIE: Multimedia Event Detection and Recounting

Sangmin Oh, A. G. Amitha Perera, Ilseo Kim, Megha Pandey, Kevin J. Cannons, Hossein Hajimirsadeghi, Arash Vahdat, Greg Mori, Ben Miller, Scott McCloskey, You-Chi Cheng, Zhen Huang (+11 others)
2013 TREC Video Retrieval Evaluation  
Our MED 13 system is an extension of our MED 12 system [12, 13], and consists of a collection of low-level and high-level features, feature-specific classifiers built upon those features, and a fusion system that combines features both through mid-level kernel fusion and late fusion. Our MED submissions include a total of 24 different configurations, which consist of combinations of 2 submission timings (PS/AH), 3 training conditions (100/10/0Ex), and 4 types of feature conditions (…o/ASR). Our MER 13 submissions reported recounting for all five MER events. Our MER system combines evidence from multiple base classifiers, which is translated to text and used to identify key frames. Multiple MER results are fused and presented to users as the recounting for each detection.
dblp:conf/trecvid/OhPKPCHVMMMC0LX13 fatcat:5z3mqbkn2ra2rhznaofcd52gpm

TRECVID 2012 GENIE: Multimedia Event Detection and Recounting

A. G. Amitha Perera, Sangmin Oh, Megha Pandey, Tianyang Ma, Anthony Hoogs, Arash Vahdat, Kevin J. Cannons, Hossein Hajimirsadeghi, Greg Mori, Scott McCloskey, Ben Miller, Sharath Venkatesha (+12 others)
2012 TREC Video Retrieval Evaluation  
Our MED 12 system is an extension of our MED 11 system [12], and consists of a collection of low-level and high-level features, feature-specific classifiers built upon those features, and a fusion system that combines features both through mid-level kernel fusion and score fusion. We have incorporated a large number of audio-visual features in our new system, along with diverse types of standard and newly developed event agents which learn the salient audio-visual characteristics of event classes. The combination of additional features and newly developed, powerful event agents improves our MED performance substantially beyond our MED 11 results. In addition, our MER 12 submissions reported recounting of specified clips for all five MER events and additionally provided MER results for all clips detected by the MED system. Our MER system generated recounting of detections based on CDR features and the synopses provided as part of the EventKits and DEV-T datasets. The MER evaluation results are promising for event-level discrimination, and indicate that further improvement is needed for clip-level discrimination.
dblp:conf/trecvid/PereraOPMHVCHMM12 fatcat:lond4yzi5vdnblryq73qo4gb4u

GENIE TRECVID 2011 Multimedia Event Detection: Late-Fusion Approaches to Combine Multiple Audio-Visual features

A. G. Amitha Perera, Sangmin Oh, Matthew J. Leotta, Ilseo Kim, Byungki Byun, Chin-Hui Lee, Scott McCloskey, Jingchen Liu, Ben Miller, Zhi Feng Huang, Arash Vahdat, Weilong Yang (+9 others)
2011 TREC Video Retrieval Evaluation  
For the TRECVID 2011 MED task, the GENIE system incorporated two late-fusion approaches in which multiple discriminative base classifiers are built per feature and then combined through discriminative fusion techniques. All of our fusion and base classifiers are formulated as one-vs-all detectors per event class, with threshold estimation performed during cross-validation. A total of five different types of features were extracted from the data, including both audio and visual features: HOG3D, Object Bank, Gist, MFCC, and acoustic segment models (ASMs). Features such as HOG3D and MFCC are low-level features, while Object Bank and ASMs are more semantic. In our work, event-specific feature adaptations and manual annotations were deliberately avoided, to establish strong baseline results. Overall, the results were competitive in the MED11 evaluation, and show that standard machine learning techniques can yield fairly good results even on a challenging dataset.
dblp:conf/trecvid/PereraOLKBLMLMH11 fatcat:hlk4ml4zabd5nkcno3ccvdjdau

A study on the applicability of newly developed stainless steel for weight reduction of carbody of intermodal tram

Sung-il Seo, Jeong-guk Kim, Hyun-seung Jung
2016 Journal of the Korea Academia-Industrial cooperation Society  
Acknowledgments: Sung-il Seo [Regular Member] • Feb. 1984: Graduate School, Seoul National University  ... 
doi:10.5762/kais.2016.17.3.457 fatcat:frsqcnpjpncevopl4qq7i6o2ra

Metadata-Weighted Score Fusion for Multimedia Event Detection

Scott McCloskey, Jingchen Liu
2014 2014 Canadian Conference on Computer and Robot Vision  
The authors would like to specifically thank Ilseo Kim, Megha Pandey, Sangmin Oh, and Amitha Perera from Kitware; Arash Vahdat, Kevin Cannons, and Greg Mori from SFU; and You-Chi Cheng from Georgia Tech  ... 
doi:10.1109/crv.2014.47 dblp:conf/crv/McCloskeyL14 fatcat:qriolg7lzfhudpp5ds37qkbqpm
Showing results 1 — 15 out of 18 results