
An Information-Geometric Approach to Real-Time Audio Segmentation

Arnaud Dessein, Arshia Cont
2013 IEEE Signal Processing Letters  
To cite this version: Arnaud Dessein, Arshia Cont. An information-geometric approach to real-time audio segmentation.  ...  However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to The  ...  An Information-Geometric Approach to Real-Time Audio Segmentation Arnaud Dessein, and Arshia Cont Abstract: We present a generic approach to real-time audio segmentation in the framework of information  ... 
doi:10.1109/lsp.2013.2247039 fatcat:lvemxmvlyrcwxgxfyx467jjjnq

ISNN: Impact Sound Neural Network for Audio-Visual Object Classification [chapter]

Auston Sterling, Justin Wilson, Sam Lowe, Ming C. Lin
2018 Lecture Notes in Computer Science  
We further present an interactive application for real-time scene reconstruction in which a user can strike objects, producing sound that can instantly classify and segment the struck object, even if the  ...  Our audio-visual network (ISNN-AV) combines ISNN-A with VoxNet to produce state-of-the-art object classification accuracy.  ...  Based on these, the real-time 3D reconstruction [6, 7] is enhanced and segmented.  ... 
doi:10.1007/978-3-030-01267-0_34 fatcat:odom3dsduncplmckvw5fhewuay

Workflow for Integrated Object Detection in Collaborative Video Annotation Environments [chapter]

Lars Grunewaldt, Kim Möller, Karsten Morisse
2006 Lecture Notes in Computer Science  
Based on a former approach, some new developments like integrated audio conferencing with recording facilities for audio based work instructions and an automatized video segmentation module are presented  ...  Besides these technical improvements an object based approach for the video annotation process is presented.  ...  An Approach for Object Based Video Annotation Recognizing objects in 2D pictures is time consuming and, without any additional information, also inaccurate and erroneous.  ... 
doi:10.1007/11758525_76 fatcat:h6voypj5ijf3dahi7c6olizxoe

2D/3D AudioVisual content analysis & description

I. Pitas, K. Papachristou, N. Nikolaidis, M. Liuni, L. Benaroya, G. Peeters, A. Roebel, A. Linnemann, M. Liu, S. Gerke
2014 2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)  
The European Union is not liable for any use that may be made of the information contained herein.  ...  ACKNOWLEDGMENT The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement number 287674 (3DTVS).  ...  an object over time.  ... 
doi:10.1109/mmsp.2014.6958837 dblp:conf/mmsp/PitasPNLBPRLLG14 fatcat:wf5evhmzyzekbkh7fv2evrfq3m

Immersive Spatial Audio Reproduction for VR/AR Using Room Acoustic Modelling from 360° Images

Hansung Kim, Luca Hernaggi, Philip J.B. Jackson, Adrian Hilton
2019 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR)  
In order to maximise the immersiveness of user in VR/AR environments, a plausible spatial audio reproduction synchronised with visual information is essential.  ...  Spatially synchronised audio is reproduced based on the estimated geometric and acoustic properties in the scene.  ...  In a real environment, the whole process from camera setting to the final model output can be done within half an hour, which is much simpler and faster than audio-based approaches.  ... 
doi:10.1109/vr.2019.8798247 dblp:conf/vr/KimHJH19 fatcat:ncdt6poinbdvpfryg7kyukp3oi

Human-centered 2D/3D video content analysis and description

K. Papachristou, N. Nikolaidis, I. Pitas, A. Linnemann, M. Liu, S. Gerke
2014 8th International Conference on Electrical and Computer Engineering  
The European Union is not liable for any use that may be made of the information contained herein.  ...  ACKNOWLEDGMENT The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement number 287674 (3DTVS).  ...  an object over time.  ... 
doi:10.1109/icece.2014.7026818 fatcat:selbhg6pdjbgdhr42mgsfmnjya

Multi-Modal Localization and Enhancement of Multiple Sound Sources from a Micro Aerial Vehicle

Ricardo Sanchez-Matilla, Lin Wang, Andrea Cavallaro
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
To address this problem, we propose a multi-modal analysis approach that jointly exploits audio and video data to enhance the sounds of multiple targets captured from an MAV equipped with a microphone  ...  We first perform audiovisual calibration via camera resectioning, audio-visual temporal alignment and geometrical alignment to jointly use the features in the audio and video streams, which are independently  ...  The visually-informed audio enhancement approach consists of five steps.  ... 
doi:10.1145/3123266.3123412 dblp:conf/mm/Sanchez-Matilla17 fatcat:gd4iptvcizb7dnjdi47cpfv5qu

Content-Based Video Description for Automatic Video Genre Categorization [chapter]

Bogdan Ionescu, Klaus Seyerlehner, Christoph Rasche, Constantin Vertan, Patrick Lambert
2012 Lecture Notes in Computer Science  
In this paper, we propose an audio-visual approach to video genre categorization. Audio information is extracted at block-level, which has the advantage of capturing local temporal information.  ...  An extensive evaluation of this multi-modal approach based on more than 91 hours of video footage is presented.  ...  Existing approaches are limited to use standard audio features, e.g. a common approach is to use Mel-Frequency Cepstral Coefficients (MFCC) or to compute time domain features, e.g.  ... 
doi:10.1007/978-3-642-27355-1_8 fatcat:pq6xslvmi5ffpidqq2k7br6vci

Automatic visual feature extraction for Mandarin audio-visual speech recognition

Tsang-Long Pao, Wen-Yuan Liao, Tsan-Nung Wu, Ching-Yi Lin
2009 2009 IEEE International Conference on Systems, Man and Cybernetics  
In this paper, we proposed an automatic visual feature extraction approach to extract the visual features of the lips that can be used in the audio-visual speech recognition system.  ...  These features are important to the recognition system, especially in noisy condition. The segmentation of the lip region uses both color and edge information.  ...  Firstly, an accurate and robust audio and visual speech feature extraction algorithm needs to be developed. Secondly, fusion method of the two separate information sources needs to be designed.  ... 
doi:10.1109/icsmc.2009.5346011 dblp:conf/smc/PaoLWL09 fatcat:f7tchwo5dbhjlo5t34twejnkrm

Teaching multimedia

Iliya Georgiev
2003 Proceedings of the 4th international conference conference on Computer systems and technologies e-Learning - CompSysTech '03  
The paper describes a generalized approach to multimedia teaching that is designed to present a multimedia-processing model and the basics of the related disciplines.  ...  Teaching and learning multimedia basics are relevant to a diversity of scientific areas: signals and systems, data compression, computer graphics, image processing and understanding, digital audio and  ...  For example, a segmentation coding approach that considers an image as an assembly of many regions and encodes the contour and texture of each region separately, can efficiently exploit the structural redundancy  ... 
doi:10.1145/973620.973721 fatcat:7uhoirls4fhr3mvagsvktkiewa

Perceptual Segment Clustering For Music Description And Time-Axis Redundancy Cancellation

Tristan Jehan
2004 Zenodo  
We base our segmentation on an auditory model. Its goal is to remove the information that is the least critical to our hearing sensation, while retaining the important parts.  ...  We believe that such approach has potential both in the music information retrieval, and the perceptual audio coding domains.  ... 
doi:10.5281/zenodo.1416854 fatcat:cbgv5wt5fjd4la2zkuuokz4wpm

Guest Editorial: Content Analysis and Indexing for Advanced Multimedia Services

Alberto Messina, Andrea Basso, Werner Bailer
2015 Multimedia tools and applications  
In addition, it is of paramount importance to develop the ability to generate, represent and distribute such informational units (e.g., indexes) in a way that is consumable and manageable by a wide range  ...  Authors of "Multi-modal fusion for associated news story retrieval" [6] investigate multimodal approaches to retrieve associated news stories sharing the same main topic.  ...  to enable an efficient and precise real-time selection.  ... 
doi:10.1007/s11042-015-2540-6 fatcat:nn6obv6el5dddpm3mhger4xxre

Next-Generation Augmented Reality Browsers: Rich, Seamless, and Adaptive

Tobias Langlotz, Thanh Nguyen, Dieter Schmalstieg, Raphael Grasset
2014 Proceedings of the IEEE  
This paper discusses the challenges and varying research approaches to building augmented reality browsers to discover and view content related to physical objects around a mobile device.  ...  ABSTRACT | As low-level hardware will soon allow us to visualize virtual content anywhere in the real world, managing it in a more structured manner still needs to be addressed.  ...  Acknowledgment The authors would like to thank H. Regenbrecht for his input on several of the projects presented in this paper.  ... 
doi:10.1109/jproc.2013.2294255 fatcat:p6gwktzag5cdhkdgss3xtuwede

Automatic Detection and Classification of Audio Events for Road Surveillance Applications

Noor Almaadeed, Muhammad Asim, Somaya Al-Maadeed, Ahmed Bouridane, Azeddine Beghdadi
2018 Sensors  
Thus, a preferred approach for reduction in road traffic death rate is to decrease the unnecessary delays in information to reach the emergency responders [6].  ...  An audio analysis system in such an environment faces a high level of non-stationary background noise in addition to potentially relevant sound events.  ...  Ahmed Bouridane and Azeddine Beghdadi reviewed the approach and the results to further improve the quality of the paper.  ... 
doi:10.3390/s18061858 pmid:29882825 pmcid:PMC6022152 fatcat:lf27ewcnwfgdpdldwkt5mvs6kq

Audio Fingerprinting for Multi-Device Self-Localization

Tsz-Kin Hon, Lin Wang, Joshua D. Reiss, Andrea Cavallaro
2015 IEEE/ACM Transactions on Audio Speech and Language Processing  
The obtained inter-device distances are then exploited to derive the geometrical configuration of the network.  ...  Index Terms: Ad-hoc microphone array, audio fingerprinting, multi-source, self-localization, time difference of arrival (TDOA) estimation.  ...  Special hardware is used to tackle time misalignment and to ensure real-time signal sending/receiving [28].  ... 
doi:10.1109/taslp.2015.2442417 fatcat:4dwjeg2kdfevve5dtnh7cb7oam