D2.2 Implementations of methods adapted to enhanced human inputs

Doukhan, Francis, Harrando, Huet, Kaseva, Kurimo, Laaksonen, Lindh-Knuutila, Lisena, Pehlivan Tort, Reboud, Rouhe (+2 others)
2020 Zenodo  
This deliverable describes the second development iteration of the joint collection of libraries and tools for multimodal content analysis from AALTO, EURECOM, INA, Lingsoft, LLS and Limecraft. Based on their primary input domain, the methods are grouped in this report as visual (facial person recognition, facial gender classification and video description), auditory (speech and gender segmentation, speech recognition, and speaker identification and diarisation) and multimodal (audio-enhanced captioning, visual–auditory gender classification, person re-identification and multimodal speech recognition) approaches. Special attention has been paid to methods that combine different modalities and bring human knowledge as input to the learning system. As part of this deliverable, the existing open-source components gathered into a joint software collection of tools and libraries have been updated and new components have been added. This deliverable also summarises, in an appendix, the dissemination activities related to the research work in MeMAD's Work Package WP2 during its second year. Finally, the abstracts of five academic theses together with the full texts of ten scientific publications appear at the end of the report. These appendices describe the technological advances related to the software components of MeMAD Task T2.2 in further detail.
doi:10.5281/zenodo.4964298