A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021.
This deliverable describes the second development iteration of the joint collection of libraries and tools for multimodal content analysis from AALTO, EURECOM, INA, Lingsoft, LLS and Limecraft. The methods are grouped by their primary input domain as visual (facial person recognition, facial gender classification and video description), auditory (speech and gender segmentation, speech recognition, and speaker identification and diarisation) and multimodal (audio-enhanced captioning,
doi:10.5281/zenodo.4964298 fatcat:6bbqa7q3xrctnm6nrf5fxh7f3q