A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context in addition to the dialogarXiv:2006.01460v2 fatcat:yfijtft2v5hjjkwg5nv6wvwcey