Chris Chesher
2019 Selected Papers of Internet Research, SPIR  
Smart speakers such as the Google Home have the seemingly magical capacity to respond to user invocations in natural language. I argue that these are invocationary acts. In terms of Austin's speech act theory, smart speakers interpret what the user says (locutionary: speech-to-text), what their statement does (illocutionary: artificial intelligence), and attempt to fulfil the obligation of the user's command (perlocutionary: AI & text-to-speech). The smart speaker responds with its own speech
... in Searle's terms it might assert facts (representatives: e.g. answering a factual question), ask the user to do something (directive: e.g. asking a question in a quiz game), communicate a psychological state (expressive: e.g. answering the question 'Do you love me?'), commit to a future action (commissive: e.g. setting a timer) or make a declaration (such as confirming a purchase). User invocations are most often directives, and are usually initiated with the 'wake word' 'Hey Google'. The computer's response comes automatically through what I call invocationary acts. In this case, the user's invocation is answered by the evocation of synthesised speech, sound, music and/or images. Based on an analysis of 300 commands collected from online publications, I developed a typology of invocationary acts: Search, Lookup, Error, Media, Third party search, Location, User data, Random, Scripted response (often randomly selected from multiple answers), Interaction (applications such as a tutorial or a game), Device (controlling media or smart home devices) and Clock. This analysis points to the limitations of the voice user interface paradigm.
doi:10.5210/spir.v2019i0.10935 fatcat:tyfd3u2vuvdm3dm3b7foxxc4ju