TRUSTING SMART SPEAKERS: A TYPOLOGY OF INVOCATIONARY ACTS

Chris Chesher
2019 Selected Papers of Internet Research, SPIR  
Smart speakers such as the Google Home have the seemingly magical capacity to respond to user invocations in natural language. I argue that these are invocationary acts. In terms of Austin's speech act theory, smart speakers interpret what the user says (locutionary: speech-to-text), what their statement does (illocutionary: artificial intelligence), and attempt to fulfil the obligation of the user's command (perlocutionary: AI & text-to-speech). The smart speaker responds with its own speech: in Searle's terms, it might assert facts (representative: e.g. answering a factual question), ask the user to do something (directive: e.g. asking a question in a quiz game), communicate a psychological state (expressive: e.g. answering the question 'Do you love me?'), commit to a future action (commissive: e.g. setting a timer) or make a declaration (e.g. confirming a purchase). User invocations are most often directives, and are most often initiated with the 'wake word' 'Hey Google'. The computer's response comes automatically through what I call invocationary acts. In this case, the user's invocation is answered by the evocation of synthesised speech, sound, music and/or images. Drawing on an analysis of 300 commands taken from online publications, I developed a typology of invocationary acts: Search, Lookup, Error, Media, Third party search, Location, User data, Random, Scripted response (often randomly selected from multiple answers), Interaction (applications such as a tutorial or a game), Device (controlling media, or smart home devices) and Clock. This analysis points to the limitations of the voice user interface paradigm.
doi:10.5210/spir.v2019i0.10935 fatcat:tyfd3u2vuvdm3dm3b7foxxc4ju