TRUSTING SMART SPEAKERS: A TYPOLOGY OF INVOCATIONARY ACTS
Selected Papers of Internet Research, SPIR
Smart speakers such as the Google Home have the seemingly magical capacity to respond to user invocations in natural language. I argue that these are invocationary acts. In terms of Austin's speech act theory, smart speakers interpret what the user says (locutionary: speech-to-text), what their statement does (illocutionary: artificial intelligence), and attempt to fulfil the obligation of the user's command (perlocutionary: AI & text-to-speech). The smart speaker responds with its own speech
... in Searle's terms it might assert facts (representative: e.g. answering a factual question), ask the user to do something (directive: e.g. asking a question in a quiz game), communicate a psychological state (expressive: e.g. answering the question 'Do you love me?'), commit to a future action (commissive: e.g. setting a timer) or make a declaration (e.g. confirming a purchase). User invocations are most often directives, and are most often initiated with the 'wake word' 'Hey Google'. The computer's response comes automatically through what I call invocationary acts. In this case, the user's invocation is answered by the evocation of synthesised speech, sound, music and/or images. Based on an analysis of 300 commands drawn from online publications, I developed a typology of invocationary acts: Search, Lookup, Error, Media, Third party search, Location, User data, Random, Scripted response (often randomly selected from multiple answers), Interaction (applications such as a tutorial or a game), Device (controlling media, or smart home devices) and Clock. This analysis points to the limitations of the voice user interface paradigm.
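
The typology above can be read as a coding scheme applied to each command. As a minimal sketch, assuming each command has already been hand-coded with exactly one category, the twelve categories could be represented and tallied as follows. The names `InvocationaryAct` and `tally`, and the example codings, are hypothetical illustrations and not data or tooling from the study.

```python
from collections import Counter
from enum import Enum


class InvocationaryAct(Enum):
    """The twelve categories named in the typology."""
    SEARCH = "Search"
    LOOKUP = "Lookup"
    ERROR = "Error"
    MEDIA = "Media"
    THIRD_PARTY_SEARCH = "Third party search"
    LOCATION = "Location"
    USER_DATA = "User data"
    RANDOM = "Random"
    SCRIPTED_RESPONSE = "Scripted response"
    INTERACTION = "Interaction"
    DEVICE = "Device"
    CLOCK = "Clock"


def tally(coded_commands):
    """Count how many hand-coded commands fall under each category."""
    return Counter(act for _command, act in coded_commands)


# Hypothetical codings for illustration only; not taken from the study.
examples = [
    ("Hey Google, set a timer for ten minutes", InvocationaryAct.CLOCK),
    ("Hey Google, turn off the lights", InvocationaryAct.DEVICE),
    ("Hey Google, do you love me?", InvocationaryAct.SCRIPTED_RESPONSE),
    ("Hey Google, what time is it?", InvocationaryAct.CLOCK),
]
counts = tally(examples)
```

Coding each command with a single category is itself an assumption of this sketch; a command that plausibly belongs to more than one category would need a multi-label scheme.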