Combining Users' Activity Survey and Simulators to Evaluate Human Activity Recognition Systems

Gorka Azkune, Aitor Almeida, Diego López-de-Ipiña, Liming Chen
Sensors, 2015
Evaluating human activity recognition systems usually implies following expensive and time-consuming methodologies, where experiments with humans are run with the consequent ethical and legal issues. We propose a novel evaluation methodology to overcome these problems, based on user surveys and a synthetic dataset generator tool. Surveys capture how different users perform activities of daily living, while the synthetic dataset generator is used to create properly labelled activity datasets modelled with the information extracted from the surveys. Important aspects such as sensor noise, varying time lapses and erratic user behaviour can also be simulated with the tool. The proposed methodology offers important advantages that allow researchers to carry out their work more efficiently. To evaluate the approach, a synthetic dataset generated following the proposed methodology is compared to a real dataset by computing the similarity between sensor occurrence frequencies. It is concluded that the similarity between the two datasets is significant.

1. Introduction

Human activity recognition has become a very important research topic, since it is a key technology in applications such as surveillance-based security [14], [7], [25], ambient assisted living [26], [20], [23], social robotics [9] and pervasive and mobile computing [6], [13]. Even though activity recognition is very diverse in terms of sensing or monitoring approaches and algorithmic choices, evaluation is usually carried out applying the following extensively used methodology:

1. Choose a target environment and deploy sensors to acquire and process information about human activities.
2. Select a group of persons who can perform target activities in the prepared environment.
3. Select a dataset labelling system so datasets generated by users can be used as a ground truth.
4. Run experiments with users and label the obtained activity datasets.
5. Use the same datasets to test the activity recognition system and store the labels produced by it.
6. Compare the labels of the activity recognition system with the ground truth using appropriate metrics (a sketch of such a comparison follows this list).
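As an illustration of step 6, consider the following minimal sketch. It assumes that both the ground truth and the recogniser output are given as one label per fixed-length time slice; the activity names, variables and time-slice representation are illustrative assumptions rather than part of the methodology itself:

```python
def evaluate_labels(ground_truth, predicted):
    """Compare recogniser output against ground-truth labels, time slice by
    time slice, and report overall accuracy plus per-activity precision/recall."""
    assert len(ground_truth) == len(predicted)
    correct = sum(g == p for g, p in zip(ground_truth, predicted))
    accuracy = correct / len(ground_truth)
    metrics = {}
    for activity in set(ground_truth):
        tp = sum(g == p == activity for g, p in zip(ground_truth, predicted))
        fp = sum(p == activity and g != activity for g, p in zip(ground_truth, predicted))
        fn = sum(g == activity and p != activity for g, p in zip(ground_truth, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[activity] = (precision, recall)
    return accuracy, metrics

# Illustrative labels only: one label per fixed-length time slice.
truth = ["MakeCoffee", "MakeCoffee", "BrushTeeth", "Idle"]
pred  = ["MakeCoffee", "BrushTeeth", "BrushTeeth", "Idle"]
accuracy, per_activity = evaluate_labels(truth, pred)
print(f"accuracy = {accuracy:.2f}")
for activity, (precision, recall) in per_activity.items():
    print(f"{activity}: precision = {precision:.2f}, recall = {recall:.2f}")
```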
Each of the enumerated steps may vary depending on the activity recognition approach and the available resources. The described methodology, which we refer to from now on as the standard methodology, is the reference for any group working on human activity recognition. The main advantages of the standard methodology are related to realism, both of the collected data and of the behaviour of the monitored people. If an activity modelling or recognition approach is validated by the standard methodology, it can be claimed that it should show similar performance in real-world scenarios.

Nevertheless, there are some problems that make it very difficult to implement the standard methodology. For instance, (i) it is not always possible to own an environment and install sensors and processing systems, due to economic reasons; (ii) running experiments with human beings implies ethical and legal issues that can slow down the research process; and (iii) dataset labelling systems are not perfect, since most of them rely on users' memory or discipline to annotate every activity carried out.

This paper presents a novel evaluation methodology to overcome the enumerated problems. The methodology has been named hybrid because it combines real users' inputs with simulation tools. The key idea is to circulate surveys among target users with the objective of capturing how they perform certain activities of daily living. Using the information collected by the surveys, individual scripts are prepared, which are then processed by a synthetic dataset generator tool to simulate an arbitrary number of days and generate perfectly labelled datasets of activities. To get as close as possible to real-world settings, the synthetic dataset generator uses probabilistic sensor noise models and probabilistic time lapses. To enhance the usability of the tool for activity recognition researchers, a detailed methodology has been elaborated and an intuitive script to model activities and behaviours is provided.
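The generator's internals are presented later in the paper; as a rough illustration of what probabilistic time lapses and a sensor noise model can look like, the following sketch draws activity start times and inter-action gaps from Gaussian distributions, and occasionally drops or injects sensor activations. All function names and parameter values are illustrative assumptions, not the tool's actual interface:

```python
import random

def simulate_activity(actions, start_mean_s, start_std_s,
                      gap_mean_s=5.0, gap_std_s=2.0,
                      p_missed=0.05, p_spurious=0.02):
    """Generate (timestamp, sensor) events for one execution of an activity.

    Start time and gaps between consecutive actions are Gaussian
    (probabilistic time lapses); each sensor activation may be missed,
    and spurious activations may be injected (sensor noise model)."""
    t = max(0.0, random.gauss(start_mean_s, start_std_s))
    events = []
    for sensor in actions:
        t += max(0.5, random.gauss(gap_mean_s, gap_std_s))
        if random.random() >= p_missed:       # the sensor may fail to fire
            events.append((round(t, 1), sensor))
        if random.random() < p_spurious:      # an unrelated sensor fires
            events.append((round(t, 1), "noise_sensor"))
    return events

# Illustrative script entry: MakeCoffee around 7:10 AM (25,800 s after midnight).
make_coffee = ["kitchen_door", "coffee_jar", "mug", "kettle"]
for ts, sensor in simulate_activity(make_coffee, 25800, 300):
    print(f"{ts:>8}  {sensor}  label=MakeCoffee")
```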
The paper is structured as follows: Section 2 reviews the related work. Section 3 describes the proposed methodology in detail. Section 4 outlines the survey designed to capture how different users perform activities of daily living, while Section 5 presents the synthetic dataset generator tool developed to implement the hybrid methodology. Section 7 discusses the advantages and disadvantages of the proposed methodology. Finally, Section 8 presents the conclusions and provides some insights for future work.

2. Related Work

Evaluation methodologies for activity recognition systems are usually explained in research papers whose objective is to present contributions related to activity recognition rather than to justify or validate the methodologies themselves. Many papers follow the standard methodology introduced in Section 1, such as [21], [17] or [22]. Other authors use public datasets provided by research groups which own pervasive environments and share the collected data, as is the case of [12] and [2]. The major drawback of such an approach is that those datasets cannot be controlled by researchers and may not be appropriate for specific objectives.

A common problem shared by those methodologies refers to dataset labelling methods. Many research papers describe experimental methodologies where participants have to manually annotate the activities they are performing (see [22], [20] and [18]). Wren et al. [24] describe experiments where an expert had to go through raw sensor data to find activities and annotate them. Manual annotation methods are prone to human errors, which result in imperfect ground-truth datasets.

There are some alternative methods to manual annotation. For instance, Kasteren and Noulas [17] present a method based on a Bluetooth headset used to capture the participant's voice: while performing an activity, the participant names the activity itself. A different approach is presented by Huynh et al. [16], who provide three annotation methods: a mobile phone application, typical manual annotation, and another mobile phone application that takes pictures regularly to help researchers manually label the activities.

Even though there might be problems when following the standard evaluation methodology, it is clearly the best methodology to assess the performance of an activity modelling and/or recognition system. However, as Helal et al. [11] state in their paper:

Access to meaningful collections of sensory data is one of the major impediments in human activity recognition research. Researchers often need data to evaluate the viability of their models and algorithms. But useful sensory data from real world deployments of pervasive spaces are very scarce. This is due to the significant cost and elaborate groundwork needed to create actual spaces. Additionally, human subjects are not easy to find and recruit. Even in real deployments, human subjects cannot be used extensively to test all scenarios and verify multitudes of theories. Rather, human subjects are used to validate the most basic aspects of the pervasive space and its applications, leaving many questions unanswered and theories unverified.
The solution proposed by Helal et al. [11] is to develop advanced simulation technologies capable of generating sufficiently realistic synthetic datasets. Indeed, they developed a simulator called Persim, which has been enhanced in its new version, Persim-3D [10]. Persim is an event-driven simulator of human activities in pervasive spaces, capable of capturing elements of space, sensors, behaviours (activities) and their inter-relationships. Persim is becoming a very complete simulation tool for activity recognition in pervasive environments. However, it is still under development, and one of its main limitations is that it does not provide a way to model human behaviour realistically. The authors have already identified this limitation and are currently working on programming-by-demonstration approaches to overcome the problem.

Following those ideas, simulation tools have already been used for activity recognition by other researchers. For example, Okeyo et al. [19] use a synthetic data generator tool to simulate time intervals between sensor activations. Their research is focused on sensor data stream segmentation, so the tool generates varying patterns of sensor activations in order to verify their approach. Liao et al. [18] combine simulation tools and real data for activity recognition. A more elaborate simulator has been developed by Bruneau et al. [3]: DiaSim. The DiaSim simulator executes pervasive computing applications by creating an emulation layer and developing simulation logic using a programming framework. However, it is more focused on simulating situations such as fires, intrusions and so on to identify potential conflicts. In consequence, DiaSim cannot be directly applied to activity recognition.

As can be seen in the literature review, simulation tools can be used for activity recognition, since they provide sufficiently accurate datasets to verify some theories. However, none of the references given above specifies a sound methodology for using simulators to evaluate activity recognition approaches. There is no information about how activities should be defined, how different users can be modelled, or how sensor error models should be built, which are key issues when using a simulator. Therefore, a sound methodology that addresses the usage of simulation tools for activity recognition evaluation is lacking.

This paper proposes such a methodology. Its first phase is devoted to capturing user activity and behaviour using surveys, which are subsequently used in the second phase, where a synthetic data generator is employed. As the proposed methodology combines user surveys to capture behaviour with simulation tools, it is called the hybrid evaluation methodology.

3. The Hybrid Evaluation Methodology

The hybrid evaluation methodology has been specially designed for activity recognition systems which assume the dense sensing paradigm introduced by Chen et al. [4], where an action of a user interacting with an object is detected through the sensor attached to the object. Even though the methodology itself is not limited to specific scenarios, the implementation presented in this paper works for single-user, single-activity scenarios, i.e. only one user is considered, and concurrent or interleaved activities are not taken into account.
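Under the dense sensing paradigm, the raw data is therefore a stream of object-interaction events. As a minimal sketch of such a stream (the dataset format is specified later in the paper; the field and sensor names below are assumptions made for illustration), each event can be represented as a timestamped sensor activation carrying a ground-truth activity label:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SensorEvent:
    """One dense-sensing event: the sensor attached to an object fires
    when the user interacts with that object."""
    timestamp: datetime
    sensor_id: str      # e.g. a contact or pressure sensor identifier
    object_name: str    # the object the sensor is attached to
    activity: str       # ground-truth label (known exactly in a synthetic dataset)

# Illustrative single-user, single-activity stream (no interleaving).
stream = [
    SensorEvent(datetime(2015, 3, 24, 7, 2, 10), "c01", "kitchen_door", "MakeCoffee"),
    SensorEvent(datetime(2015, 3, 24, 7, 2, 41), "c07", "coffee_jar", "MakeCoffee"),
    SensorEvent(datetime(2015, 3, 24, 7, 6, 3), "c12", "toothbrush", "BrushTeeth"),
]
for e in stream:
    print(e.timestamp.isoformat(), e.sensor_id, e.object_name, e.activity)
```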
The methodology has been named hybrid because it combines real users' inputs and simulation tools. The key idea is to circulate surveys among target users with the objective of capturing how they perform certain activities of daily living. Additionally, users are also asked to describe how their days unfold in terms of the defined activities; for example, a user might make a coffee and brush her teeth on weekdays between 7:00 and 7:30 AM. The aim of those surveys is thus to model real human behaviour, covering one of the major weaknesses of simulation-based evaluation methodologies. Using the information collected by the surveys, individual scripts are prepared, which are then processed by a synthetic dataset generator tool to simulate an arbitrary number of days and generate perfectly labelled datasets of activities. To get as close as possible to real-world settings, the synthetic dataset generator uses probabilistic sensor noise models and probabilistic time lapses.

Based on those constraints and ideas, the proposed hybrid evaluation methodology has the following steps (see Figure 1):