We describe a novel method for directing the attention of an automated surveillance system. Our starting premise is that the attention of people in a scene can be used as an indicator of interesting areas and events. To determine people's attention from passive visual observations we have developed a system which automatically locates and tracks pedestrians in surveillance-style video before measuring their head pose as an estimate of their gaze direction. We then demonstrate how the resulting gaze estimations can be used to identify the subject of interest in three different surveillance scenarios. The first step of processing requires the pedestrians in a scene to be tracked, with the purpose of providing stable head images for the following pose estimation step. In contrast to similar systems, we have developed a robust multi-person tracking system that does not rely on background subtraction, making it capable of tracking the heads of multiple pedestrians through complex environments where occlusions are frequent. We track only the heads of pedestrians rather than their entire bodies for two reasons. The first is that security cameras are generally positioned sufficiently high to allow pedestrians' faces to be seen, so their heads are rarely obscured. The second is that the offset between the centre of a pedestrian's body and their head changes as they walk, so tracking the head directly provides more accurately positioned head images. The head tracking algorithm combines absolute location estimates from a head detector with velocity estimates from feature-based tracking to provide stable head images for the subsequent pose estimation step. A head detector was trained using the Histogram of Oriented Gradients (HOG) based method of Dalal and Triggs to provide absolute position estimates. The velocity measurements were made by tracking a number of corner features [1, 3] and learning which were representative of the head velocity using a dynamic Bayesian network. The individual feature velocity estimates were then probabilistically combined to give robust velocity estimates for the head. The two types of measurement were combined using a Kalman filter with the process model, which usually predicts the next state based on physics, replaced with the velocity estimations from feature tracking.
Using a Kalman filter allows the two types of measurement to be combined probabilistically, and additionally the covariance can be used to limit the region in which the detector needs to be applied. The next stage of processing uses the stable head regions provided by the tracking to estimate the direction in which the person is facing. Randomised ferns, a type of randomised tree classifier, were trained using labelled head images and used to estimate the probability that a given head image belonged to each of eight direction classes. The decisions in the ferns were based on two types of comparison, both of which were designed to be robust against contrast and brightness variations. The first decision type was based on the same HOG features that Dalal and Triggs used to train human detectors and the second was based on a comparison of colours sampled at different locations within the head region. The tracking and head pose estimation were combined to make a fully automatic system (figure 1) which could be used to measure the amount of attention received by different areas of a scene. When applied to video sequences, the direction estimates from the randomised ferns were smoothed using a hidden Markov model to enforce temporal constraints. Using a GPU implementation of the HOG head detector, the complete system runs at 15 fps on 640×480 video. For three different video sequences, the locations and gaze directions of the pedestrians were projected onto a 2D ground plane and used to build up an attention map representing the amount of attention received by each square metre of the ground. In the first two experiments, static regions receiving attention were identified by accumulating gaze estimates over a long period of time. The third experiment involved locating a transient subject of attention by combining gaze estimates from multiple people, the results of which are shown in figure 2.
The results demonstrate that the system is capable of both automatically tracking a number of pedestrians in the presence of occlusions and estimating the amount of attention that the pedestrians give to different areas of the scene.
Figure 1: A frame showing the gaze direction estimates and the paths along which pedestrians were tracked.
Figure 2: Sequence showing how the attention map can be used to highlight transient areas of interest. The left column shows video frames with annotated gaze directions, the middle column shows the corresponding attention maps and the third column shows the video frame modulated with the projected attention map.
doi:10.5244/c.23.14 dblp:conf/bmvc/BenfoldR09 fatcat:o5ainfv475b4tns642hrretjam
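Two steps of the abstract above lend themselves to a small illustration: the Kalman-style fusion in which feature-tracking velocity drives the prediction while head detections correct it, and the accumulation of gaze rays into a ground-plane attention map with one-metre cells. The following is only a minimal sketch under stated assumptions, not the authors' implementation. The names `HeadTrack`, `predict`, `update`, `search_radius` and `accumulate_gaze` are hypothetical, the covariance is reduced to a single isotropic variance, each of the eight direction classes is assumed to face a multiple of 45 degrees, and the projection from image coordinates to the ground plane is not shown.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HeadTrack:
    x: float    # head position
    y: float
    var: float  # isotropic position variance (assumed simplification)


def predict(track, velocity, velocity_var, dt=1.0):
    """Prediction step: the measured feature velocity replaces a physics model."""
    vx, vy = velocity
    return HeadTrack(track.x + vx * dt, track.y + vy * dt,
                     track.var + velocity_var * dt * dt)


def update(track, detection, detection_var):
    """Correction step: blend in an absolute position from the head detector."""
    gain = track.var / (track.var + detection_var)
    zx, zy = detection
    return HeadTrack(track.x + gain * (zx - track.x),
                     track.y + gain * (zy - track.y),
                     (1.0 - gain) * track.var)


def search_radius(track, n_sigma=3.0):
    """Radius around the prediction in which the detector would need to run."""
    return n_sigma * math.sqrt(track.var)


def accumulate_gaze(attention, position, direction_class,
                    max_range=10.0, step=0.5):
    """Add one unit of attention to each 1 m cell crossed by a gaze ray."""
    # Assumption: direction class k faces k * 45 degrees on the ground plane.
    angle = direction_class * math.pi / 4.0
    dx, dy = math.cos(angle), math.sin(angle)
    visited = set()
    for i in range(1, int(max_range / step) + 1):
        d = i * step
        cell = (math.floor(position[0] + d * dx),
                math.floor(position[1] + d * dy))
        if cell not in visited:  # count each cell once per observation
            visited.add(cell)
            attention[cell] += 1.0
    return attention
```

In this sketch, calling `predict` once per frame and `update` only when a detection arrives mirrors the idea that the two measurement types are combined probabilistically, and `search_radius` shows how the variance could bound the detector's search region. Passing a `defaultdict(float)` as `attention` and calling `accumulate_gaze` for several people lets cells where gaze rays cross build up higher totals.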
The majority of existing pedestrian trackers concentrate on maintaining the identities of targets; however, systems for remote biometric analysis or activity recognition in surveillance video often require stable bounding-boxes around pedestrians rather than approximate locations. We present a multi-target tracking system that is designed specifically for the provision of stable and accurate head location estimates. By performing data association over a sliding window of frames, we are able to correct many data association errors and fill in gaps where observations are missed. The approach is multi-threaded and combines asynchronous HOG detections with simultaneous KLT tracking and Markov-Chain Monte-Carlo Data Association (MCMCDA) to provide guaranteed real-time tracking in high definition video. Where previous approaches have used ad-hoc models for data association, we use a more principled approach based on MDL which accurately models the affinity between observations. We demonstrate by qualitative and quantitative evaluation that the system is capable of providing precise location estimates for large crowds of pedestrians in real-time. To facilitate future performance comparisons, we will make a new dataset with hand-annotated ground truth head locations publicly available.
doi:10.1109/cvpr.2011.5995667 dblp:conf/cvpr/BenfoldR11 fatcat:2ird3mbrrbhzpnkig5ga7cflma
We present a method to estimate the coarse gaze directions of people from surveillance data. Unlike previous work, we aim to do this without recourse to a large hand-labelled corpus of training data. Instead, we propose a method for learning a classifier without any hand-labelled data, using only the output from an automatic tracking system. A Conditional Random Field is used to model the interactions between the head motion, walking direction, and appearance to recover the gaze directions and simultaneously train randomised decision tree classifiers. Experiments demonstrate performance exceeding that of conventionally trained classifiers on two large surveillance datasets.
doi:10.1109/iccv.2011.6126516 dblp:conf/iccv/BenfoldR11 fatcat:yhbi26kgdngyraclvaplsrnklu
STATIC CAMERA TRACKING AND COARSE GAZE ESTIMATION The static camera tracker uses the approach of Benfold and Reid, who tracked the heads of pedestrians using a combination of sparse optical flow measurements ...
doi:10.1109/icra.2011.5979585 dblp:conf/icra/SommerladeBR11 fatcat:swkjasmbqbbbfjkxwmdbehwnye
Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real-time, high-level information from an observed scene, and generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, will serve to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.
doi:10.1016/j.cviu.2011.09.011 fatcat:yehx5wf555gdzcvjdk3nn5qxtm
We describe an architecture for a multi-camera, multi-resolution surveillance system. The aim is to support a set of distributed static and pan-tilt-zoom (PTZ) cameras and visual tracking algorithms, together with a central supervisor unit. Each camera (and possibly pan-tilt device) has a dedicated process and processor. Asynchronous interprocess communications and archiving of data are achieved in a simple and effective way via a central repository, implemented using an SQL database. Visual tracking data from static views are stored dynamically into tables in the database via client calls to the SQL server. A supervisor process running on the SQL server determines if active zoom cameras should be dispatched to observe a particular target, and this message is effected via writing demands into another database table. We show results from a real implementation of the system comprising one static camera overviewing the environment under consideration and a PTZ camera operating under closed-loop velocity control, which uses a fast and robust level-set-based region tracker. Experiments demonstrate the effectiveness of our approach and its feasibility for multi-camera systems for intelligent surveillance.
doi:10.1109/icdsc.2009.5289413 dblp:conf/icdsc/BellottoSBBRRTGS09 fatcat:zvhpco554fd2tc3ld43c3myghu
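The table-based messaging described in this abstract can be illustrated with a small example. The sketch below is an assumption-laden stand-in rather than the system's actual schema: it uses Python's built-in `sqlite3` with an in-memory database in place of an SQL server, the table names `observations` and `demands` and all function names are hypothetical, and the supervisor policy (follow the most recently observed target) is a placeholder for whatever decision logic the real supervisor applies.

```python
import sqlite3


def create_channels(conn):
    """Create one table per communication channel (hypothetical schema)."""
    conn.execute("CREATE TABLE observations "
                 "(target_id INTEGER, frame INTEGER, x REAL, y REAL)")
    conn.execute("CREATE TABLE demands "
                 "(camera_id INTEGER, target_id INTEGER, x REAL, y REAL)")


def publish_observation(conn, target_id, frame, x, y):
    """Static-camera tracker: store a tracking result in the database."""
    conn.execute("INSERT INTO observations VALUES (?, ?, ?, ?)",
                 (target_id, frame, x, y))
    conn.commit()


def supervise(conn, camera_id):
    """Supervisor: pick a target and write a demand for a PTZ camera."""
    # Placeholder policy: choose the most recently observed target.
    row = conn.execute(
        "SELECT target_id, x, y FROM observations "
        "ORDER BY frame DESC, target_id ASC LIMIT 1").fetchone()
    if row is None:
        return None
    conn.execute("INSERT INTO demands VALUES (?, ?, ?, ?)", (camera_id, *row))
    conn.commit()
    return row


def read_demand(conn, camera_id):
    """PTZ camera process: fetch the latest demand addressed to this camera."""
    return conn.execute(
        "SELECT target_id, x, y FROM demands WHERE camera_id = ? "
        "ORDER BY rowid DESC LIMIT 1", (camera_id,)).fetchone()
```

Because every process only reads and writes tables, the tracker, supervisor and camera processes in such a design never call each other directly, and the stored rows double as an archive of what was observed and requested.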
Benfold concludes with the present week, but fresh attractions will compensate the visitors. In addition to Mr. Benfold, Signor Correlli and his infant sons have been performing every evening. ... Egerton Wilks's drama of “Ben the Boatswain,” has drawn very good houses. Lerps.—Princess's.—“Virginius,” the “Mysteries of? ...
Benfold-Tracker. ... (PirsiavashTracker), Yang and Nevatia (YangTracker), Benfold and Reid (Benfold-Tracker), and Poiesi et al. (PoiesiTracker). ...
doi:10.1007/s11760-017-1086-7 fatcat:y2x7powt6ngxjj223d4mgpbd2u
Research Paper         Precision   Accuracy
Ben Benfold et al.     73.6%       59.9%
Breitenstein et al.    67.0%       78.1%
Anton Milan et al.     87.2%       66.4%
Yi Yang et al.         -           78.5%
Nayyab Naseem          72.5%       78%
...
arXiv:1506.06659v1 fatcat:cdrpzcoxxbfwrjxtije4jc3pou
This method is extended with an appearance model by Ben Shitrit et al. ... Benfold and Reid use a HOG based head detector to detect heads from a bird's-eye-view camera perspective and extrapolate full body detections using a fixed ground plane. ...
doi:10.1016/j.cviu.2014.06.003 fatcat:vk4ceu54urec7bcrqjpercd6qe
Optics and Photonics for Counterterrorism, Crime Fighting and Defence IX; and Optical Materials and Biomaterials in Security and Defence Systems Technology X
Inspired by the work of Benfold and Reid, our system tracks human attention by making a rough estimate of the gaze direction for each person in the scene. ...
doi:10.1117/12.2031639 fatcat:fch35biedbab5fcsskvx4lafia
International Journal of Business Management & Research (IJBMR)
http://atimes.com/2016/07/why-vietnam-might-need-to-embrace-shamefare-in-the-south-china-sea/ 36 Shen, Lu, and Ben Westcott. 2016. ... countries may be bluffing, at least in part, on the South China Sea is evidenced by the resumption of joint naval exercises with China's Northern Fleet at Qingdao by the guided missile destroyer USS Benfold ...
fatcat:skxaaxshmfh3pngwlhpnku2gdy
The 2008 system of Benfold and Reid [BR08], by contrast, builds on [RR06]. ... A method that infers attention targets directly from head rotation is described, for example, by Murphy-Chutorian and Trivedi in [MCT08a]. ...
doi:10.5445/ir/1000024787 fatcat:65bzdd5hurhx5ldv7vhvsh7oce
In this context, Benfold and Reid built upon evidence from the estimated head poses of large crowds to guide a visual surveillance system towards interesting points. C. ... Method Proposed by Olfa Ben Ahmed: She used the Region of Interest (ROI) to extract the hippocampus and cingulate cortex. For the classification step, she uses the Bag of Visual Words (BOVW) method. ...
fatcat:mwmaociquzadheb7zpticulcla