Analytically Guided Reinforcement Learning for Green IT and Fluent Traffic

Marcin Korecki, Dirk Helbing
This study investigates various methods for autonomous traffic signal control. We examine several types of control methods, including fixed time, adaptive, analytic, and reinforcement learning approaches. Machine learning approaches are compared with the "analytic" approach, which serves as the "gold standard" for performance assessment. We find that conventional machine learning approaches outperform the analytic approach but require considerably more computing power. We therefore introduce a novel hybrid method called "analytically guided reinforcement learning", or "α-RL" for short. This approach is implemented in our "GuidedLight agent" and tends to outperform both classical machine learning and the analytic approach, while substantially improving convergence. The method is therefore suited as a "green IT" solution that reduces environmental impact in two ways: by reducing (i) traffic congestion and (ii) the processing power needed for the learning and operation of the traffic light control algorithm.

• highlight the performance and limitations of machine learning approaches considering ecological issues,
• propose an improved, hybrid machine learning approach called "analytically guided reinforcement learning" or "α-RL", which converges much more quickly than conventional machine learning methods.

III. RELATED WORK
This section discusses relevant related work.

A. FIXED TIME CONTROL
A classical method of traffic control is to generate centralized schedules, which are imposed on all intersections in the city [2]. In its simplest form, each intersection cycles through all its phases with no offsets. At any given time, all intersections are in the same phase, and each phase is given the same amount of time. We refer to this simplistic method as Fixed Time Control. More advanced versions of this method assign different green time periods to each phase and use suitably calibrated offsets [2].

B. ADAPTIVE METHODS
A typical adaptive method selects the next phase based on the current state of the controlled intersection. One of the simplest adaptive methods is "demand-based" control. This approach adapts its actions based on the "demand of a phase", which is defined as the sum of the demands of all movements belonging to the phase.
The "demand of a movement" corresponds to the number of cars that are present on all incoming lanes belonging to the movement.
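
The two definitions of demand, together with the Fixed Time Control baseline, can be illustrated with a minimal Python sketch. The intersection layout, lane names, and the 30-second phase duration are hypothetical and chosen only for illustration. The rule of selecting the phase with the highest demand is likewise an assumption: the text states only that the controller adapts its actions based on phase demand.

```python
from typing import Dict, List

# Hypothetical layout (not taken from the paper): each movement is mapped to
# its incoming lanes, and each phase to the movements it serves.
MOVEMENT_LANES: Dict[str, List[str]] = {
    "ns_straight": ["n_in_0", "s_in_0"],
    "ns_left": ["n_in_1", "s_in_1"],
    "ew_straight": ["e_in_0", "w_in_0"],
    "ew_left": ["e_in_1", "w_in_1"],
}
PHASE_MOVEMENTS: Dict[str, List[str]] = {
    "NS": ["ns_straight", "ns_left"],
    "EW": ["ew_straight", "ew_left"],
}


def movement_demand(movement: str, cars_per_lane: Dict[str, int]) -> int:
    """Number of cars on all incoming lanes belonging to the movement."""
    return sum(cars_per_lane.get(lane, 0) for lane in MOVEMENT_LANES[movement])


def phase_demand(phase: str, cars_per_lane: Dict[str, int]) -> int:
    """Sum of the demands of all movements belonging to the phase."""
    return sum(movement_demand(m, cars_per_lane) for m in PHASE_MOVEMENTS[phase])


def demand_based_phase(cars_per_lane: Dict[str, int]) -> str:
    """Assumed selection rule: pick the phase with the highest demand.

    Ties go to the phase listed first in PHASE_MOVEMENTS.
    """
    return max(PHASE_MOVEMENTS, key=lambda p: phase_demand(p, cars_per_lane))


def fixed_time_phase(t: float, phase_duration: float = 30.0) -> str:
    """Fixed Time Control: cycle through all phases, each with equal duration.

    The 30 s default is an illustrative value, not one from the paper.
    """
    phases = list(PHASE_MOVEMENTS)
    return phases[int(t // phase_duration) % len(phases)]
```

For example, with `cars_per_lane = {"n_in_0": 3, "s_in_0": 2, "n_in_1": 1, "e_in_0": 4, "w_in_1": 1}`, the demand of phase "NS" is 6 and that of "EW" is 5, so `demand_based_phase` returns "NS", whereas `fixed_time_phase` depends only on the elapsed time and ignores the queues.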
doi:10.3929/ethz-b-000570566 fatcat:2n7mpmip3ffqzocfmoqsrf64la