Temporal difference learning

Andrew Barto
2007 Scholarpedia  
doi:10.4249/scholarpedia.1604 fatcat:7yhrvmeoffd4zmvmxfoidocw4y