Watch a fixed timer create gridlock, then design the AI's observation space and train it to dramatically cut average waiting times, just like smart city traffic systems!
The AI reads queue lengths, waiting times, and traffic flow rates at each approach to the intersection.
Every few seconds: which direction gets the green light? The AI picks the action that minimises total waiting time.
Reward = negative total waiting time. Lower wait = higher reward. The AI learns to keep queues short.
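The observe, act, reward loop above can be sketched as one step of tabular Q-learning. This is a minimal illustration, not the lesson's actual implementation: the action names, the learning rate `alpha`, the discount `gamma`, and the placeholder states are all assumptions.

```python
import random
from collections import defaultdict

# Hypothetical action names: which direction gets the green light.
ACTIONS = ("NS_GREEN", "EW_GREEN")

def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', a')."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])

# Unseen (state, action) pairs start at 0.0.
q = defaultdict(float)
state, next_state = ("s0",), ("s1",)  # placeholder states for illustration
total_wait = 12.0                     # seconds of waiting observed after acting

# Reward is the negative total waiting time.
q_update(q, state, "NS_GREEN", reward=-total_wait, next_state=next_state)
```

After this single update the value of choosing `NS_GREEN` in `state` drops below zero, so a purely greedy agent (`epsilon=0.0`) would try `EW_GREEN` the next time it sees the same state.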
Google's DeepMind worked with Transport for London to use RL for traffic signals, cutting waiting times by 10–20%.
How many cars are waiting on the North-South road (0–10+)
How many cars are waiting on the East-West road (0–10+)
How long the oldest car on N-S has been waiting
How long the oldest car on E-W has been waiting
Which direction currently has green (N-S or E-W)
How long the current phase has been running (0–60 s)
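One way to turn the six observations above into a state the agent can look up is to bucket each one and pack the selected features into a tuple. This is a rough sketch: the `Intersection` class, the field names, and the 10-second bucket width are assumptions for illustration, while the caps (10+ cars, 60 s phase) follow the ranges listed above.

```python
from dataclasses import dataclass

@dataclass
class Intersection:
    """Hypothetical raw readings from the simulated intersection."""
    ns_queue: int          # cars waiting on North-South
    ew_queue: int          # cars waiting on East-West
    ns_oldest_wait: float  # seconds the oldest N-S car has waited
    ew_oldest_wait: float  # seconds the oldest E-W car has waited
    green: str             # "NS" or "EW"
    phase_elapsed: float   # seconds the current phase has been running

def bucket(value, width, max_bucket):
    """Map a continuous value to a small integer bin, capped at max_bucket."""
    return min(int(value // width), max_bucket)

ALL_FEATURES = ("ns_queue", "ew_queue", "ns_oldest_wait",
                "ew_oldest_wait", "green", "phase_elapsed")

def encode(obs, selected=ALL_FEATURES):
    """Build a hashable state from only the features the designer selected."""
    features = {
        "ns_queue": min(obs.ns_queue, 10),               # 0-10+, capped at 10
        "ew_queue": min(obs.ew_queue, 10),
        "ns_oldest_wait": bucket(obs.ns_oldest_wait, 10, 6),  # assumed 10 s bins
        "ew_oldest_wait": bucket(obs.ew_oldest_wait, 10, 6),
        "green": 0 if obs.green == "NS" else 1,
        "phase_elapsed": bucket(obs.phase_elapsed, 10, 6),    # 0-60 s in 10 s bins
    }
    return tuple(features[name] for name in selected)

obs = Intersection(ns_queue=14, ew_queue=3, ns_oldest_wait=42.0,
                   ew_oldest_wait=5.0, green="NS", phase_elapsed=25.0)
full_state = encode(obs)                              # (10, 3, 4, 0, 0, 2)
queues_only = encode(obs, ("ns_queue", "ew_queue"))   # (10, 3)
```

Passing a shorter `selected` tuple mimics switching observations off: the agent then cannot tell apart situations that differ only in the hidden features.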
You built and trained an adaptive traffic light AI!
Choosing what the agent observes is as important as the learning algorithm. Too little = can't make good decisions. Too much = Q-table explodes.
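To see how quickly the Q-table grows, it helps to count its entries: multiply the number of possible values of each observed feature, then multiply by the number of actions. The bin counts below are assumptions (queues capped at 10+, times split into seven 10-second bins), so treat the totals as illustrative.

```python
from math import prod

# Assumed number of distinct values per feature after discretisation.
FEATURE_SIZES = {
    "ns_queue": 11,        # 0..10+
    "ew_queue": 11,
    "ns_oldest_wait": 7,   # seven 10 s bins (assumption)
    "ew_oldest_wait": 7,
    "green": 2,            # N-S or E-W
    "phase_elapsed": 7,    # 0-60 s in 10 s bins (assumption)
}
N_ACTIONS = 2  # give green to N-S or to E-W

def q_table_entries(selected):
    """Number of (state, action) pairs the agent may need to learn."""
    return prod(FEATURE_SIZES[name] for name in selected) * N_ACTIONS

small = q_table_entries(["ns_queue", "ew_queue"])  # 11 * 11 * 2 = 242
full = q_table_entries(FEATURE_SIZES)              # 166012
```

Under these assumptions, going from two features to all six grows the table from 242 entries to 166,012, and each entry needs enough visits during training to get a useful estimate.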
Real cities have thousands of traffic lights that affect each other. Multi-agent RL coordinates them simultaneously.
We reward the AI with the negative of total waiting time. Minimising waiting = maximising the reward.
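As a small sketch of that reward, assuming the simulator can report how long each queued car has been waiting (the `waiting_times` list is a hypothetical input):

```python
def reward(waiting_times):
    """Negative of the total waiting time (in seconds) across all queued cars."""
    return -sum(waiting_times)

short_queues = [2.0, 5.0]          # two cars, brief waits
long_queues = [30.0, 45.0, 12.0]   # three cars, long waits

r_short = reward(short_queues)     # -7.0
r_long = reward(long_queues)       # -87.0
```

The reward is never positive; the best the agent can do is push it toward zero, which happens only when few cars are waiting and none has waited long.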
Maximum throughput might starve one direction. Fairness constraints ensure no car waits more than a maximum time.
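One simple way to add such a constraint is a safety override that sits between the agent and the signal: if the oldest car on the red approach has waited too long, the green is forced to that direction regardless of what the agent proposed. This is a sketch under assumptions; the function name and the 90-second limit are illustrative, and a real controller would also have to account for yellow and clearance time.

```python
def apply_fairness(proposed, green, ns_oldest_wait, ew_oldest_wait, max_wait=90.0):
    """Return the direction that should get green, overriding the agent if needed.

    proposed and green are "NS" or "EW"; waits are in seconds.
    max_wait is an assumed limit, not a value from the lesson.
    """
    red = "EW" if green == "NS" else "NS"
    red_wait = ew_oldest_wait if red == "EW" else ns_oldest_wait
    if red_wait >= max_wait:
        return red       # starved direction gets the green
    return proposed      # otherwise trust the agent's choice

forced = apply_fairness("NS", green="NS", ns_oldest_wait=0.0, ew_oldest_wait=95.0)
kept = apply_fairness("NS", green="NS", ns_oldest_wait=0.0, ew_oldest_wait=20.0)
```

Here `forced` is `"EW"` because an East-West car has waited 95 s, while `kept` stays `"NS"`. An alternative design is to leave the agent in control and add a penalty to the reward when the limit is exceeded, which shapes behaviour but does not guarantee the limit.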
Google DeepMind applied RL to 70 London intersections, reducing stops by 10–20% and cutting emissions.
Training in simulation doesn't always transfer to real intersections. Real traffic has pedestrians, emergencies, and unpredictable drivers.