Deep RL for Intelligent Traffic Light Control
The Environment
We use the SUMO Traffic Simulator to model our intersection. Each simulation episode runs for 5400 steps, and each step corresponds to 1 second of simulated time. Vehicle arrival times are generated following a Weibull distribution, and for each generated vehicle the source and destination arms are chosen randomly. A sketch of this generation scheme is shown after the scenario list below.
For the high-traffic scenario, we have:
- 1000 cars approaching the intersection from each arm (evenly distributed).
- 75% of these vehicles go straight.
- The remaining 25% turn either left or right.
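
To make the generation scheme concrete, here is a minimal sketch of how the 1000 departures per episode could be drawn from a Weibull distribution and assigned random source and destination arms with the 75/25 straight/turn split. The arm labels, route naming, and the Weibull shape parameter are illustrative assumptions, not values taken from the project's actual configuration.

```python
import numpy as np

# Illustrative constants matching the high-traffic scenario above; the arm
# labels and route IDs are assumptions, not the project's actual identifiers.
N_CARS = 1000
EPISODE_STEPS = 5400                                   # 1 step = 1 simulated second
ARMS = ["N", "S", "E", "W"]
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

rng = np.random.default_rng(seed=42)

# Draw departure times from a Weibull distribution, then rescale them so that
# every vehicle departs within the episode.
raw = np.sort(rng.weibull(a=2.0, size=N_CARS))         # shape a=2.0 is an assumption
depart_times = (raw - raw.min()) / (raw.max() - raw.min()) * EPISODE_STEPS

vehicles = []
for i, depart in enumerate(depart_times):
    source = rng.choice(ARMS)
    if rng.random() < 0.75:
        destination = OPPOSITE[source]                  # 75% of vehicles go straight
    else:
        turns = [a for a in ARMS if a not in (source, OPPOSITE[source])]
        destination = rng.choice(turns)                 # remaining 25% turn left or right
    vehicles.append(
        {"id": f"veh_{i}", "depart": round(float(depart), 2),
         "route": f"{source}_to_{destination}"}         # hypothetical route IDs
    )
```

Writing these entries into a SUMO .rou.xml file (together with the corresponding route definitions) is then enough to reproduce one episode's traffic demand.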
Below is an image illustrating the intersection we are modeling:

Intersection Layout
Comparing DQN and PPO
We compare two Deep Reinforcement Learning algorithms: DQN (Deep Q-Network) and PPO (Proximal Policy Optimization).


DQN (left) vs. PPO (right) traffic light control simulations.
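
For reference, a minimal sketch of how the two agents could be trained under identical conditions, assuming a Gym-compatible wrapper around the SUMO/TraCI simulation (SumoIntersectionEnv below is a hypothetical name, as are its constructor arguments) and the stable-baselines3 implementations of DQN and PPO:

```python
from stable_baselines3 import DQN, PPO

from sumo_env import SumoIntersectionEnv  # hypothetical Gym wrapper around SUMO/TraCI

def train(algo_cls, total_timesteps=200_000):
    # Both agents use the same environment settings so the comparison is fair;
    # the training budget here is illustrative.
    env = SumoIntersectionEnv(episode_steps=5400)       # assumed constructor signature
    model = algo_cls("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    return model

dqn_model = train(DQN)
ppo_model = train(PPO)
```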
Observations:
PPO drives the average delay down much faster than DQN. By the end of training, however, both algorithms converge to similar average delay values.

Average delay evolution during training.
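
The average delay reported above can be tracked per episode. One possible measurement, assuming the simulation is driven through TraCI, is the mean accumulated waiting time per vehicle, a common proxy for delay though not the only possible definition:

```python
import traci

def average_delay(sumo_cmd, episode_steps=5400):
    """Run one episode and return the mean accumulated waiting time per vehicle,
    used here as a proxy for average delay."""
    traci.start(sumo_cmd)                    # e.g. ["sumo", "-c", "intersection.sumocfg"]
    waiting = {}
    for _ in range(episode_steps):
        traci.simulationStep()
        for veh_id in traci.vehicle.getIDList():
            # Keep the latest accumulated waiting time observed for each vehicle.
            waiting[veh_id] = traci.vehicle.getAccumulatedWaitingTime(veh_id)
    traci.close()
    return sum(waiting.values()) / max(len(waiting), 1)
```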