Deep RL for Intelligent Traffic Light Control

The Environment

We use the SUMO traffic simulator to model our intersection. Each simulation episode runs for 5400 steps, and each step corresponds to 1 second of simulated time. Vehicle arrival times are drawn from a Weibull distribution, and for each generated vehicle the source and destination arms are chosen randomly.
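The generation code is not included in this post, but a minimal sketch of what Weibull-timed departures could look like is shown below. The vehicle count, random seed, and shape parameter are assumptions for illustration, not values from the original setup.

```python
import numpy as np

# Sketch only: arrival times drawn from a Weibull distribution and rescaled
# onto the 5400-step (= 5400 s) episode. Vehicle count, seed, and the shape
# parameter a=2 are assumptions, not the original configuration.
N_VEHICLES = 4000
EPISODE_STEPS = 5400

rng = np.random.default_rng(seed=42)
raw = np.sort(rng.weibull(a=2, size=N_VEHICLES))

# Rescale the raw samples to [0, EPISODE_STEPS) and round to whole 1-second steps.
arrival_steps = np.rint(
    (raw - raw.min()) / (raw.max() - raw.min()) * (EPISODE_STEPS - 1)
).astype(int)

print(arrival_steps[:10])  # first few departure times, in seconds
```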

For the high-traffic scenario, we have:

  • 1000 cars approaching the intersection from each arm (evenly distributed).
  • 75% of these vehicles go straight.
  • The remaining 25% turn either left or right (see the route-sampling sketch after this list).
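As a rough illustration of this route assignment, here is one way the source arm and the straight/turn split could be sampled. The arm names and route IDs (e.g. "W_E" for west to east) are hypothetical placeholders for whatever the actual SUMO route file defines.

```python
import random

ARMS = ["N", "S", "E", "W"]
STRAIGHT = {"N": "S", "S": "N", "E": "W", "W": "E"}      # opposite arm
TURNS = {"N": ["E", "W"], "S": ["E", "W"], "E": ["N", "S"], "W": ["N", "S"]}

def sample_route(rng: random.Random) -> str:
    """Pick a source arm uniformly, go straight with probability 0.75, else turn."""
    src = rng.choice(ARMS)
    if rng.random() < 0.75:
        dst = STRAIGHT[src]            # 75% of vehicles cross the intersection
    else:
        dst = rng.choice(TURNS[src])   # remaining 25% turn left or right
    return f"{src}_{dst}"

rng = random.Random(7)
routes = [sample_route(rng) for _ in range(4000)]  # roughly 1000 departures per arm
```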

Below is an image illustrating the intersection we are modeling:

Intersection Layout

Comparing DQN and PPO

We compare two Deep Reinforcement Learning algorithms: DQN and PPO.
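Neither training loop is reproduced in this post, but a minimal sketch of how both agents could be trained on the same environment for a fair comparison looks roughly like this, assuming the stable-baselines3 implementations of DQN and PPO and a hypothetical Gymnasium wrapper around the SUMO intersection. The policy type, timestep budget, and the SumoIntersectionEnv module are assumptions.

```python
from stable_baselines3 import DQN, PPO

def make_env():
    # Hypothetical Gymnasium-compatible wrapper around the SUMO intersection:
    # observations encode lane queues / waiting times, actions select the
    # next green phase. The module and class names are placeholders.
    from my_sumo_env import SumoIntersectionEnv
    return SumoIntersectionEnv(episode_steps=5400)

# Train both agents with the same budget so the delay curves are comparable.
for algo in (DQN, PPO):
    model = algo("MlpPolicy", make_env(), verbose=1)
    model.learn(total_timesteps=200_000)  # training budget is an assumption
    model.save(f"{algo.__name__.lower()}_traffic_light")
```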

DQN (left) vs. PPO (right) traffic light control simulations.

Observations:
PPO reaches a low average delay much faster than DQN; by the end of training, however, both algorithms converge to similar average delay values.
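For reference, one way the average delay could be measured with SUMO's TraCI API is sketched below: summing the number of halted vehicles on the incoming edges once per 1-second step gives total vehicle-seconds of waiting, which is then divided by the number of vehicles. The edge IDs are placeholders, and an already-open TraCI connection (via traci.start(...)) is assumed.

```python
import traci

INCOMING_EDGES = ["N_in", "S_in", "E_in", "W_in"]  # placeholder edge IDs

def measure_average_delay(n_vehicles: int, max_steps: int = 5400) -> float:
    """Average delay per vehicle (seconds) over one episode; assumes an open TraCI connection."""
    waiting_vehicle_seconds = 0.0
    for _ in range(max_steps):
        traci.simulationStep()  # advance the simulation by one 1-second step
        # Vehicles halted on the incoming edges this step; summed over steps
        # this gives vehicle-seconds spent waiting at the intersection.
        waiting_vehicle_seconds += sum(
            traci.edge.getLastStepHaltingNumber(edge) for edge in INCOMING_EDGES
        )
    return waiting_vehicle_seconds / n_vehicles
```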

Average delay evolution during training.