The Power of Deep Reinforcement Learning

Maximising the potential of next-generation technologies with a better-than-human AI approach

When IBM’s “Deep Blue” beat Garry Kasparov in 1997, it was the first time in history that a computer had defeated a reigning world chess champion in a match. Then in 2016, Google DeepMind’s AlphaGo defeated Lee Sedol, one of the best players of the strategy board game Go, and it has since beaten other top Go players. Since then, deep reinforcement learning (DRL), the AI approach that enabled AlphaGo, has brought renewed enthusiasm to the world of Artificial Intelligence. This enthusiasm has spurred much research into creating smarter environments to build sustainable cities and improve society’s well-being.

Researchers in both academia and industry have been racing to apply DRL in various ways, including self-driving cars and industry automation. They aim to make traditional systems smart and the smart ones even smarter, perhaps achieving a better-than-human intelligence as demonstrated by AlphaGo.

DRL integrates deep learning, which uses layers of artificial neurons loosely modelled on the structure of the brain, with reinforcement learning. The latter enables a learning agent to autonomously explore possible actions and exploit the best ones in a dynamic operating environment, whose current condition is called the state. Over time, the agent learns, on the fly and without supervision, which actions are appropriate under different states, so that it earns the highest possible rewards (i.e., makes the best decisions) and enhances system performance. Compared to traditional reinforcement learning, DRL uses a deep neural network to represent complex sets of states and has been shown to achieve breakthrough performance with lower computational cost, reduced learning time, and more efficient knowledge storage.
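The explore-and-exploit loop described above can be illustrated with a minimal sketch of tabular Q-learning, the classic reinforcement learning method that DRL extends by replacing the table with a deep neural network. Everything in it is an assumption made for illustration: the five-cell corridor environment, the reward of 1 at the goal, and the parameter values are hypothetical and are not taken from the research described in this article.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # move one cell left or right

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                 # explore: random action
            else:
                best = max(q[state])                 # exploit: best known action
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate towards reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

def greedy_policy(q):
    """Best learned action for every non-goal state."""
    return [ACTIONS[max(range(2), key=lambda i: row[i])] for row in q[:-1]]

q_table = train()
policy = greedy_policy(q_table)
```

After training, `policy` holds the action the agent prefers in each cell. A table like `q_table` becomes impractical when the number of states is very large, which is the situation in which DRL substitutes a neural network for it.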

Capitalising on the advantages of DRL, Sunway University’s Professor Yau Kok Lim and his team of researchers from the Department of Computing and Information Systems are working towards using reinforcement learning and DRL to enhance smart transportation and communication systems, which are fast-paced, dynamic, heterogeneous, complex, and data-intensive in nature.

Traffic congestion, for example, is inevitable in most urban areas. In Malaysia, unpredictable weather compounds the issue, as heavy rain and wet roads slow traffic, especially during rush hour or at night. Congestion at a single intersection has domino and single-point-of-failure effects that can disrupt traffic on neighbouring roads.

The research looks closely at intersections where traffic bottlenecks are known to occur despite being controlled by traffic lights. Using DRL, traffic light controllers at different intersections were enabled to collaborate and exchange knowledge when selecting their traffic phases and split phasing. This coordination would allow a green wave, in which vehicles pass through a series of green lights, and mitigate cross blocking and vehicle idling.
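One way to picture a traffic light controller as a learning agent is sketched below: the state is the queue length on each approach, the action is the approach that receives green, and the reward penalises queued vehicles. This is a simplified single-intersection illustration under stated assumptions, not the team's DRL model. The names `MAX_QUEUE`, `DISCHARGE`, `step`, and `q_update`, as well as the numbers used, are hypothetical, and the sketch uses a lookup table where DRL would use a neural network.

```python
from collections import defaultdict

MAX_QUEUE = 10     # hypothetical cap on vehicles counted per approach
DISCHARGE = 2      # hypothetical vehicles released per step of green

def step(queues, phase, arrivals):
    """Advance the intersection by one time step.

    queues   -- tuple of queue lengths, one per approach
    phase    -- index of the approach that gets green
    arrivals -- tuple of vehicles arriving on each approach this step
    Returns (next_queues, reward).
    """
    nxt = [min(q + a, MAX_QUEUE) for q, a in zip(queues, arrivals)]
    nxt[phase] = max(nxt[phase] - DISCHARGE, 0)
    reward = -sum(nxt)           # fewer queued vehicles is better
    return tuple(nxt), reward

def q_update(q_table, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place."""
    best_next = max(q_table[(next_state, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error

# One learning step: 3 and 5 vehicles queued, green given to approach 1.
q_table = defaultdict(float)
state = (3, 5)
next_state, reward = step(state, phase=1, arrivals=(1, 0))
q_update(q_table, state, 1, reward, next_state, n_actions=2)
```

In a collaborative setting such as the one described above, the state could plausibly be extended with information shared by neighbouring controllers, for instance their queue lengths. The sketch does not attempt this.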

This novel approach was applied to the traffic lights in Sunway City, taking into account the irregular traffic caused by heavy rainfall. The results showed reduced queue length and waiting time of vehicles, and a smaller number of vehicles crossing an intersection.

In terms of communication systems, the technology is already moving towards 5G wireless mobile networks and cognitive radios. Wireless applications are also growing, particularly multimedia-based ones as well as internet services for mobile gadgets and devices.

There is an increasing need for more wireless bandwidth and for the radio spectrum that provides it. This demand has led to spectrum scarcity, which in Malaysia is further complicated because radio spectrum is shared with neighbouring countries such as Indonesia and Singapore.

Applying reinforcement learning and DRL, the research looks at enabling mobile gadgets and devices to acquire knowledge and adopt the best possible actions for various network operations.

Both approaches provide intelligence and autonomy to support core operations, from accessing underutilised radio spectrum to routing and enhancing security. For example, a wireless host searches for a multi-hop route to its destination node in a dynamic environment in which network conditions, such as licensed and unlicensed network traffic, change over time.
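The route-search example above can be sketched with a simple learning rule in the style of Q-routing, a well-known reinforcement learning approach to packet routing. It is named here as an illustration and is not necessarily the method used in this research. Each node keeps an estimate of the delay to the destination through each neighbour and refines it whenever it forwards a packet. The four-node topology, the link delays, and all function names are hypothetical.

```python
import random

# Hypothetical topology: symmetric link delays between wireless nodes.
LINKS = {
    "A": {"B": 1.0, "C": 1.0, "D": 4.0},
    "B": {"A": 1.0, "D": 1.0},
    "C": {"A": 1.0, "D": 5.0},
    "D": {"A": 4.0, "B": 1.0, "C": 5.0},
}
DEST = "D"

def best_estimate(q, node):
    """Smallest estimated remaining delay from node to DEST."""
    return 0.0 if node == DEST else min(q[node].values())

def train(episodes=2000, alpha=0.5, epsilon=0.2, max_hops=50, seed=0):
    """Learn per-neighbour delay estimates by forwarding simulated packets."""
    rng = random.Random(seed)
    q = {x: {y: 0.0 for y in LINKS[x]} for x in LINKS if x != DEST}
    sources = sorted(q)
    for _ in range(episodes):
        node = rng.choice(sources)
        for _hop in range(max_hops):
            if node == DEST:
                break
            neighbours = list(LINKS[node])
            if rng.random() < epsilon:
                nxt = rng.choice(neighbours)               # explore
            else:
                nxt = min(neighbours, key=q[node].get)     # exploit
            # Link delay plus the neighbour's own best estimate to DEST.
            target = LINKS[node][nxt] + best_estimate(q, nxt)
            q[node][nxt] += alpha * (target - q[node][nxt])
            node = nxt
    return q

def greedy_route(q, source, max_hops=10):
    """Follow the lowest-estimate neighbour from source until DEST."""
    route = [source]
    while route[-1] != DEST and len(route) <= max_hops:
        node = route[-1]
        route.append(min(q[node], key=q[node].get))
    return route

estimates = train()
route = greedy_route(estimates, "A")
```

Because the estimates are updated on every forwarded packet, a node running such a rule can keep adjusting its preferred next hop as link delays change, which is the kind of dynamic environment described above.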

The research findings have been published in journals and conference proceedings, and have resulted in a patent registration. Yau’s research would provide the impetus for improvement in next-generation technologies, leading to smarter and more sustainable development.

Professor Yau Kok Lim
Department of Computing and Information Systems


This article has been adapted from the original feature in Spotlight on Research (Volume 5).