![](https://crypto4nerd.com/wp-content/uploads/2023/05/0lAdp_6pT9w4YDhHl.png)
by Ram Nayak
Have you ever come across the term ingenious gaming? You have probably seen a few videos in which robots clean rooms, pick up objects, and do a whole range of mundane tasks that robots in sci-fi movies handle without question. Yet those robots are often remote controlled, because until recently we could not embed them with the needed intelligence. With reinforcement learning, that is no longer the obstacle.
Reinforcement learning is an intriguing approach for game developers because it allows them to create intelligent and adaptive game agents through trial and error.
In this article, we’ll look at the exciting world of reinforcement learning in gaming and how it’s being used to make smarter, more engaging games.
Reinforcement Learning (RL) is a type of machine learning, in which an agent explores an environment to learn how to perform desired tasks by taking actions with good outcomes and avoiding actions with bad outcomes.
Game agents can be trained to make decisions and take actions in response to the rewards and penalties they receive in the game environment.
RL is based on a mathematical model called a Markov Decision Process (MDP). An MDP consists of a series of time steps, each of which involves the following (a code sketch of this loop follows the list):
· Environment: The outside world with which the agent interacts
· State: Current situation of the agent
· Reward: Numerical feedback signal from the environment
· Action: What the agent does. For example, the robot takes a step forward.
· Episode: An episode is made up of all of the time steps in an MDP, from the initial state to the terminal state.
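To make these pieces concrete, here is a minimal sketch of one episode of the agent–environment loop. It uses the open-source Gymnasium library and its CartPole environment purely as an illustration; any game environment with states, actions, and rewards follows the same pattern.

```python
import gymnasium as gym  # pip install gymnasium

# Environment: the outside world the agent interacts with
env = gym.make("CartPole-v1")

state, info = env.reset()   # the initial state of the episode
total_reward = 0.0
done = False

while not done:  # one episode: initial state -> terminal state
    action = env.action_space.sample()  # a random action (no learning yet)
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # numerical feedback signal
    done = terminated or truncated

print(f"Episode finished with total reward {total_reward}")
```

A learning agent replaces the random `env.action_space.sample()` call with a policy that prefers actions that have earned high rewards in the past.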
Let’s take a simple example, the classic game of Pong:
This setup is strikingly similar to the standard supervised learning framework: we still have an input frame, we run it through a neural network model, and the network produces an output action (up or down). The difference is that we don’t know the target label, i.e. whether going up or down was correct, because we don’t have a labeled data set to train on.
The policy network is the network in RL that transforms input frames into output actions. Policy gradients are a method for training the policy network.
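As an illustration, such a policy network might look like the following PyTorch sketch. The 80×80 preprocessed frame and the layer sizes are assumptions loosely based on common Pong policy-gradient tutorials, not details from a specific implementation.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps a flattened, preprocessed game frame to P(action = up)."""

    def __init__(self, input_size: int = 80 * 80, hidden_size: int = 200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
            nn.Sigmoid(),  # probability of moving the paddle up
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 6400) flattened 80x80 frames
        return self.net(frames)
```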
The approach is to start with a completely random network, feed it a frame from the game engine, and let it produce a random action (up or down), which you send back to the game engine; the loop then continues. Training works as follows (a code sketch of the update follows the list):
– The output of your network is a probability distribution over the actions (up and down).
– During training, we sample from that distribution so that the exact same actions are not always repeated. This lets the agent explore the environment at random in order to discover better rewards.
– If the agent scores, it receives a reward of +1, and policy gradients increase the likelihood of the actions taken in that episode.
– If its opponent scores, the agent receives a reward of -1; the same gradient is multiplied by minus one, and this minus sign ensures that all the actions taken in a bad episode become less likely in the future.
– The agent may lose many games, but it occasionally strikes gold, randomly selecting a series of actions that results in a +1 reward.
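A minimal REINFORCE-style update implementing this idea might look like the sketch below. It is self-contained but hypothetical: the stand-in policy mirrors the PolicyNetwork sketch above, and the tensor shapes are assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

# A tiny stand-in policy: flattened 80x80 frame -> P(up)
policy = nn.Sequential(nn.Linear(80 * 80, 200), nn.ReLU(),
                       nn.Linear(200, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def update_policy(frames: torch.Tensor, actions: torch.Tensor, reward: float):
    """One policy-gradient update for a finished episode.

    frames:  (T, 6400) preprocessed frames seen during the episode
    actions: (T,) sampled actions, 1 = up, 0 = down
    reward:  +1 if we scored, -1 if the opponent scored
    """
    p_up = policy(frames).squeeze(-1)  # P(up) for every frame
    # log-probability of the action that was actually sampled
    log_prob = torch.where(actions == 1, p_up, 1 - p_up).log()
    # +1 reward makes these actions more likely; -1 flips the gradient sign
    loss = -(reward * log_prob).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

During play, each action would be drawn with `torch.bernoulli(p_up)`, which is exactly the random sampling described in the list above.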
Now that you understand what RL is, let’s look at another application in a different environment: an AI that learns to escape.
Meet Albert, an agent in the environment who has a time limit to escape the room. Albert has the ability to move, turn, and jump.
He has a brain: a five-layer neural network. He can only detect what falls within his Raycast vision, such as targets, obstacles, walls, and the pit (the ground).
When more observation is required to clear a level, the number of Raycasts is increased accordingly.
He learns how to reach the green tiles, known as pressure plates. His main goal is to get out of the room.
This game has seven rooms, or levels, and the difficulty increases as the agent progresses through them. The agent is rewarded with points for taking the correct path and penalized for taking a wrong one.
Albert’s actions are analyzed and the weights in his neural network are adjusted using PPO (proximal policy optimization) to prioritize positive outcomes and avoid negative ones. He starts by making random decisions but accidentally hits a pressure plate in the first room and is rewarded.
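The heart of PPO is a clipped objective that limits how far a single update can push the policy away from the one that collected the data. Here is a minimal sketch of that loss; the epsilon value is the common default from the PPO paper, and nothing here comes from the Albert project itself.

```python
import torch

def ppo_clip_loss(new_log_probs: torch.Tensor,
                  old_log_probs: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective from PPO (Schulman et al., 2017)."""
    ratio = (new_log_probs - old_log_probs).exp()  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # take the pessimistic bound, then negate because optimizers minimize
    return -torch.min(unclipped, clipped).mean()
```

Actions with positive advantage (like hitting a pressure plate) are made more likely, while the clipping keeps each update small enough that one lucky or unlucky episode cannot destabilize the policy.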
A simpler version of this example was demonstrated by Microsoft, which trained an agent to navigate a lava maze in Minecraft using Azure Machine Learning.
The agent’s objective is to navigate a maze and reach the blue exit tile by walking on solid tiles. Falling into lava results in starting over. The agent must learn to generalize to deal with random maze maps.
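A task like this can be tackled with even the simplest RL algorithms. Purely as an illustration, here is a toy tabular Q-learning sketch for a made-up 4×4 grid maze with lava and an exit; it is not Microsoft's actual Azure Machine Learning setup, and every value in it is an assumption.

```python
import numpy as np

# Toy 4x4 maze: 0 = solid tile, 1 = lava, 2 = exit (layout is made up)
maze = np.array([[0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [1, 0, 0, 2]])
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
Q = np.zeros((maze.size, len(moves)))       # one row of action-values per tile
alpha, gamma, epsilon = 0.1, 0.99, 0.1      # standard tabular defaults

def step(state, action):
    r, c = divmod(state, 4)
    dr, dc = moves[action]
    r, c = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    nxt = r * 4 + c
    if maze[r, c] == 1:
        return nxt, -1.0, True   # fell into lava: penalty, start over
    if maze[r, c] == 2:
        return nxt, +1.0, True   # reached the exit: reward, episode ends
    return nxt, -0.01, False     # small step cost encourages short paths

for episode in range(5000):
    state, done = 0, False       # always start in the top-left corner
    while not done:
        # epsilon-greedy: mostly exploit what we know, sometimes explore
        if np.random.rand() < epsilon:
            action = np.random.randint(len(moves))
        else:
            action = int(Q[state].argmax())
        nxt, reward, done = step(state, action)
        target = reward + (0.0 if done else gamma * Q[nxt].max())
        Q[state, action] += alpha * (target - Q[state, action])
        state = nxt
```

Generalizing to random maze layouts, as in Microsoft's demo, is what pushes practitioners from a lookup table like `Q` toward neural-network policies.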
A far larger-scale example is OpenAI Five, which used existing reinforcement learning techniques to train an AI system to play the complex multiplayer online battle arena game Dota 2. The system was trained for 10 months on a distributed training system with tools for continuous training, allowing it to learn from batches of approximately 2 million frames every 2 seconds.
Through self-play reinforcement learning, OpenAI Five was able to achieve superhuman performance on a difficult task, eventually defeating the reigning Dota 2 world champions, Team OG, in a best-of-three match. This success demonstrated the potential of reinforcement learning and self-play as tools for training AI systems to perform complex tasks with high levels of skill and adaptability.
Reinforcement learning has already demonstrated great promise in gaming, with the best computer players in a variety of games employing RL, and we can expect even more exciting developments as game developers continue to collaborate with researchers to improve AI in gaming. According to Statista, revenue in the games segment is projected to reach US$396.20bn in 2023.
With such a large and growing audience, the potential for RL in gaming is enormous. The gaming industry is expected to continue its upward trend, making it an exciting area for future research and development.