![](https://crypto4nerd.com/wp-content/uploads/2024/04/0sx35jBJ5PzKmdH2l-1024x576.jpg)
In this notebook, we explore the development of reinforcement learning agents capable of playing several classic OpenAI Gym games using the Stable Baselines3 library. We follow a structured approach, starting with the CartPole game as an introductory example.
Steps:
- Building a model that plays the CartPole game
- Building a model that plays the LunarLander game
- Building a model that plays the CarRacing game
- Conclusion
First of all, we need to install the required libraries:
```
%pip install atari_py
%pip install Box2D
%pip install box2d-py
%pip install "stable-baselines3[extra]>=2.0.0a4"
%pip install gym
%pip install opencv-python
!sudo apt-get install -y swig build-essential python3-dev
%pip install "gymnasium[box2d]"
%pip install box2d-py-manylinux1
%pip install git+https://github.com/pybox2d/pybox2d
```
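Before training anything, it's worth a quick sanity check that the Box2D environments actually import and render. Here is a minimal sketch (assuming the Gymnasium API that Stable Baselines3 2.x expects):

```python
import gymnasium as gym

# Smoke test: create each environment, reset it, and take one random step.
for env_id in ("CartPole-v1", "LunarLander-v2", "CarRacing-v2"):
    env = gym.make(env_id, render_mode="rgb_array")
    observation, info = env.reset(seed=0)
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    frame = env.render()  # rgb_array mode returns an HxWx3 numpy array
    print(f"{env_id}: obs shape {observation.shape}, frame shape {frame.shape}")
    env.close()
```

If all three environments print their shapes without errors, the Box2D and SWIG setup is working.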
To build a model, we first need to create the game environment. We will use OpenAI Gym environments wrapped in Stable Baselines3's vectorized environment (DummyVecEnv).
The next step is to create the model and train it. Let me show how it is built.
```python
import cv2
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold
from stable_baselines3.common.evaluation import evaluate_policy


class AtariGames:
    def __init__(self, env_path, render_mode, path_save_best, path_save, video_path):
        self.env_path = env_path
        self.render_mode = render_mode
        self.path_save_best = path_save_best
        self.path_save = path_save
        self.video_path = video_path

    def make_env(self, which_env):
        # Create the selected environment and wrap it in a vectorized env,
        # which is the interface Stable Baselines3 expects.
        self.which_env = which_env
        env = gym.make(self.env_path[self.which_env], render_mode=self.render_mode)
        self.env = DummyVecEnv([lambda: env])

    def train_model(self, total_timesteps, episodes, policy_kwargs, reward_threshold, eval_freq):
        # Stop training early once the evaluation reward reaches the threshold.
        stop_callback_func = StopTrainingOnRewardThreshold(reward_threshold=reward_threshold, verbose=1)
        eval_callback = EvalCallback(self.env,
                                     callback_on_new_best=stop_callback_func,
                                     eval_freq=eval_freq,
                                     best_model_save_path=self.path_save_best[self.which_env],
                                     verbose=1)
        self.model = PPO("MlpPolicy", self.env, verbose=1, policy_kwargs=policy_kwargs)
        self.model.learn(total_timesteps=total_timesteps, callback=eval_callback)
        evaluate_policy(self.model, self.env, n_eval_episodes=episodes, render=True)
        self.model.save(self.path_save[self.which_env])
        del self.model

    def test_model(self, episodes, fps):
        model = PPO.load(self.path_save[self.which_env], env=self.env)
        observation = self.env.reset()
        # Render one frame to get (width, height) for the video writer.
        frame_size = (self.env.render().shape[1], self.env.render().shape[0])
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        video_writer = cv2.VideoWriter(self.video_path[self.which_env], fourcc, fps, frame_size)
        for i in range(1, episodes + 1):
            observation = self.env.reset()
            score = 0
            done = False
            while not done:
                frame = self.env.render()
                action, _ = model.predict(observation)
                observation, reward, done, _ = self.env.step(action)
                # OpenCV expects BGR frames.
                frame_bgr = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
                video_writer.write(frame_bgr)
                score += reward[0]  # VecEnv returns arrays; take the single env's reward
            print(f"Episode: {i}, score: {score}")
        video_writer.release()
```
With that, our first model is done. It's time to test it. To test a model, we load it back from the saved file and run it in the environment.
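In isolation, loading and scoring a saved model looks like this. A minimal sketch, assuming a model has already been saved to `cart_pole_model.zip` (the path defined just below):

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("CartPole-v1", render_mode="rgb_array")
model = PPO.load("cart_pole_model.zip", env=env)  # path produced by train_model
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```

Now let's set up the configuration for all three games: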
```python
import torch as th

# Environment IDs
env = ("CartPole-v1", "LunarLander-v2", "CarRacing-v2")
# Render mode
render_mode = "rgb_array"
# Paths for the best models found during evaluation
path_save_best = ("CartPole-v1-BEST", "LunarLander-v2-BEST", "CarRacing-v2-BEST")
# Paths for the final models
path_save = ("cart_pole_model.zip", "lunar_lander.zip", "car_racing.zip")
# Paths for the recorded videos
video_path = ("CartPole-video.mp4", "LunarLander-video.mp4", "CarRacing-video.mp4")

# Separate actor (pi) and critic (vf) networks, four hidden layers each
net_arch = dict(pi=[64, 128, 128, 64], vf=[64, 128, 128, 64])
act_fn = th.nn.ReLU
custom_policy_kwargs = dict(net_arch=net_arch, activation_fn=act_fn)
```
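To see what this `net_arch` actually produces, you can instantiate a PPO model and print its policy. A quick sketch, reusing `custom_policy_kwargs` from above:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Inspect the actor/critic MLPs built from custom_policy_kwargs
env_check = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env_check, policy_kwargs=custom_policy_kwargs)
print(model.policy)  # separate pi and vf stacks: 64 -> 128 -> 128 -> 64, ReLU
```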
```python
games = AtariGames(env, render_mode, path_save_best, path_save, video_path)

# CartPole game
games.make_env(0)
games.train_model(total_timesteps=100000, episodes=10, policy_kwargs=custom_policy_kwargs, reward_threshold=500, eval_freq=18000)
games.test_model(episodes=10, fps=35)

# LunarLander game
games.make_env(1)
games.train_model(total_timesteps=100000, episodes=10, policy_kwargs=custom_policy_kwargs, reward_threshold=300, eval_freq=18000)
games.test_model(episodes=10, fps=35)

# CarRacing game
games.make_env(2)
games.train_model(total_timesteps=100000, episodes=10, policy_kwargs=custom_policy_kwargs, reward_threshold=500, eval_freq=18000)
games.test_model(episodes=10, fps=35)
```
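One caveat: CarRacing-v2 observes 96x96 RGB images, so a flat MLP policy tends to learn poorly there. A common alternative, not used above, is Stable Baselines3's built-in CNN policy. A sketch of that variant:

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

# CnnPolicy runs the image observations through a convolutional feature
# extractor before the actor/critic heads; usually a better fit for CarRacing.
car_env = DummyVecEnv([lambda: gym.make("CarRacing-v2", render_mode="rgb_array")])
cnn_model = PPO("CnnPolicy", car_env, verbose=1)
cnn_model.learn(total_timesteps=100000)
```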
In summary, this notebook offers a hands-on guide to building AI models that play classic Gym games, demonstrating how OpenAI Gym and Stable Baselines3 can be used to train and evaluate reinforcement learning agents. With continued experimentation and tuning, these models can be refined for better performance, enabling them to attain higher scores and take on increasingly complex games.