Manuscript received July 12, 2024; revised August 3, 2024; accepted September 5, 2024; published February 25, 2025.
Abstract—This paper introduces SHA-ZA (Strategic
Heuristic Agent with Zero-human Advancement), an advanced
reinforcement learning agent trained to play Othello. It draws
inspiration from DeepMind's AlphaZero, which achieved
exceptional proficiency in chess, shogi, and Go through
self-play and reinforcement learning. SHA-ZA follows a
similar approach, combining multiprocess self-play with
Proximal Policy Optimization (PPO) to achieve strong
performance without prior human knowledge.
Trained on 33,587,200 games, the equivalent of over 650
years of continuous human experience, SHA-ZA was tested
against a diverse set of opponents and showed significant
gains in strategic play. The findings
illustrate SHA-ZA's ability to surpass advanced-level minimax
engines, highlighting the effectiveness of combining PPO and
self-play for mastering complex board games like Othello.
Keywords—reinforcement learning, Othello, AlphaZero,
self-play, Proximal Policy Optimization (PPO), board games,
artificial intelligence
Cite: Mohammed Yousif, "SHA-ZA: Advanced Reinforcement Learning for Othello Mastery Using Proximal Policy Optimization," International Journal of Machine Learning, vol. 15, no. 1, pp. 17-22, 2025.
Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).