
Electronic data

  • 2022GaoMRes

    Final published version, 20.8 MB, PDF document

    Available under license: CC BY-SA: Creative Commons Attribution-ShareAlike 4.0 International License


Partially Observable Monte Carlo Planning in Public Game Tree for complex imperfect information game King UP

Research output: Thesis › Master's Thesis

Published
  • Gao Peng
Publication date: 2022
Number of pages: 54
Qualification: Masters by Research
Awarding Institution
Supervisors/Advisors
Publisher
  • Lancaster University
Original language: English

Abstract

Game AI has long been a challenging area that has attracted considerable human effort. Games can be divided into two types according to whether players can fully observe the game state: perfect-information games and imperfect-information games. While perfect-information games such as Go and chess have been studied extensively, research on imperfect-information games has largely been limited to poker and dice games. The board game King UP is an imperfect-information game with more complicated rules and attributes, including partial observation, multiple players, partly shared goals, mixed strategies (players temporarily cooperate and compete for goals), and a mixture of dynamic and static play (voting and moving). Until now, King UP has not been solved by existing methods.

Extending Partially Observable Monte Carlo Planning (POMCP), we propose a novel algorithm, POMCP in a Public Game Tree, to solve this problem; it combines techniques from Counterfactual Regret Minimization (CFR) with POMCP. The public game tree is a data structure that stores beliefs over partially observable information and can be applied to most multi-player imperfect-information games. Inspired by CFR, we propose local search and decision-estimation processes to compute the value function over correlated states and actions. We also implement two decision-estimation methods: a deterministic policy based on the Upper Confidence Bound and a stochastic policy based on Regret Matching.
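As a rough illustration of the two decision-estimation rules named above, here is a minimal Python sketch; it is a hypothetical example, not the thesis's implementation, and the Node structure and function names are assumptions made for clarity.

    # Hypothetical sketch: deterministic selection via UCB1 and a stochastic
    # policy via Regret Matching. Not taken from the thesis code.
    import math
    import random
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        visits: int = 0        # times this child was selected
        value: float = 0.0     # accumulated return for this child
        children: list = field(default_factory=list)

    def ucb_select(node, c=1.41):
        """Deterministic policy: choose the child with the best UCB1 score."""
        total = sum(child.visits for child in node.children)
        def score(child):
            if child.visits == 0:
                return float("inf")  # always try unvisited actions first
            exploit = child.value / child.visits
            explore = c * math.sqrt(math.log(total) / child.visits)
            return exploit + explore
        return max(node.children, key=score)

    def regret_matching(cumulative_regrets):
        """Stochastic policy: mix actions in proportion to positive regret."""
        positive = [max(r, 0.0) for r in cumulative_regrets]
        total = sum(positive)
        if total <= 0.0:
            # no action has positive regret: fall back to a uniform mixture
            return [1.0 / len(positive)] * len(positive)
        return [r / total for r in positive]

    # Example: sample an action from a Regret-Matching policy
    policy = regret_matching([4.0, -2.0, 1.0])   # -> [0.8, 0.0, 0.2]
    action = random.choices(range(3), weights=policy)[0]

The UCB rule always picks the single highest-scoring action, whereas Regret Matching returns a probability distribution, which is the usual reason the former is described as deterministic and the latter as stochastic.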

In our experiments, we evaluate the convergence of the algorithm. We also assess its performance in further experiments, such as tournaments between the different decision-estimation methods, and verify its generalization on Rock-Paper-Scissors.

Finally, we revisit the motivation, insights, and contributions. Our algorithm works with less prior knowledge and performs well in complex, multi-player imperfect-information games such as King UP, including games that mix dynamic and static play.