Electronic data

  • 2024matheusphd

    Final published version, 5.27 MB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI: 10.17635/lancaster/thesis/2422

Monte-Carlo Based Online planning Under Partial Observability: Solving Single and Multi-Agent Problems

Research output: Thesis > Doctoral Thesis

Published

BibTeX

@phdthesis{a60b62c808e5451bbe078ab07c90219c,
title = "Monte-Carlo Based Online planning Under Partial Observability: Solving Single and Multi-Agent Problems",
abstract = "This thesis thoroughly explores the integration of statistical and reinforcement learning techniques, aiming to provide fresh perspectives and solutions for enhancing the current state-of-the-art methods considering the capabilities of autonomous agents to perform learning, planning and estimation in an online manner in single- and multi-agent system contexts. We aim to address a critical demand in the field, steering away from the prevailing dependence on intensive computational resources and large amounts of data as a requirement to achieve peak performance in our context. Our primary focus centres on studying and refining solutions in the ``online planning under uncertainty'' research area. We have ventured beyond the boundaries of existing literature, pushing our proposals to more complex and challenging problems. As concrete contributions, we introduce three new algorithms: IB-POMCP, an online planning algorithm which uses information entropy to augment a single agent's decision-making capabilities; OEATE, a type and parameter estimation method to handle coordination with multiple unknown teammates in cooperative environments; and BAE, a method capable of detecting adversarial agents disguised as teammates in cooperative environments on-the-fly. Our proposals contribute to the evolution of autonomous systems and are supported by empirical and theoretical results. We demonstrate that our new perspectives for agents' reasoning processes can present generic and extendable solutions to diverse scenarios and problems. Finally, during the PhD journey, we have developed and presented to the research community a new framework designed to aggregate relevant baselines and benchmarks for multi-agent systems: AdLeap-MAS. The AdLeap-MAS framework stands out as a novel tool centred on the implementation and simulation of ad-hoc reasoning domains for multi-agent, collaborative, and adversarial contexts. The framework aims to facilitate the execution of experiments and the re-use of existing code across different environments. We provide a user-friendly environment that not only extends the frontiers of our research but also serves as a valuable resource for the research community.",
keywords = "Planning under uncertainty, Multi-Agent Systems, Information-guided planning, Adversarial Detection, Ad-hoc Teamwork, Online Planning, Estimation methods",
author = "{do Carmo Alves}, {Matheus Aparecido}",
year = "2024",
month = may,
day = "31",
doi = "10.17635/lancaster/thesis/2422",
language = "English",
publisher = "Lancaster University",
school = "Lancaster University",

}

RIS

TY - THES

T1 - Monte-Carlo Based Online planning Under Partial Observability

T2 - Solving Single and Multi-Agent Problems

AU - do Carmo Alves, Matheus Aparecido

PY - 2024/5/31

Y1 - 2024/5/31

N2 - This thesis thoroughly explores the integration of statistical and reinforcement learning techniques, aiming to provide fresh perspectives and solutions for enhancing the current state-of-the-art methods considering the capabilities of autonomous agents to perform learning, planning and estimation in an online manner in single- and multi-agent system contexts. We aim to address a critical demand in the field, steering away from the prevailing dependence on intensive computational resources and large amounts of data as a requirement to achieve peak performance in our context. Our primary focus centres on studying and refining solutions in the "online planning under uncertainty" research area. We have ventured beyond the boundaries of existing literature, pushing our proposals to more complex and challenging problems. As concrete contributions, we introduce three new algorithms: IB-POMCP, an online planning algorithm which uses information entropy to augment a single agent's decision-making capabilities; OEATE, a type and parameter estimation method to handle coordination with multiple unknown teammates in cooperative environments; and BAE, a method capable of detecting adversarial agents disguised as teammates in cooperative environments on-the-fly. Our proposals contribute to the evolution of autonomous systems and are supported by empirical and theoretical results. We demonstrate that our new perspectives for agents' reasoning processes can present generic and extendable solutions to diverse scenarios and problems. Finally, during the PhD journey, we have developed and presented to the research community a new framework designed to aggregate relevant baselines and benchmarks for multi-agent systems: AdLeap-MAS. The AdLeap-MAS framework stands out as a novel tool centred on the implementation and simulation of ad-hoc reasoning domains for multi-agent, collaborative, and adversarial contexts. The framework aims to facilitate the execution of experiments and the re-use of existing code across different environments. We provide a user-friendly environment that not only extends the frontiers of our research but also serves as a valuable resource for the research community.

KW - Planning under uncertainty

KW - Multi-Agent Systems

KW - Information-guided planning

KW - Adversarial Detection

KW - Ad-hoc Teamwork

KW - Online Planning

KW - Estimation methods

U2 - 10.17635/lancaster/thesis/2422

DO - 10.17635/lancaster/thesis/2422

M3 - Doctoral Thesis

PB - Lancaster University

ER -