
Electronic data

  • icc_final

    Rights statement: ©2019 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 220 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI:


Anti-Intelligent UAV Jamming Strategy via Deep Q-Networks

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published
Publication date: 20/05/2019
Host publication: 2019 IEEE International Conference on Communications, ICC 2019 - Proceedings
Publisher: IEEE
Number of pages: 6
ISBN (electronic): 9781538680889
ISBN (print): 9781538680896
Original language: English

Publication series

Name: IEEE International Conference on Communications
Volume: 2019-May
ISSN (Print): 1550-3607

Abstract

Downlink communications are vulnerable to intelligent unmanned aerial vehicle (UAV) jamming attacks, in which the jammer can learn the optimal attack strategy in complex communication environments. In this paper, we propose an anti-intelligent UAV jamming strategy in which the mobile users learn the optimal defense strategy to avoid jamming. Specifically, the UAV jammer acts as the leader and the users act as followers. The problem is formulated as a Stackelberg dynamic game consisting of a leader sub-game and a followers' sub-game. Since the UAV jammer has only incomplete channel state information (CSI) of the users, we model the leader sub-game as a partially observable Markov decision process (POMDP) and obtain the optimal jamming trajectory via deep recurrent Q-networks (DRQN) in three-dimensional space. We model the followers' sub-game as a Markov decision process (MDP), and the optimal communication trajectory is learned via deep Q-networks (DQN) in two-dimensional space. We prove the existence of the Stackelberg equilibrium. Simulations show that the proposed strategy outperforms the benchmark strategies.
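The abstract describes a two-level learning setup: the jammer (leader) solves a POMDP with a deep recurrent Q-network over a 3-D trajectory, while each user (follower) solves an MDP with a plain DQN over a 2-D trajectory. The sketch below illustrates only the follower side, under assumed details (PyTorch, a small 2-D action grid, an assumed observation layout and reward) that are not taken from the paper; it is not the authors' implementation. The leader's DRQN side would differ mainly by adding a recurrent layer (e.g., an LSTM) so the agent can carry a belief over the partially observed CSI.

```python
# Minimal illustrative sketch of a DQN follower agent choosing 2-D movement actions.
# Grid, observation layout, reward, and network widths are assumptions for illustration.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]   # move up/down/right/left or stay


class QNet(nn.Module):
    """Q-network: user position + estimated jammer position -> one Q-value per action."""
    def __init__(self, obs_dim=4, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def act_eps_greedy(qnet, obs, eps=0.1):
    """Pick a 2-D movement action, exploring with probability eps."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(qnet(torch.tensor(obs, dtype=torch.float32)).argmax())


def dqn_update(qnet, target, buffer, optimizer, batch_size=32, gamma=0.95):
    """One DQN step: sample (obs, action, reward, next_obs, done) transitions
    from the replay buffer and regress Q(obs, action) onto the TD target."""
    if len(buffer) < batch_size:
        return
    obs, act, rew, nxt, done = map(torch.tensor, zip(*random.sample(buffer, batch_size)))
    q = qnet(obs.float()).gather(1, act.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        td_target = rew.float() + gamma * target(nxt.float()).max(1).values * (1 - done.float())
    loss = nn.functional.mse_loss(q, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


qnet, target = QNet(), QNet()
target.load_state_dict(qnet.state_dict())
optimizer = optim.Adam(qnet.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)   # transitions collected by interacting with the jamming environment
```

In this sketch the reward would be derived from the user's achieved SINR or rate under the jammer's current position, which is what ties the follower's learning to the leader's jamming trajectory in the Stackelberg formulation.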
