IEICE Transactions on Communications
Online ISSN : 1745-1345
Print ISSN : 0916-8516

This article has now been updated. Please use the final version.

Penalized and Decentralized Contextual Bandit Learning for WLAN Channel Allocation with Contention-Driven Feature Extraction
Kota YAMASHITA, Shotaro KAMIYA, Koji YAMAMOTO, Yusuke KODA, Takayuki NISHIO, Masahiro MORIKURA
JOURNAL RESTRICTED ACCESS Advance online publication

Article ID: 2021EBP3197

Abstract

In this study, we propose a contextual multi-armed bandit (CMAB)-based decentralized channel exploration framework that disentangles a channel utility function (i.e., the reward) with respect to contending neighboring access points (APs). The framework enables each AP to evaluate observed rewards compositionally with respect to contending APs, which provides both robustness against reward fluctuations caused by neighboring APs' varying channels and the ability to assess even unexplored channels. To realize this framework, we propose contention-driven feature extraction (CDFE), which extracts the adjacency relation among APs under contention and forms the basis for expressing reward functions in disentangled form (that is, as a linear combination of parameters associated with neighboring APs under contention). This linear form allows the CMAB to be paired with joint linear upper confidence bound (JLinUCB) exploration, which is used to examine the effectiveness of the proposed framework. Moreover, we address the problem of non-convergence (the channel exploration cycle) by proposing a penalized JLinUCB (P-JLinUCB), whose key idea is to apply a discount parameter to the reward when the exploited channel differs before and after a learning round. Numerical evaluations confirm that CDFE allows APs to assess channel quality robustly against reward fluctuations and that P-JLinUCB achieves better convergence properties.
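
As a rough illustration of the ideas in the abstract, the following Python sketch pairs a plain LinUCB learner with a switching penalty. It is not the authors' JLinUCB or P-JLinUCB. The feature encoding in `contention_features`, the class `PenalizedLinUCB`, the exploration weight `alpha`, and the way `discount` scales the score of any channel other than the current one are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def contention_features(channel, neighbor_channels):
    """Hypothetical encoding: a bias term followed by one indicator per
    neighboring AP that is 1.0 when that neighbor uses `channel`."""
    return np.array([1.0] + [1.0 if c == channel else 0.0
                             for c in neighbor_channels])


class PenalizedLinUCB:
    """Standard LinUCB with a multiplicative penalty on channel switches.

    Assumption: scores are nonnegative (e.g. normalized throughput), so
    multiplying by `discount` < 1 always makes switching less attractive.
    """

    def __init__(self, dim, alpha=1.0, discount=0.9):
        self.A = np.eye(dim)      # ridge-regularized Gram matrix
        self.b = np.zeros(dim)    # reward-weighted sum of feature vectors
        self.alpha = alpha        # width of the confidence bonus
        self.discount = discount  # penalty factor for changing channel

    def score(self, x):
        """Upper confidence bound on the linear reward estimate for x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def select(self, channels, neighbor_channels, current=None):
        """Return the channel with the highest (possibly discounted) score."""
        best, best_score = None, -np.inf
        for ch in channels:
            s = self.score(contention_features(ch, neighbor_channels))
            if current is not None and ch != current:
                s *= self.discount  # discourage leaving the current channel
            if s > best_score:
                best, best_score = ch, s
        return best

    def update(self, x, reward):
        """Fold one (feature, reward) observation into the estimate."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

Because the reward is modeled as a sum of per-neighbor terms, one observation on a channel shared with a given neighbor also informs the estimate for any other channel that neighbor might occupy. This is the sense in which a disentangled reward can score channels the AP has not yet tried. The penalty adds hysteresis: a different channel must beat the current one by a margin before the AP moves, which is one simple way to damp repeated switching among APs.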

© 2022 The Institute of Electronics, Information and Communication Engineers