Penalized and Decentralized Contextual Bandit Learning for WLAN Channel Allocation with Contention-Driven Feature Extraction

Kota YAMASHITA; Shotaro KAMIYA; Koji YAMAMOTO; Yusuke KODA; Takayuki NISHIO; Masahiro MORIKURA

doi:10.1587/transcom.2021EBP3197

Abstract

In this study, a contextual multi-armed bandit (CMAB)-based decentralized channel exploration framework disentangling a channel utility function (i.e., reward) with respect to contending neighboring access points (APs) is proposed. The proposed framework enables APs to evaluate observed rewards compositionally for contending APs, allowing both robustness against reward fluctuation due to neighboring APs' varying channels and assessment of even unexplored channels. To realize this framework, we propose contention-driven feature extraction (CDFE), which extracts the adjacency relation among APs under contention and forms the basis for expressing reward functions in disentangled form, that is, a linear combination of parameters associated with neighboring APs under contention). This allows the CMAB to be leveraged with a joint linear upper confidence bound (JLinUCB) exploration and to delve into the effectiveness of the proposed framework. Moreover, we address the problem of non-convergence—the channel exploration cycle—by proposing a penalized JLinUCB (P-JLinUCB) based on the key idea of introducing a discount parameter to the reward for exploiting a different channel before and after the learning round. Numerical evaluations confirm that the proposed method allows APs to assess the channel quality robustly against reward fluctuations by CDFE and achieves better convergence properties by P-JLinUCB.

Content from these authors

Favorites & Alerts

Corresponding author

Register with J-STAGE for free!