Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 1S5-GS-2-04

Efficient and Low Bias Policy Gradient Estimation in Contact Rich Differentiable Simulation
*Ku ONODA, Paavo PARMAS, Yutaka MATSUO
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In policy gradient reinforcement learning, access to a differentiable model enables 1st-order gradient estimation, which accelerates learning compared to relying solely on derivative-free 0th-order estimators. However, discontinuous dynamics bias 1st-order estimators and undermine their effectiveness. Prior work addressed this bias by constructing a confidence interval around the REINFORCE 0th-order gradient estimator and using its bounds to detect discontinuities. Because the REINFORCE estimator is notoriously noisy, we find that this approach requires task-specific hyperparameter tuning and has low sample efficiency. We propose a novel method, Discontinuity Detection Composite Gradient (DDCG), which dynamically switches its gradient estimator using a statistical test for discontinuities based on smoothness assumptions. We evaluate DDCG on control tasks in differentiable simulation and find that it performs well even with a fixed hyperparameter and estimates gradients effectively even in the small-sample regime.
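The bias described above can be illustrated on a one-dimensional toy problem. The sketch below is not the authors' DDCG implementation: the abstract does not specify the statistical test, so a crude deterministic consistency check (`looks_discontinuous`, a hypothetical helper) stands in for it. The cost functions, the jump at x = 0, the tolerance, and all function names are assumptions made for illustration only.

```python
import numpy as np


def smooth_cost(x):
    """Smooth toy cost; returns (value, derivative)."""
    return x**2, 2.0 * x


def contact_cost(x, jump=1.0):
    """Toy cost with a jump at x = 0, loosely mimicking a contact event."""
    return x**2 + jump * (x > 0.0), 2.0 * x


def zeroth_order_grad(cost, theta, sigma, eps):
    """Likelihood-ratio (REINFORCE) estimate of d/dtheta E[cost(theta + sigma*eps)]."""
    values, _ = cost(theta + sigma * eps)
    baseline = values.mean()  # sample-mean baseline (adds O(1/N) bias)
    return np.mean((values - baseline) * eps / sigma)


def first_order_grad(cost, theta, sigma, eps):
    """Reparameterization estimate: average of the pathwise derivative."""
    _, derivs = cost(theta + sigma * eps)
    return derivs.mean()


def looks_discontinuous(cost, theta, sigma, eps, tol=1e-3):
    """Hypothetical stand-in for a discontinuity test (not the paper's test).

    Sorts the samples and checks whether the change in cost between
    neighbours matches a trapezoid-rule prediction from the derivatives.
    """
    x = np.sort(theta + sigma * eps)
    values, derivs = cost(x)
    predicted = 0.5 * (derivs[1:] + derivs[:-1]) * np.diff(x)
    residual = np.abs(np.diff(values) - predicted)
    return bool(residual.max() > tol)


def composite_grad(cost, theta, sigma, eps):
    """Use the 1st-order estimate unless the check flags a discontinuity."""
    if looks_discontinuous(cost, theta, sigma, eps):
        return zeroth_order_grad(cost, theta, sigma, eps)
    return first_order_grad(cost, theta, sigma, eps)


rng = np.random.default_rng(0)
eps = rng.standard_normal(100_000)
theta, sigma = 0.0, 0.5

# Analytic gradient for contact_cost with jump = 1:
# J(theta) = theta^2 + sigma^2 + P(x > 0), so dJ/dtheta = 2*theta + N(0; theta, sigma^2).
true_grad = 2.0 * theta + np.exp(-theta**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

grad_first = first_order_grad(contact_cost, theta, sigma, eps)
grad_zeroth = zeroth_order_grad(contact_cost, theta, sigma, eps)
grad_composite = composite_grad(contact_cost, theta, sigma, eps)
grad_smooth = composite_grad(smooth_cost, theta, sigma, eps)

print("true gradient (contact):", true_grad)
print("1st-order estimate     :", grad_first)
print("0th-order estimate     :", grad_zeroth)
print("composite estimate     :", grad_composite)
```

On the cost with a jump, the 1st-order estimate ignores the contribution of the discontinuity, while the 0th-order estimate recovers it at the price of higher variance. The composite rule falls back to the 0th-order estimate only when the check fires, which is the general switching idea the abstract describes.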

© 2025 The Japanese Society for Artificial Intelligence