Reinforcement learning (RL) algorithms usually have difficulty learning the optimal control policy as the dimensionality of the state (and action) space grows, because the search space to be optimized expands explosively. To avoid this explosion, we propose the BASLEM algorithm (Blind Action Sequence Learning with EM algorithm), which acquires a state-independent, time-dependent control policy starting from a fixed initial state. A numerical simulation of controlling a non-holonomic system shows that RL of state-independent, time-dependent policies attains a large improvement in efficiency over the existing RL algorithm.
In this paper, we propose a systematic method for efficiently tuning the performance index in Nonlinear Model Predictive Control (NMPC) of parameter-dependent systems. The quadratic cost function in NMPC is tuned by applying the inverse optimality conditions to the linear quadratic regulator designed for the linearized model, using the Inverse Linear Quadratic (ILQ) regulator design method. This approach provides tuning parameters that give a trade-off between the speed of the system’s response and the magnitude of the control input. We propose two systematic methods for selecting the parameter-dependent tuning parameters. The approach is applied to the speed control of a mean-value model of Spark Ignition (SI) engines. The effectiveness of the proposed methods is demonstrated by simulation results.
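The trade-off between response speed and control magnitude mentioned above can be illustrated with a plain scalar linear quadratic regulator. This is only a minimal sketch under stated assumptions: it is not the ILQ design method or the NMPC tuning procedure of the paper, and the plant dx/dt = a*x + b*u, the weights q and r, and the function name `scalar_lqr` are hypothetical choices made for illustration.

```python
import math

def scalar_lqr(a, b, q, r):
    """Return (gain k, closed-loop pole) for dx/dt = a*x + b*u with
    cost integral of (q*x**2 + r*u**2) and control law u = -k*x.
    Hypothetical helper for illustration only; assumes b != 0, q > 0, r > 0."""
    # Positive root of the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r                 # optimal state-feedback gain
    return k, a - b * k           # closed-loop pole equals -sqrt(a^2 + b^2*q/r)

# Sweep the input weight r: a smaller r allows larger control effort,
# which moves the closed-loop pole further to the left (faster response).
for r in (10.0, 1.0, 0.1):
    k, pole = scalar_lqr(a=-1.0, b=1.0, q=1.0, r=r)
    print(f"r={r:5.1f}  gain={k:.4f}  pole={pole:.4f}")
```

In this toy setting a single weight plays the role of a tuning parameter: lowering r increases the feedback gain and speeds up the response at the price of a larger control input.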
A delta operator model is usually applied to identification problems with short sampling periods. To identify the system, the differences of the input and output data in discrete time must be computed and a noise filter must be designed. In this paper, we propose a delta operator model in the observable canonical form. With this model, the parameters of the delta operator model can be estimated easily without computing differences of the input and output data. We derive the Cramér-Rao inequality, which evaluates the parameter estimation performance of the proposed model. Finally, we show the results of identification experiments for an inertia rotor pendulum and a nonminimum phase system.
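Why delta operator models suit short sampling periods can be seen on a first-order example. The sketch below is a textbook illustration under stated assumptions, not the observable-canonical-form model proposed in the paper: it assumes a plant dx/dt = a*x + b*u with a != 0 sampled with a zero-order hold, and the function names `delta_params` and `simulate_delta` are hypothetical.

```python
import math

def delta_params(a, b, T):
    """Delta-form coefficients of dx/dt = a*x + b*u sampled with period T
    under a zero-order hold (assumes a != 0).
    Delta form: (x[k+1] - x[k]) / T = a_delta * x[k] + b_delta * u[k]."""
    growth = math.expm1(a * T)    # exp(a*T) - 1, accurate for small T
    return growth / T, (b / a) * growth / T

def simulate_delta(a_delta, b_delta, T, u, x0=0.0):
    """Run the delta-form recursion over the input sequence u."""
    x = x0
    for uk in u:
        x = x + T * (a_delta * x + b_delta * uk)
    return x

a, b = -2.0, 3.0
for T in (0.1, 0.01, 0.001):
    a_d, b_d = delta_params(a, b, T)
    # The shift-form pole exp(a*T) crowds toward 1 as T shrinks,
    # while the delta-form coefficients approach the continuous-time (a, b).
    print(f"T={T}: shift pole={math.exp(a * T):.5f}  a_delta={a_d:.5f}  b_delta={b_d:.5f}")
```

As the sampling period shrinks, the shift-operator pole approaches 1 regardless of the plant, whereas the delta-form coefficients stay close to the continuous-time parameters, which is the usual motivation for the delta parameterization at fast sampling rates.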