To avoid overfitting when a pattern recognition model is applied to complex data, outliers that deviate significantly from the true pattern of the population must be removed in advance. In this study, the sample space was divided into a high-versatility space and a low-versatility space by using a globally optimal decision tree. The space with a low evaluation value was then defined as the space with relatively large noise, and the pattern recognition model was built after excluding the data belonging to that space.
The pattern recognition model constructed in this way was found to achieve higher prediction accuracy than the conventional method.
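The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the study: a single exhaustive threshold search on one feature stands in for the globally optimal decision tree, majority-class purity stands in for the evaluation value, and the names `purity`, `best_threshold`, `filter_noisy_region`, and the cutoff `min_purity` are hypothetical.

```python
from collections import Counter


def purity(labels):
    """Share of the majority class; an assumed stand-in for the evaluation value."""
    if not labels:
        return 0.0
    return Counter(labels).most_common(1)[0][1] / len(labels)


def best_threshold(xs, ys):
    """Exhaustively pick the threshold on a 1-D feature that maximises weighted purity."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = (len(left) * purity(left) + len(right) * purity(right)) / len(ys)
        if best is None or score > best[0]:
            best = (score, t)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1]


def filter_noisy_region(xs, ys, min_purity=0.8):
    """Drop samples that fall in a region whose evaluation value is below min_purity."""
    t = best_threshold(xs, ys)
    regions = {True: [], False: []}
    for x, y in zip(xs, ys):
        regions[x < t].append(y)
    keep = {side: purity(labels) >= min_purity for side, labels in regions.items()}
    # Only the retained samples would be passed on to the pattern recognition model.
    return [(x, y) for x, y in zip(xs, ys) if keep[x < t]]
```

The downstream model would then be trained only on the returned samples; the choice of cutoff is an assumption and would need to be tuned for real data.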
In particular, when the model was evaluated by the walk-forward method on financial time-series data under the same conditions as actual fund management, it delivered investment performance that stably exceeded the return of the benchmark assets over the past 20 years.
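Walk-forward evaluation can be sketched as a sequence of train/test windows in which the test period always follows the training period. The sketch below assumes a fixed-length rolling training window that advances by one test period at a time; the window scheme actually used in the study is not stated here, and `walk_forward_splits` and its parameters are hypothetical names.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices); each test window lies strictly after its training window."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        # Advance by one test period so that test windows do not overlap.
        start += test_size
```

Because each model only sees data that precedes its test window, the resulting accuracy is free of look-ahead bias, which is what makes the setup comparable to live fund management.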
A space containing noise, or a space in which a pattern is not easily recognized, does not necessarily coincide with one side of a two-way split of the entire sample set. Therefore, when dividing a space by a decision tree, it is desirable to subdivide the space with a multiway tree. In this study, prediction accuracy was confirmed to improve when the binary tree was replaced with a ternary tree.
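The benefit of a three-way split can be illustrated with a toy case in which the distinct region sits in the middle of the feature range, so that no single threshold isolates it. The sketch below is an assumption-laden illustration on one feature, with an exhaustive search over threshold pairs and majority-class share as the score; `best_ternary_split` is a hypothetical helper, not the tree construction used in the study.

```python
from collections import Counter
from itertools import combinations


def majority_count(labels):
    """Number of samples in the most common class (0 for an empty region)."""
    return Counter(labels).most_common(1)[0][1] if labels else 0


def best_ternary_split(xs, ys):
    """Search threshold pairs t1 < t2 giving regions x < t1, t1 <= x < t2, x >= t2."""
    candidates = sorted(set(xs))[1:]
    best = None
    for t1, t2 in combinations(candidates, 2):
        parts = ([], [], [])
        for x, y in zip(xs, ys):
            # Region index is 0, 1, or 2 depending on how many thresholds x reaches.
            parts[(x >= t1) + (x >= t2)].append(y)
        score = sum(majority_count(p) for p in parts) / len(ys)
        if best is None or score > best[0]:
            best = (score, t1, t2)
    return best


# Toy data: the middle band (x = 4..6) differs from both ends.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 1, 1, 1, 0, 0, 0]
score, t1, t2 = best_ternary_split(xs, ys)
```

On this toy data a single binary threshold leaves at least one mixed region, whereas the pair of thresholds separates the middle band cleanly.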