2020 Volume 2020 Issue FIN-024 Pages 160-
In stock investment and asset management, numerical information has mainly been used for fundamental analysis and technical analysis, but advances in natural language processing now make it possible to use text information as well, such as news released in real time. Machine learning is mainly used for investment decisions, but because text information is composed of a large number of words, its bag-of-words representation tends to be a sparse, high-dimensional vector, and the resulting curse of dimensionality degrades the learning performance of machine learning models. For this reason, we focus only on news headlines as text information, which restricts the number of keywords. In addition, we extracted important keywords that increase the volatility of stock prices right after the news appears, and applied machine learning techniques to learn the relationship between the combination of important keywords contained in a news headline and the next day's active return of the company most closely related to the news. Through several statistical tests, we confirmed that focusing on stock volatility is a valid way to extract important keywords, and that combinations of these keywords are useful for active investment management with a machine learning approach.
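The keyword-extraction and feature-construction steps described above can be sketched as follows. This is a minimal illustration under loudly labeled assumptions, not the authors' method. The abstract does not state the volatility measure, the selection threshold, or the learning model. Here, volatility right after a headline is proxied by the absolute return that follows it. A keyword is kept when its average absolute return is at least a hypothetical `ratio` times an overall baseline. Keyword combinations are encoded as pairwise co-occurrence indicators. All function names (`keyword_volatility_scores`, `select_keywords`, `headline_features`, `active_return`) and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def active_return(stock_return, benchmark_return):
    """Active return: the stock's return minus its benchmark's return."""
    return stock_return - benchmark_return


def keyword_volatility_scores(headlines, abs_returns, min_count=2):
    """Average absolute return after headlines containing each keyword.

    ASSUMPTION: |return| right after the news is used as a simple
    volatility proxy; keywords seen fewer than min_count times are dropped.
    """
    bucket = defaultdict(list)
    for words, r in zip(headlines, abs_returns):
        for w in set(words):  # count each keyword once per headline
            bucket[w].append(r)
    return {w: mean(rs) for w, rs in bucket.items() if len(rs) >= min_count}


def select_keywords(scores, baseline, ratio=1.0):
    """Keep keywords whose score is at least ratio * baseline (hypothetical rule)."""
    return sorted(w for w, s in scores.items() if s >= ratio * baseline)


def headline_features(words, vocab):
    """Binary features: one per selected keyword, plus one per keyword pair."""
    present = set(words)
    singles = [1 if w in present else 0 for w in vocab]
    pairs = [1 if a in present and b in present else 0
             for a, b in combinations(vocab, 2)]
    return singles + pairs


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    headlines = [["profit", "warning"], ["profit", "up"], ["merger", "talks"],
                 ["merger", "approved"], ["dividend", "up"], ["profit", "warning"]]
    abs_returns = [0.08, 0.01, 0.06, 0.05, 0.01, 0.09]
    scores = keyword_volatility_scores(headlines, abs_returns)
    vocab = select_keywords(scores, baseline=mean(abs_returns))
    print(vocab)                                            # ['merger', 'profit', 'warning']
    print(headline_features(["profit", "warning"], vocab))  # [0, 1, 1, 0, 0, 1]
```

Restricting the vocabulary to the selected keywords keeps the feature vector short. The pairwise terms then let a model weigh combinations such as "profit" together with "warning" differently from either word alone. These vectors could be fed to any classifier or regressor, with the next day's active return as the target. The abstract does not say which model was used.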