Statistical arbitrage is one of the most traditional investment strategies and has long been the subject of theoretical and empirical studies. Almost all statistical arbitrage strategies focus on the price difference (spread) between two similar assets in the same asset class and exploit the mean reversion of that spread, i.e., pairs trading. In this study, we extend the strategy to multiple assets in a multi-asset market. Concretely, we derive a mean-reverting portfolio using a time series model. Finally, we perform an empirical analysis on a multi-asset market and show the effectiveness of our strategy.
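The abstract does not name its time series model. As a minimal sketch only, assuming an Engle-Granger-style approach rather than the authors' method, one can regress one asset's price on the others to obtain portfolio weights and then fit an AR(1) model to the resulting spread; the half-life of the AR(1) process measures how quickly the spread mean-reverts. The function names are hypothetical.

```python
import numpy as np


def mean_reverting_weights(prices: np.ndarray) -> np.ndarray:
    """OLS-regress asset 0 on assets 1..k (with intercept) and return the
    weights [1, -beta_1, ..., -beta_k]; prices @ weights is the residual
    spread up to a constant."""
    y = prices[:, 0]
    X = np.column_stack([np.ones(len(y)), prices[:, 1:]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.concatenate([[1.0], -coef[1:]])


def ar1_half_life(spread: np.ndarray) -> float:
    """Fit (s_t - m) = phi * (s_{t-1} - m) + e_t by OLS (m = sample mean)
    and return the mean-reversion half-life -ln(2) / ln(phi)."""
    s = spread - spread.mean()
    x, y = s[:-1], s[1:]
    phi = float(x @ y / (x @ x))
    if phi >= 1.0:
        return float("inf")  # unit root or explosive: no mean reversion
    if phi <= 0.0:
        return 0.0  # no positive autocorrelation: reverts within one step
    return float(-np.log(2.0) / np.log(phi))
```

With more than two columns in `prices`, the weight vector defines a multi-asset spread rather than a pair, which is the direction of extension the abstract describes.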
Recently, many researchers have studied stock trading using technical analysis. However, using technical analysis requires deep expertise, and it is difficult to achieve profitability with it. Therefore, we use Genetic Programming to construct an evolutionary model that takes technical indices into account in order to devise a profitable trading strategy. Finally, we confirm the effectiveness of our model using historical stock market data.
This paper examines the possibility of combining a DSGE model and neural networks so that they complement each other in out-of-sample forecasting of economic variables. The aim is to build a model with both theoretical interpretability and state-of-the-art performance. The novel neural network structure TDVAE (Temporal Difference Variational Auto-Encoder), proposed by Gregor et al., makes this idea feasible. TDVAE effectively replicates a Gaussian stochastic state-space model through a combination of neural networks. Because a DSGE model imposes theoretical restrictions on the state transition and observation matrices of a linear state-space model, I transplant those DSGE-derived matrices into the formulations of the state transition and observation probabilities in TDVAE. This TDVAE-DSGE approach achieved superior performance in out-of-sample forecasts of Japan's real GDP from 1Q 2011 to 4Q 2018.
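For reference, the linear Gaussian state-space model that TDVAE is said to replicate can be filtered exactly with a textbook Kalman filter. The sketch below is that classical filter, not TDVAE; the matrices `T`, `Z`, `Q`, and `H` are placeholders for whatever DSGE-implied transition, observation, and covariance matrices one would plug in.

```python
import numpy as np


def kalman_forecast(y, T, Z, Q, H, x0, P0):
    """Kalman-filter the observations y (shape n_obs x k) under
        x_t = T x_{t-1} + w_t,  w_t ~ N(0, Q)   (state transition)
        y_t = Z x_t + v_t,      v_t ~ N(0, H)   (observation)
    and return the forecast of y for the period after the last observation."""
    x, P = np.array(x0, dtype=float), np.array(P0, dtype=float)
    I = np.eye(len(x))
    for obs in y:
        # Predict the state one period ahead.
        x = T @ x
        P = T @ P @ T.T + Q
        # Update the state with the new observation.
        S = Z @ P @ Z.T + H
        K = P @ Z.T @ np.linalg.inv(S)
        x = x + K @ (obs - Z @ x)
        P = (I - K @ Z) @ P
    # One-step-ahead forecast of the observables.
    return Z @ (T @ x)
```

In a DSGE setting, `T` and `Z` would be functions of the structural parameters, which is the kind of restriction the abstract transplants into TDVAE.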
In this paper, we present low-dimensional embedding methods for interbank transaction networks. To address one important problem, namely how to obtain latent representations that capture the structural properties of a given directed network, we propose a new network embedding model, the Co-Variational Autoencoder (Co-VAE). Co-VAE simultaneously learns one network embedding focusing on the links going into each node and another focusing on the links coming out of each node, and uses both to reproduce the original adjacency matrix. The learned Co-VAE embedding thus captures both the latent representations of lender patterns and those of borrower patterns. Using both latent representations, we can predict the interest rates of interbank transactions.
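A minimal sketch of the reconstruction step, assuming an inner-product decoder of the kind used in graph autoencoders: the probability of a directed link i -> j is scored from the out-link (lender-side) embedding of i and the in-link (borrower-side) embedding of j. This pairing and the binary cross-entropy loss are our assumptions; the encoders and the variational terms of Co-VAE are omitted.

```python
import numpy as np


def reconstruct_adjacency(z_out, z_in):
    """P(link i -> j) = sigmoid(z_out[i] . z_in[j]): row i uses node i's
    out-link (lender-side) embedding, column j node j's in-link
    (borrower-side) embedding, so the result need not be symmetric."""
    logits = z_out @ z_in.T
    return 1.0 / (1.0 + np.exp(-logits))


def reconstruction_loss(adj, z_out, z_in, eps=1e-12):
    """Mean binary cross-entropy between the observed 0/1 directed
    adjacency matrix and its reconstruction."""
    p = np.clip(reconstruct_adjacency(z_out, z_in), eps, 1.0 - eps)
    return float(-np.mean(adj * np.log(p) + (1.0 - adj) * np.log(1.0 - p)))
```

Using two embeddings per node is what lets the decoder represent a bank that lends heavily but rarely borrows, which a single symmetric embedding could not.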
In this paper, we introduce several types of text classification models that predict the speakers in the BOJ's "Summary of Opinions". These documents summarize the BOJ's Monetary Policy Meetings, and most of their contents consist of the committee members' opinions, but the comments are anonymous. A model that predicts the commentators would therefore be a great help in anticipating the BOJ's next decision. Our models are trained on past public speech texts of BOJ committee members. To correct for bias in the datasets, we applied some data pre-processing before model fitting. The proposed models are Random Forest, LSTM, and Bidirectional LSTM with an attention mechanism. Our best model achieved over 90% accuracy. Applying these models to the BOJ's "Summary of Opinions", we focus on the relationship between the ratio of presumed speakers in the documents and past monetary policy changes. The analysis revealed that a higher ratio of "Reflationist" assertions raises the chance of monetary easing.
This paper reports our work on developing a new business sentiment index using daily newspaper articles. We adopt a recurrent neural network (RNN) to estimate the sentiment of a given text. The RNN is initially trained on the Economy Watchers Survey and then fine-tuned on newspaper articles for domain adaptation. In addition, a one-class support vector machine is applied to filter out texts on irrelevant topics. Finally, we analyze the contributions of various factors that influence the estimated business sentiment.
The Economy Watchers Survey, a market survey published by the Japanese government, contains assessments of current and future economic conditions by people from various fields. Although this survey provides policymakers with insights into economic policy, it does not clearly define the word "future" in "future economic conditions." Hence, respondents' assessments are based simply on their own interpretations of "future." This motivated us to reveal the different interpretations of the future in respondents' judgments of future economic conditions by applying weakly supervised learning and text mining. In our research, we separate the assessments of future economic conditions into those concerning the near future and those concerning the distant future using learning from positive and unlabeled data (PU learning). Because the dataset includes data from several periods, we devised a new architecture, based on the idea of multi-task learning, that enables neural networks to conduct PU learning and learn a classifier efficiently. Our empirical analysis confirmed that the proposed method could separate the future economic conditions, and we interpreted the classification results to obtain insights for policymaking.
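The abstract does not state which PU risk estimator is used. A common choice is the non-negative PU (nnPU) risk of Kiryo et al. (2017), sketched below with the sigmoid loss as an assumption; the class prior `prior` is assumed known, and the multi-task architecture across survey periods is not reproduced.

```python
import numpy as np


def sigmoid_loss(scores, label):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for labels y in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(label * np.asarray(scores, dtype=float)))


def nnpu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk (Kiryo et al., 2017):
        prior * R_p^+  +  max(0, R_u^- - prior * R_p^-)
    scores_pos: classifier outputs on labeled positives
    scores_unl: classifier outputs on unlabeled data
    prior: class prior P(y = +1), assumed known."""
    r_p_plus = np.mean(sigmoid_loss(scores_pos, +1))   # positives scored as positive
    r_p_minus = np.mean(sigmoid_loss(scores_pos, -1))  # positives scored as negative
    r_u_minus = np.mean(sigmoid_loss(scores_unl, -1))  # unlabeled scored as negative
    return float(prior * r_p_plus + max(0.0, r_u_minus - prior * r_p_minus))
```

The `max(0, ...)` clamp keeps the estimated negative-class risk from going below zero, which is what prevents a flexible neural network from overfitting the unlabeled data.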
Analysts at investment trust management companies survey the business performance of target companies and summarize it in analyst reports. A fund manager reviews these reports and decides which companies to invest in based on the reports as well as various economic indices and other information. However, when searching for potential investment targets, a fund manager generally has to read a large number of analyst reports and other related documents, which is not an easy task even for a skilled manager. In this work, we propose an intelligent system that retrieves meaningful sentences related to a specific query such as 'performance' and automatically evaluates market trends in order to reduce this workload. From a total of 37,398 analyst reports and interview records, we obtained word embedding vectors using Word2Vec and retrieved sentences addressing a company's financial soundness based on their similarity to a query. In our experiments, for the word 'achievement', we retrieved 395 out of 2,182 sentences, 2.67 times as many as an exact-match search retrieved. On the other hand, a market sentiment trend obtained from a keyword such as 'profit' was not highly correlated with actual market indices.
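One simple way to realize similarity-based retrieval, sketched here under the assumption that a sentence is represented by the average of its word vectors (the abstract does not say how sentences were scored), is cosine similarity against the query word's vector. The function names and the threshold are hypothetical, and any word vectors passed in stand in for Word2Vec output.

```python
import numpy as np


def sentence_vector(tokens, word_vectors):
    """Average the embeddings of the tokens that have a vector; None if none do."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else None


def retrieve(query, sentences, word_vectors, threshold=0.5):
    """Return (similarity, sentence) pairs whose averaged vector has cosine
    similarity >= threshold with the query word's vector, most similar first."""
    q = word_vectors[query]
    hits = []
    for tokens in sentences:
        v = sentence_vector(tokens, word_vectors)
        if v is None:
            continue  # no known words: cannot be scored
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        if sim >= threshold:
            hits.append((sim, " ".join(tokens)))
    return sorted(hits, reverse=True)
```

Because matching is by vector similarity, a sentence such as "earnings rose" can be retrieved for the query 'achievement' without containing that word, which is how this kind of search returns more sentences than exact matching.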
Individual trading analysis has recently attracted attention in the field of market microstructure. Several studies reveal how the trading strategies of specific traders or banks are affected by historical market information. However, little research has revealed how such microscopic trading strategies in turn affect macroscopic future market information. Using a high-granularity dataset that includes the trading behavior of specific banks in the U.S. dollar (USD) against Japanese yen (JPY) market, we demonstrate that the banks' methods of managing positions, defined as the number of units of USD they have bought or sold against JPY, can be clearly clustered into two simple strategies. We then find a strong relationship between future market price movements and these two position management strategies; this relationship even allows market prices to be predicted fifteen minutes ahead.
This paper proposes a return-based clustering of investment trusts in order to classify mutual funds into groups that reflect their actual fund styles. In the conventional method, investment trusts are clustered based on the similarity of their investees. However, since the investees can be confirmed only by the management company, this method cannot be used by anyone else. We analyze investment trusts using the proposed method and confirm that funds of the same style fall into the same cluster.
Selling options is a popular investment strategy in which the seller regularly receives a premium but, in exchange, takes on variance risk, especially negative fat-tail risk. It is therefore important for risk-averse investors to mitigate these risks by constructing hedge positions that take transaction costs into account. The main results of this research are as follows: (1) In a practical simulation, a DDPG (Deep Deterministic Policy Gradient) model with a utility-based reward hedges dynamically better than simple benchmarks. (2) In a real-world application to market data, the learned model successfully manages a short straddle portfolio of Treasury futures options.
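The abstract does not define its utility-based reward. A minimal sketch, assuming a mean-variance proxy r = dW - (kappa/2) dW^2 on the one-step change in wealth net of proportional transaction costs (a form used in some deep-hedging work, not necessarily this paper's), could serve as the per-step reward a DDPG agent maximizes. The function and parameter names are hypothetical.

```python
def hedging_reward(option_pnl: float, hedge_units: float, price_change: float,
                   units_traded: float, cost_per_unit: float, kappa: float) -> float:
    """Hypothetical one-step reward r = dW - (kappa / 2) * dW**2, where
    dW = option P&L + P&L of the hedge held over the step
         - proportional cost of the units traded at rebalancing."""
    d_wealth = (option_pnl + hedge_units * price_change
                - cost_per_unit * abs(units_traded))
    return d_wealth - 0.5 * kappa * d_wealth ** 2
```

For a short straddle, `option_pnl` would be the step's change in value of the short option position; the quadratic term penalizes large swings in either direction, and the cost term discourages over-frequent rebalancing.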
This paper calculates theoretical stock prices based on fundamental analysis using "The Japan Company Handbook", which covers all listed companies. Three machine learning methods are used, drawing on analytical methods developed in Kaggle competitions.
In recent years, stock prices have been predicted in various ways, including technical and fundamental analysis using time series data and financial indicators, and text analysis using news information. In particular, some studies use text mining to predict from text information whether stock prices will rise or fall, but useful text information does not appear frequently for every stock. Given the increase in algorithmic trading based on technical analysis, an analysis that relies solely on textual information is inappropriate because it ignores the impact of such trading. Therefore, in this study, we first used machine learning to predict rises and falls in stock prices from indicators commonly used in technical analysis, and ranked stocks by how easily they are affected by technical analysis. In addition, for cases where the prediction failed, we analyzed what kinds of events had occurred and investigated how technical analysis affects stock prices.
Recently, attention-based RNNs have been studied for representing the temporal or spatio-temporal structure underlying multivariate time series. One recent study improved performance by employing an attention structure that simultaneously captures the spatial relationships among multivariate time series and the temporal structure of those series. That method assumes a single time-series sample of multivariate explanatory variables; thus, no prediction method has been designed for multiple time-series samples of multivariate explanatory variables. Moreover, to our knowledge, previous studies have not explored financial time series incorporating macroeconomic time series such as Gross Domestic Product (GDP) and stock market indexes. Nor has any neural network structure been designed to focus on a specific industry. In this paper, we aim to achieve effective forecasting of corporate financial time series from multiple time-series samples of multivariate explanatory variables. We propose a new industry-specific model that appropriately captures corporate financial time series, incorporating industry trends and macroeconomic time series as side information. We demonstrate the performance of our model through experiments on Japanese corporate financial time series in the task of predicting each company's return on assets (ROA).
The market is greatly disrupted by economic news and online media information such as Twitter, which causes distortions in economic and financial time series such as those of commodity futures markets. In this research, we applied an anomaly detection method using a dense autoencoder to economic and financial time series data in order to detect such distortions, for example in the commodity futures market.
In stock investment and asset management, numerical information has mainly been used for fundamental and technical analysis, but advances in natural language processing now make it possible to also use text information such as news released in real time. Machine learning is often used for investment decisions; however, because text is composed of a large number of distinct words, its bag-of-words representation tends to be a sparse, high-dimensional vector, and the resulting curse of dimensionality degrades the learning performance of machine learning models. For this reason, we focus only on news headlines, which restricts the number of keywords in the text information. In addition, we extracted important keywords that increase stock price volatility right after the news appears, and applied machine learning techniques to learn the relationship between the combination of important keywords in a news headline and the next day's active return of the company most related to the news. Through several statistical tests, we confirmed that focusing on stock volatility is a valid way to extract important keywords, and that combinations of these keywords are useful for active investment management with a machine learning approach.
In this research, we introduce a system that searches, in a chain-like manner, for causal relationships between economic and financial events in a database extracted from economic text data. We also present a stock price analysis that uses this economic causal-chain search.
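The chain-like search can be pictured as a breadth-first traversal of a cause-to-effect graph. The sketch assumes the extracted database has already been turned into a mapping from each cause phrase to its effect phrases; the function name and any example phrases are hypothetical.

```python
from collections import deque


def causal_chains(edges, start, max_depth=3):
    """Enumerate cause -> effect chains reachable from `start` by
    breadth-first search, up to `max_depth` links long.
    `edges` maps a cause phrase to a list of effect phrases; an event already
    on the current chain is not revisited, so cycles terminate."""
    chains = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            chains.append(path)
        if len(path) - 1 >= max_depth:
            continue  # chain has reached the maximum number of links
        for effect in edges.get(path[-1], []):
            if effect not in path:
                queue.append(path + [effect])
    return chains
```

Breadth-first order returns shorter, more direct chains before longer ones, which is a convenient default when the events at the end of each chain are then mapped to stocks.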
A lead-lag effect in stock markets describes the situation in which the return of one (leading) stock is cross-correlated with the return of another (lagging) stock at later times. There are various methods for stock return forecasting based on such lead-lag effects, one of the most representative being based on the supply chain network. In this research, we propose a stock return forecasting method based on an economic causal chain. The economic causal chain is a cause-and-effect network structure constructed by extracting descriptions of causal relationships from the texts of Japanese financial statement summaries. We examine two lead-lag effects: (1) whether a lead-lag effect spreads to the 'effect' stock group when there is a large stock fluctuation in the 'cause' stock group in the causal chain, and (2) whether a lead-lag effect spreads to the 'cause' stock group when there is a large stock fluctuation in the 'effect' stock group in the causal chain. We confirm the existence of lead-lag effects in both directions and find evidence of stock return predictability across causally linked firms in the Japanese stock market.
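A minimal sketch of test (1), assuming daily returns for the 'cause' and 'effect' groups and a fixed threshold for a "large" fluctuation (the abstract gives neither the horizon nor the threshold): compare the effect group's next-day mean return after large up moves and after large down moves of the cause group. The function name is hypothetical; swapping the two return arguments gives test (2).

```python
import numpy as np


def conditional_next_day_return(leader_ret, follower_ret, threshold):
    """Mean follower-group return on day t+1, split by whether the leader
    group's return on day t was a large up move (> threshold) or a large
    down move (< -threshold). Returns (mean_after_up, mean_after_down)."""
    lead, lag = leader_ret[:-1], follower_ret[1:]  # align day t with day t+1
    up = lag[lead > threshold]
    down = lag[lead < -threshold]
    return (float(up.mean()) if up.size else float("nan"),
            float(down.mean()) if down.size else float("nan"))
```

A positive mean after up moves and a negative mean after down moves would be consistent with a lead-lag effect propagating along the causal link; a statistical test on these conditional means would still be needed.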
In this paper, we aim to predict stock prices by analyzing text data in financial articles. TopicVec is a topic embedding model that represents latent topics in a word embedding space. Word embedding maps words into a low-dimensional continuous space by exploiting local word collocation patterns in a small context window, whereas topic modeling maps documents onto a low-dimensional topic space. By combining the two, a topic embedding model can map the topics underlying each document into the word embedding space. To our knowledge, the topic embedding model has been used neither to address regression problems nor to predict stock prices from financial articles. In this paper, we extend the topic embedding model to regression and propose a topic embedding regression model, TopicVec-Reg, that jointly models each document and a continuous label associated with it. Our method treats financial articles as documents, each associated with a stock price return as a continuous label, so that we can predict stock price returns for new unlabeled financial articles. We evaluate the effectiveness of TopicVec-Reg through experiments on stock return prediction using news articles provided by Thomson Reuters and stock prices from the Tokyo Stock Exchange. The results of a closed test show that our method meaningfully improves prediction performance.
ESG investment, an investing method that evaluates each company's environmental, social, and governance (ESG) efforts, is becoming popular. However, although disclosure of ESG information is already common, it is difficult to evaluate companies' ESG efforts objectively because of the ambiguity in the definitions of the ESG terms: environment, society, and governance. For this reason, we applied Word2Vec to extract words similar to each ESG term and evaluated companies' ESG efforts in terms of these extracted words. As a result, we confirmed that portfolios of companies with higher ESG scores were more profitable than those of companies with lower ESG scores, and that this ESG score can be useful for factor investing.
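As a hypothetical illustration of scoring companies in terms of the extracted words, one could count how often each category's expanded terms appear in a company's disclosure, normalized by document length. The expanded term sets would come from Word2Vec nearest neighbours of each ESG term; the scoring rule and function name are our assumptions, not the paper's.

```python
def esg_scores(tokens, expanded_terms):
    """Per-category score = occurrences of the category's expanded terms per
    1,000 tokens of a company's disclosure text.
    `expanded_terms` maps a category label (e.g. 'E', 'S', 'G') to a set of
    words, such as Word2Vec neighbours of that ESG term."""
    n = max(len(tokens), 1)  # guard against empty documents
    return {cat: 1000.0 * sum(t in words for t in tokens) / n
            for cat, words in expanded_terms.items()}
```

Normalizing by length keeps long disclosures from scoring higher merely because they contain more text; companies could then be sorted by score into high- and low-ESG portfolios for the comparison the abstract reports.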
Analysis of transaction data between corporations has strong growth potential in finance because financial institutions hold large amounts of such data. However, since each institution can obtain such data mainly for its main customers, the data are partial and their applications are limited. We artificially reproduce a similar setting from the observed data alone by removing some observed transaction links, and evaluate whether the removed links can be predicted. We show that graph embedding could offer a solution to this problem.
Although econophysics techniques have been widely applied to investigate stock markets, non-trading (night) periods have received less attention. Here, we investigate the correlation between overnight return and the following daytime return (correlation ND) and the correlation between daytime return and the following overnight return (correlation DF). A standard correlation analysis reveals a weak negative correlation ND in the Japanese stock market. To enhance this signal, we use the Volatility Constrained correlation (VC correlation) method, which significantly amplifies the observed tendency. This result has strong implications for the predictability of daytime returns beyond what standard correlation indicates. Moreover, the amplified tendency observed for each stock revealed a linear scaling relationship between the standard correlation and the VC correlation. Taken together, applying the VC methodology to overnight financial trading data enhances the observed correlations, which may lead to improved market predictions.
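The two standard correlations can be computed from daily open and close prices, assuming log returns with overnight return log(O_t / C_{t-1}) and daytime return log(C_t / O_t). The sketch covers only the standard correlation; the VC correlation method is not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def overnight_daytime_correlations(open_, close):
    """Return (corr ND, corr DF) from daily open and close price arrays."""
    overnight = np.log(open_[1:] / close[:-1])  # close_{t-1} -> open_t, days 1..n-1
    daytime = np.log(close / open_)             # open_t -> close_t, days 0..n-1
    # ND: overnight return of day t vs the daytime return that follows it.
    corr_nd = np.corrcoef(overnight, daytime[1:])[0, 1]
    # DF: daytime return of day t vs the overnight return of day t+1.
    corr_df = np.corrcoef(daytime[:-1], overnight)[0, 1]
    return float(corr_nd), float(corr_df)
```

The only subtle point is alignment: both correlations use n-1 pairs, and shifting one series by a day turns the ND pairing into the DF pairing.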
The importance of investing in the intangible asset value of companies, including ESG investment, has been increasing in recent years. Corporate brands are an important element of intangible assets. However, especially for B2B companies, much remains unclear about the relationship between corporate brand value and business performance. In this study, we quantitatively evaluated the relationship between B2B corporate brands and performance using a new large-scale corporate brand survey index created from a business card exchange network. We examined the relationship between the corporate brand index and ROE for B2B companies. We found that companies with a high corporate brand index had a significantly higher profit margin, a significantly lower turnover rate, and no significant association with financial leverage. These results show that even among B2B companies, those with strong corporate brands have implemented a differentiation strategy that allows high margins, while those with weak brands have adopted cost leadership strategies with high turnover.
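The three quantities the abstract compares are the factors of the standard DuPont decomposition of ROE, assuming "turnover rate" refers to total asset turnover:

```latex
\mathrm{ROE} = \frac{\text{Net income}}{\text{Equity}}
  = \underbrace{\frac{\text{Net income}}{\text{Sales}}}_{\text{profit margin}}
  \times \underbrace{\frac{\text{Sales}}{\text{Total assets}}}_{\text{asset turnover}}
  \times \underbrace{\frac{\text{Total assets}}{\text{Equity}}}_{\text{financial leverage}}
```

Under this identity, a higher margin combined with lower turnover and unchanged leverage is exactly the differentiation-versus-cost-leadership contrast the results describe.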
Financial institutions hold a large amount of data on money transfers among companies. From these data, they can construct graphs that represent the business relationships between companies. If they can predict a rating of a company's repayment capacity from these graphs, they can make more appropriate loan decisions and find new customers. In this study, we propose a method that applies extended Graph Convolutional Networks to business relationship graphs representing the continuity of transactions in order to predict these ratings. Applying this method to actual data indicated that it automatically extracts the features necessary for predicting the rating.
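For orientation, a single layer of the standard Graph Convolutional Network of Kipf and Welling (2017) is sketched below; the paper's extended GCN is not described in the abstract and is not reproduced. Edge weights in `adj` could, for example, encode how continuous the transactions between two companies are, which is an assumption, and the function name is hypothetical.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One graph-convolution layer (Kipf & Welling, 2017):
        ReLU(D^{-1/2} (A + I) D^{-1/2} H W)
    adj: symmetric (n x n) weighted adjacency matrix A
    features: node feature matrix H (n x f_in)
    weight: learnable matrix W (f_in x f_out)"""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # D^{-1/2} as a vector
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)
```

Each layer mixes a company's features with those of its transaction partners, so stacking two layers and adding a softmax over rating classes would give a basic node classifier; a directed transfer graph would need a different normalization than this symmetric one.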
In this paper, we propose a method for retrieving expressions and constructing financial sentiment lexicons for the stock, bond, and foreign exchange markets. The retrieved phrases are distinctive words in each financial market and extensions of these words. For example, the sentiment of the word "increase" cannot be determined by looking at the word alone; however, it can be determined from its relation to the preceding and following words, as in "increased consumption" and "increase in costs". In addition, an expression such as "cause inflation" must be assigned the appropriate sentiment for each of the stock, bond, and foreign exchange markets. We acquire these expressions automatically and construct appropriate financial sentiment lexicons for each of these markets.
In this study, we propose a method to extract market information on emerging markets from newspaper articles. We use Bloomberg articles as an information source for emerging markets, which have been expanding in recent years. The extracted market information is useful as reference material when creating a market analysis report for a target emerging market. However, while many articles describe market conditions in large markets, for example through the Nikkei Stock Average and the Dow Jones Industrial Average, few articles focus on a single emerging market such as India, China, or Taiwan; such a market is usually described in only one part of an article that covers several emerging markets. Therefore, in this study, we extract market information about one target emerging market from articles that mix information on the market conditions of multiple emerging markets. Furthermore, using a topic model, we select important sentences from the extracted sentences and compile them into a monthly report.
We propose a scheme for selecting stocks related to a theme, designed to support fund managers who build themed mutual funds. Our scheme is a natural language processing method based on words extracted according to their similarity to the theme, measured using word2vec and our own co-occurrence-based similarity computed over company information. As company information, we used data including investor relations materials and official websites. We also conducted several other experiments with our scheme, including hyperparameter tuning.
In this paper, we propose a method for automatically extracting sentences that contain corporate performance factors from summaries of financial statements with high accuracy. Specifically, only sentences in the summaries that can be identified as corporate performance factors with high probability are extracted and used as training data. We trained a neural network using word representations of the training data. Furthermore, by using the appearance tendency of corporate performance factor sentences specific to summaries of financial statements as a bias on the model output, our method achieves a higher F-measure than related work that filters sentences using corporate keywords.