This thesis has two main parts: a filtering stage that eliminates the majority of candidate pairs, and the implementation of the trading strategy. Pairs trading is known as a market-neutral strategy: in principle it removes the market beta and generates uncorrelated returns (alpha) with minimal exposure to the market. However, identifying pairs and implementing the strategy are complex, and the cointegration relationship that characterizes good pairs can break down in a short period of time.
While pairs can be chosen based on either fundamentals or statistics, an approach combining both is used here. The goal is an automated procedure that selects, with high accuracy, pairs that generate a positive PnL on average. In this section we use the machine learning algorithm “Density-Based Spatial Clustering of Applications with Noise” (DBSCAN) to build clusters of stocks that moved similarly in the past. We then apply a cointegration test within each cluster to identify the pairs that exhibit mean-reverting behavior. In the second part, a trading strategy and its trading rules are implemented, and a Kalman filter is used to estimate the hedge ratio while trading the pairs. Finally, the results are presented and directions for improving the Reinforcement Learning trading agent are given.
Data Management
Source of Data
Bloomberg was the main source of data for this thesis. We used historical data for the SXXP 600 index between 2002 and 2019, including all the components of the index throughout this period. Additional information needed during the thesis was scraped from Yahoo Finance or Reuters, in order to feed our algorithms with fundamental data and thereby create more robust clusters and stock pairs.
Cleaning of the Data
Before selecting the pairs to trade from the 600 stocks, we had to preprocess the data. First, each stock must meet a minimum level of liquidity. Liquidity is essential because a pairs trading strategy involves short-selling. Only stocks with a daily trading volume above 100 000 were kept. Second, the companies traded in a pair should be financially stable. As a result, only companies in good financial health (with a lower probability of going bankrupt) were considered during pair selection, based on fundamental ratios: ROIC and Debt to Equity.
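The liquidity filter can be sketched as follows. This is a minimal illustration, not the code used in the thesis: the function name `liquid_tickers` and the choice to compare the average daily volume with the threshold are assumptions, since the text does not say whether the 100 000 limit applies to the average or to every single day.

```python
from statistics import mean

def liquid_tickers(volumes, min_volume=100_000):
    """Return the tickers whose average daily volume exceeds min_volume.

    volumes: dict mapping a ticker to its list of daily traded volumes.
    Averaging over the sample is an assumption made for this sketch.
    """
    return sorted(
        ticker for ticker, daily in volumes.items()
        if daily and mean(daily) > min_volume
    )

# Hypothetical tickers and volumes, for illustration only.
sample = {"AAA": [150_000, 120_000], "BBB": [50_000, 60_000], "CCC": []}
kept = liquid_tickers(sample)
```

Tickers with no volume history are dropped rather than raising an error, which seems the safer default for a screening step.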
Presentation of the final Database
The final data consists of:
– A database containing the historical market data (Open, Close, Low, High, Volume) of all the components of the SXXP 600 index between 2002 and 2019;
– A database mapping the components of the SXXP 600 for each period;
– A database, built by web scraping, that contains the sector, Market Capitalization, P/E, ROIC, Debt to Equity, etc. of each stock. This database was built up over the course of the thesis, depending on the needs and on the tests carried out on the algorithms presented later.
Clustering
Selecting the final candidate stocks for pairs trading reduced the number of tests to be done from 180 000 to 44 000. To select the most profitable candidates, the universe is then clustered using, for each stock, its daily returns reduced by Principal Component Analysis (PCA), its sector and its market capitalization. PCA is a procedure that uses a linear transformation to convert a set of observations of possibly correlated attributes into a set of linearly uncorrelated attributes called principal components. PCA is an efficient way to reduce the dimension of the returns data while retaining most of the information.
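To make the idea of a principal component concrete, here is a minimal standard-library sketch that extracts the first component by power iteration on the sample covariance matrix. It is an illustration under simplifying assumptions (a single component, a fixed number of iterations, the hypothetical function name `first_principal_component`). In practice a numerical library would be used, and the thesis does not state which implementation it relied on.

```python
import math

def first_principal_component(rows, iterations=200):
    """Unit-length leading eigenvector of the sample covariance matrix.

    rows: list of observations, each a list of d floats.
    Power iteration is used here only to keep the sketch dependency-free.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # starting direction; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Two perfectly correlated attributes: all the variance lies along (1, 2).
rows = [[float(t), 2.0 * t] for t in range(-3, 4)]
direction = first_principal_component(rows)
```

Projecting each centered observation on `direction` gives its score on the first component. Further components would be obtained by removing this direction from the covariance matrix and repeating.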
Density-Based Spatial Clustering of Applications with Noise
Once the principal components have been extracted, we combine them with some fundamental data to build clusters containing stocks with similar behavior. Feeding the model with fundamental data makes it more robust, as it allows the algorithm to look beyond the historical returns of the stocks. In particular, this helps the algorithm to group stocks with the same risk profile, so that the stocks in a given cluster tend to be affected by the same macro-economic factors. We use a technique developed by Ester et al. (1996) called Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This method was preferred to k-means because it does not require the number of clusters to be specified in advance. DBSCAN identifies dense regions based on the distance between elements. The algorithm has two main parameters. The first is the radius of the neighborhood around a given data point. The second is the minimum number of points required in a neighborhood for it to count as dense and form a cluster. The algorithm is built around the notion of density reachability: a point q is density-reachable from a point p if there is a chain of points p1, p2, . . . , pn, with p1 = p and pn = q, such that each point is directly density-reachable from the previous one.
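The procedure described above can be sketched in a few lines of standard-library Python. This is an illustrative re-implementation, not the code used in the thesis: the names `dbscan`, `eps` and `min_pts` are chosen for this sketch, the Euclidean distance is an assumption, and a point counts as its own neighbor.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:          # not a core point
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:                      # grow the cluster by reachability
            j = queue.pop()
            if labels[j] == NOISE:        # border point reached from a core
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts: # j is itself a core point
                queue.extend(reachable)
    return labels

# Two tight groups and one isolated point (hypothetical coordinates).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

The isolated point is labeled as noise rather than forced into a cluster, which is the property that makes the method attractive for pair selection: stocks without close neighbors simply drop out of the candidate set.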
It is important to mention that the number of clusters found depends heavily on the parameters of the algorithm. We chose a large neighborhood radius, as our main goal at this stage is only to group stocks that behave similarly in general. An additional test is then carried out to make sure that we choose the most profitable pairs. Clustering matters because it limits the number of tests: for instance, with 1000 stocks, running the cointegration test directly would require testing all possible pairs, that is 1000 × 999 / 2 = 499 500 combinations (see Figure 1).
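The saving can be checked with a short calculation. The cluster sizes below are hypothetical and serve only to show the order of magnitude. Only the 1000-stock figure comes from the text.

```python
from math import comb

def pairs_to_test(cluster_sizes):
    """Number of within-cluster pairs, given the size of each cluster."""
    return sum(comb(size, 2) for size in cluster_sizes)

all_pairs = comb(1000, 2)                # no clustering: 499500 pairs
clustered = pairs_to_test([10] * 100)    # 100 clusters of 10 stocks: 4500 pairs
```

Because the number of pairs grows quadratically with the size of a group, splitting the universe into many small clusters cuts the number of cointegration tests by roughly two orders of magnitude in this example.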
Visualization of high-dimensional data with a machine learning algorithm (t-Distributed Stochastic Neighbor Embedding)
We need to visualize the output of the previous algorithms. However, the data are high-dimensional and cannot be plotted directly. We therefore use t-SNE, a machine learning algorithm that maps high-dimensional data to a low-dimensional space for visualization. Unlike PCA, t-SNE is not a linear projection. It uses the local relationships between points to create a low-dimensional mapping, which allows it to capture non-linear structure (see Figures 2 and 3).
Identifying Mean-Reversion
In the third stage, we look for mean reversion among the stocks in the clusters discovered in the previous stage, which we identify through cointegration. In finance, the term mean reversion is used to describe a stationary time series, noted I(0): a stochastic process with constant mean and variance and an autocorrelation that depends only on the lag, not on time. For instance, stock returns are typically stationary processes. A series that is non-stationary and becomes stationary only after being differenced n times is called ‘integrated of order n’, noted I(n). In an efficient market, the logarithm of a stock price, Pt = log(St), follows a random walk, which is an example of a series integrated of order 1.
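As a short worked illustration, assume that the log price follows a driftless random walk with independent, identically distributed innovations. The symbols \varepsilon_t and \sigma^2 are introduced here for the example and do not appear in the original text. One difference is then enough to obtain a stationary series:

```latex
\begin{align}
P_t &= P_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d.}(0, \sigma^2) \\
\Delta P_t &= P_t - P_{t-1} = \varepsilon_t \sim I(0)
\quad\Longrightarrow\quad P_t \sim I(1)
\end{align}
```

The differenced series \Delta P_t is the log return, which is consistent with the remark that returns are typically stationary while prices are not.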
Cointegration test
Application of the ADF to the clusters found
The Engle-Granger (EG) test is a two-step residual-based test. To show that two variables x and y are cointegrated, we proceed as follows. First, y is regressed on a constant and on x, and the residuals are computed. Second, the differenced residual is regressed on lags of the residual, without a constant. The test statistic is the usual OLS t-statistic on the lagged residual. Under the null hypothesis of no cointegration between x and y, the residual process is non-stationary (the test therefore resembles the Dickey-Fuller test). Rejecting the null hypothesis leads to the conclusion that the residual process is stationary, i.e., that the series are cointegrated.
In this thesis, we focused only on building pairs of stocks, and applied the previous method to find the most profitable pairs. The cointegration test was run within each cluster found by the DBSCAN algorithm, over the period from 2012 to 2017. A pair of stocks was considered cointegrated if the p-value of the test was below 0.05. The pairs that passed the test, together with a sample output of the cointegration test, are shown below.
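The two steps can be sketched with the standard library only. This is a simplified illustration under stated assumptions: the second regression uses a single lag of the residual (no augmentation terms), the function names `ols` and `engle_granger_stat` are chosen for this sketch, and no p-value is computed, because the statistic follows a non-standard distribution whose critical values have to be looked up or obtained from a statistics library.

```python
def ols(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def engle_granger_stat(x, y):
    """t-statistic of gamma in: diff(resid)_t = gamma * resid_{t-1} + u_t."""
    # Step 1: regress y on a constant and x, keep the residuals.
    slope, intercept = ols(x, y)
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    # Step 2: regress the differenced residual on its lag, without a constant.
    lagged = resid[:-1]
    diff = [resid[t] - resid[t - 1] for t in range(1, len(resid))]
    sll = sum(e * e for e in lagged)
    gamma = sum(e * d for e, d in zip(lagged, diff)) / sll
    errors = [d - gamma * e for e, d in zip(lagged, diff)]
    sigma2 = sum(u * u for u in errors) / (len(diff) - 1)
    return gamma / (sigma2 / sll) ** 0.5
```

A strongly negative statistic is evidence against the null hypothesis of no cointegration. Turning it into the p-value that is compared with 0.05 requires the appropriate critical values, which this sketch deliberately leaves out.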
Implementation of a rule-based trading strategy
Basic Strategy
In what follows, the spread is defined as log(Yt) − α log(Xt) − β, where α and β are estimated by an Ordinary Least Squares regression. The algorithm is built so that trades can be entered or exited on a daily basis. It first determines the current position in the portfolio and observes the value of the spread; once the spread exceeds a certain threshold, we either buy or sell the spread. This threshold is defined as a number of standard deviations.
For each trading period, when the algorithm enters a trade, we need to find the exact number of shares to buy and sell so that the position is as market neutral as possible. The hedge ratio can be estimated by a simple OLS regression or by a Kalman filter, as explained in Figure 4.
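The entry rule can be sketched as follows. The spread formula is the one given above. The entry threshold of two standard deviations, the use of a sample of past spread values to estimate the mean and standard deviation, and the function names are assumptions made for this illustration, since the text does not state the values actually used.

```python
import math
from statistics import mean, stdev

def spread(y_price, x_price, alpha, beta):
    """Spread log(Y) - alpha * log(X) - beta for one observation."""
    return math.log(y_price) - alpha * math.log(x_price) - beta

def entry_signal(spread_history, current, entry=2.0):
    """+1 = buy the spread, -1 = sell the spread, 0 = do nothing.

    entry is the threshold in standard deviations (hypothetical value).
    """
    z = (current - mean(spread_history)) / stdev(spread_history)
    if z > entry:
        return -1   # spread unusually high: sell Y, buy X
    if z < -entry:
        return 1    # spread unusually low: buy Y, sell X
    return 0

history = [-1.0, 0.0, 1.0]          # toy sample: mean 0, standard deviation 1
high = entry_signal(history, 2.5)   # -1
low = entry_signal(history, -2.5)   # 1
flat = entry_signal(history, 0.5)   # 0
```

An exit rule (for example, closing the position when the spread returns close to its mean) would be added in the same way. It is omitted here because the text does not describe it.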
Approximation of alpha and beta
To approximate alpha and beta we use two methods. The first is a rolling regression over a window of N observations. The results of this method depend on the window size chosen, and an optimal window size is not easy to find: a small window makes the estimates noisy, whereas a large window causes the trading logic to respond slowly to trends in α and β. The optimal N is therefore found by testing several values and choosing the best one. The results of the strategy are shown below.
To improve on the previous algorithm, we use a Kalman filter, which is a more precise method for estimating α and β. The Kalman filter is an infinite impulse response filter that estimates the state of a dynamic system from a series of incomplete or noisy measurements. In discrete time, it is a recursive estimator: only the previous state estimate and the current measurement are needed to estimate the current state, so the history of observations and estimates need not be stored. The state of the filter is represented by two variables:
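A rolling regression can be sketched as below, with α as the slope and β as the intercept, to match the definition of the spread. The function names and the use of trailing windows are assumptions for the illustration, and the procedure used to pick the best N is not shown because the text does not specify the selection criterion.

```python
def ols(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rolling_ols(log_x, log_y, window):
    """(alpha, beta) estimated on each trailing window of `window` points.

    The first estimate uses observations 0..window-1, so the result has
    len(log_x) - window + 1 entries.
    """
    estimates = []
    for end in range(window, len(log_x) + 1):
        estimates.append(ols(log_x[end - window:end], log_y[end - window:end]))
    return estimates

# Toy series with an exact relation log_y = 2 * log_x + 1.
estimates = rolling_ols([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0], window=3)
```

With noisy data, rerunning this with several window sizes shows the trade-off described above: short windows give jumpy estimates, long windows react slowly.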
– The estimate of the state at time k;
– The error covariance matrix (a measure of estimated state accuracy).
The Kalman filter has two distinct phases: prediction and update. The prediction phase uses the state estimate of the previous time step to produce an estimate of the current state. In the update phase, the current observations are used to correct the predicted state in order to obtain a more accurate estimate. We use the Kalman filter to estimate the α and β introduced above, assuming that α and β both follow a stochastic process:
We apply the Kalman filter to estimate α and β. The state transition equations are written above, and the observation equation is shown in Figure 5.
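The prediction and update phases can be sketched as follows. Since the state equations are not reproduced in this text, the sketch assumes that α and β follow independent random walks and that the observation is log(Yt) = α log(Xt) + β plus noise. The noise variances `q` and `r`, the initial state and the function name `kalman_hedge` are hypothetical choices, not the thesis settings.

```python
def kalman_hedge(log_x, log_y, q=1e-5, r=1e-3):
    """Filtered (alpha, beta) after each observation.

    State: (alpha, beta), assumed to follow a random walk with variance q.
    Observation: log_y[t] = alpha * log_x[t] + beta + noise of variance r.
    """
    a, b = 0.0, 0.0                      # state estimate
    p = [[1.0, 0.0], [0.0, 1.0]]         # error covariance matrix
    path = []
    for x, y in zip(log_x, log_y):
        # Prediction: the state is unchanged, its uncertainty grows by q.
        p = [[p[0][0] + q, p[0][1]], [p[1][0], p[1][1] + q]]
        # Update with the observation row h = (x, 1).
        ph = (p[0][0] * x + p[0][1], p[1][0] * x + p[1][1])   # P h^T
        s = x * ph[0] + ph[1] + r                             # innovation variance
        k = (ph[0] / s, ph[1] / s)                            # Kalman gain
        err = y - (a * x + b)                                 # innovation
        a, b = a + k[0] * err, b + k[1] * err
        # P <- P - K (h P); h P equals ph because P is symmetric.
        p = [[p[0][0] - k[0] * ph[0], p[0][1] - k[0] * ph[1]],
             [p[1][0] - k[1] * ph[0], p[1][1] - k[1] * ph[1]]]
        path.append((a, b))
    return path

# Noise-free toy data generated with alpha = 2 and beta = 1.
xs = [0.1 * i for i in range(1, 201)]
ys = [2.0 * x + 1.0 for x in xs]
path = kalman_hedge(xs, ys)
```

Only the latest estimate and covariance are carried from one step to the next, which is the recursive property described earlier. A larger `q` relative to `r` makes the estimates follow changes in the hedge ratio faster, at the cost of more noise.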
Implementation of a machine learning trader
In Reinforcement Learning, an agent is trained to find the best behavior in an environment by performing actions and adapting to the results. It differs from other Machine Learning systems, such as Deep Learning, in the way learning happens: it is an interactive process, as the agent's actions actively change its environment.
The trading environment can be seen as a video game in which the agent has to maximize its rewards, which represent the gains from trading. Reinforcement learning is used to build a trading agent that enters or exits positions while trading the spread of the pairs. Within this framework, the agent observes the environment, takes an action (Buy, Sell, Hold) and then receives a reward that reflects how good the action was. Immediately after taking the action, the agent observes the new state of the environment, which depends on the action performed.
During this iterative process, the agent improves its actions thanks to the feedback it receives after each one. In a low-dimensional setting, the agent can try every (State, Action) combination. Each time the agent takes an action, it updates its policy, until it finds the optimal policy that maximizes future rewards. This policy is represented by a Q-table, in which the agent finds the expected reward corresponding to each possible action in each state.
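A tabular version of this update can be sketched as follows. It uses the standard Q-learning rule. The learning rate, the discount factor, the string state labels and the function names are hypothetical, and the thesis agent may be implemented differently.

```python
from collections import defaultdict

ACTIONS = ("buy", "sell", "hold")

def q_update(q, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """One Q-learning step: move Q(state, action) toward the observed target."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += lr * (target - q[(state, action)])

def greedy_action(q, state):
    """Action with the highest current Q-value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

q_table = defaultdict(float)                     # unseen entries start at 0
q_update(q_table, "s0", "buy", 1.0, "s1")        # Q(s0, buy) becomes 0.1
first_value = q_table[("s0", "buy")]
q_table[("s1", "sell")] = 2.0                    # pretend s1 is known to be good
q_update(q_table, "s0", "buy", 1.0, "s1")        # target is now 1 + 0.9 * 2
second_value = q_table[("s0", "buy")]
```

The table has one entry per (state, action) pair, which is why this approach only works when the number of states is small.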
In most real-world problems where Reinforcement Learning is applied, the dimension of the problem is very large. In finance in particular, the quantitative description of the environment may be large or even continuous. It is therefore computationally infeasible to explore every state-action combination and to apply Q-learning directly, since it relies on updating the Q-table to find the best policy. More advanced implementations of RL include Google DeepMind's Deep Reinforcement Learning. This technique uses deep neural networks to approximate, for a given state, the Q-value of each action. This allows the model to map a state to the best possible action without storing all possible cases.
Improvements to the reinforcement learning trader
The performance of the trading agent is not stable, because we do not have enough data to train our model. We are currently working on the following points to improve its performance:
– Feature engineering: to help the algorithm learn faster and better, we will add technical analysis features to its observed environment. The algorithm has to be fed with coherent data, which is why we first study the correlation between the features provided by the ta library, to make sure that we are not giving the agent two opposite signals at the same time.
– Reward: in the previous version, the reward was equal to the amount of money the agent won or lost. In the new version, we will link the reward to different metrics such as the Sharpe Ratio, Sortino Ratio, Maximum Drawdown... The results of the second version of the code will be added later, as the code is not finished at this time.
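The metrics named in the reward item can be computed as in the sketch below. These are the usual textbook definitions, per period and not annualized. How they would be combined into the agent's reward is not specified in the text, so the function names and conventions here are assumptions.

```python
import math
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

def sortino_ratio(returns, target=0.0):
    """Mean excess return over target divided by the downside deviation."""
    downside_sq = [min(r - target, 0.0) ** 2 for r in returns]
    return (mean(returns) - target) / math.sqrt(mean(downside_sq))

def max_drawdown(equity):
    """Largest peak-to-trough loss, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

drawdown = max_drawdown([100.0, 120.0, 90.0, 110.0])   # 30 / 120 = 0.25
```

Unlike raw profit, these measures penalize volatility or large losses, so a reward built on them should discourage the agent from taking profitable but very risky positions.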
Conclusion
First, we carried out a far-reaching study of traditional methodologies in pairs trading and used a machine learning algorithm (DBSCAN) to cluster the stocks of the SXXP 600. We then applied a cointegration test in order to keep only the most profitable pairs. In the second part of this thesis, we implemented a basic trading strategy and then added tools such as the Kalman filter to boost the return of the trades. Finally, we implemented a first version of the reinforcement learning agent, which is profitable, and we are now adding new elements to a second version in order to make the agent more profitable.