Predicting financial time series using CNN
Convolutional neural networks originated in image processing. They were first published as LeNet, a network for recognizing the MNIST handwritten digits. However, images are not the only kind of data that convolutional neural networks can handle.
This tutorial demonstrates how a CNN can be used to predict financial time series. The example also illustrates some techniques for training models with Keras.
After completing this tutorial, you will know:
How to prepare multidimensional financial time series data
How a CNN can be applied to time series in a classification problem
How to train a Keras model using generators
How to evaluate a Keras model with a custom metric
Here’s how it works
Idea background
Here we implement CNNpred, from the paper “CNNpred: CNN-based stock market prediction using a diverse set of variables” by Ehsan Hoseinzade and Saman Haratizadeh. The authors’ data files and sample code are available on GitHub:
The paper predicts the direction of the stock market index on the next day (i.e., up or down compared with today), so this is a binary classification problem. Although binary classification is a simple setting, the problem is interesting because of how it is formulated and solved.
CNNs have been used to predict sequences before. For example, a CNN with 1D convolution can be built to predict the Dow Jones Industrial Average (DJIA). A 1D convolution over a time series roughly computes a weighted moving average of it or, in digital signal processing terms, applies a filter to it, from which a trend can be derived.
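To make the moving-average analogy concrete, here is a minimal NumPy sketch (not Keras): convolving a series with a uniform kernel of width 3 yields exactly the 3-step moving average. The prices are made-up values for illustration. A learned convolution kernel would have arbitrary weights rather than equal ones, which is why it behaves as a more general filter.

```python
import numpy as np

# Illustrative closing prices (made-up values, not from the CNNpred dataset)
close = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])

# Uniform kernel of width 3: every weight is 1/3
kernel = np.ones(3) / 3

# "valid" mode keeps only positions where the kernel fully overlaps the series,
# so the output has len(close) - len(kernel) + 1 = 4 values
conv_out = np.convolve(close, kernel, mode="valid")

# The same numbers computed explicitly as a 3-step moving average
moving_avg = np.array([close[i:i + 3].mean() for i in range(len(close) - 2)])

print(conv_out)
print(np.allclose(conv_out, moving_avg))  # True
```

Note that `np.convolve` flips the kernel, whereas a Keras convolution layer computes a cross-correlation; for a symmetric kernel like this one the two give the same result.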
When we look at financial time series, however, it is fairly clear that some derived signals are also useful for prediction. For example, price and volume together provide a better clue than price alone, and technical indicators such as moving averages over different window sizes are useful too. If we align all these signals, we get a table of data in which each time step has multiple features, and the objective is still to predict the direction of a single time series.
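A minimal pandas sketch of such an aligned table, assuming a DataFrame with `Close` and `Volume` columns indexed by date. The data are synthetic and the derived features (daily changes and 3- and 5-day moving averages) are illustrative choices, not the CNNpred feature set.

```python
import numpy as np
import pandas as pd

# Synthetic business-day data (illustrative only)
dates = pd.date_range("2020-01-01", periods=10, freq="B")
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "Close": 100 + rng.normal(0, 1, size=10).cumsum(),
    "Volume": rng.integers(1_000, 2_000, size=10).astype(float),
}, index=dates)

# Derived features, all aligned on the same date index
features = raw.copy()
features["Return"] = raw["Close"].pct_change()          # day-over-day price change
features["Volume_chg"] = raw["Volume"].pct_change()     # day-over-day volume change
features["MA3"] = raw["Close"].rolling(window=3).mean()  # 3-day moving average
features["MA5"] = raw["Close"].rolling(window=5).mean()  # 5-day moving average

# The first rows lack enough history for the longest window, so drop them
features = features.dropna()
print(features.shape)  # (6, 6)
```

Each row is one time step and each column one feature, which is the layout the CNN consumes.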
CNNpred prepares 82 such time series features for the DJIA. The list of features used is shown in the CNNpred paper.
Unlike an LSTM, which processes time steps explicitly, a CNN takes the data as a matrix. The table below shows the features across multiple time steps arranged as a 2D array (time steps × features).
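One way to produce such matrices is to slide a fixed-length window along the time axis, so each sample is a (time steps × features) array. This is a NumPy-only sketch; the helper name `make_windows`, the window length and the array sizes are illustrative assumptions. A trailing channel axis of size 1 is appended because Keras 2D convolution layers expect one.

```python
import numpy as np

def make_windows(data, window):
    """Hypothetical helper: stack overlapping windows of consecutive time steps.

    data: 2D array of shape (time_steps, n_features)
    returns: 4D array of shape (n_samples, window, n_features, 1),
             where the last axis is a single channel
    """
    samples = [data[i:i + window] for i in range(len(data) - window + 1)]
    return np.stack(samples)[..., np.newaxis]

# Illustrative data: 10 time steps with 4 features each
data = np.arange(40, dtype=float).reshape(10, 4)
windows = make_windows(data, window=3)
print(windows.shape)  # (8, 3, 4, 1)
```

The window ending at time step t would typically be paired with the label stored for step t.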
Data preprocessing
We implement CNNpred from scratch using TensorFlow’s Keras API. Although the authors’ reference implementation is available at the GitHub link above, we reimplement it to illustrate some Keras techniques.
To begin, download the data from the Dataset directory of the GitHub repository above, or get a copy here:
The input data has a date column and a name column, which identifies the ticker symbol of the market index. The date column can be used as the time index, and the name column can be dropped. All remaining columns are numeric.
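A minimal pandas sketch of this step, assuming the columns are named `Date` and `Name` (check the headers of your copy of the data). A small in-memory CSV with made-up values and a hypothetical ticker `IDX` stands in for a real data file so the example runs on its own.

```python
import io
import pandas as pd

# Stand-in for one data file (made-up values, hypothetical ticker)
csv_text = """Date,Close,Volume,Name
2020-01-02,100.0,1000,IDX
2020-01-03,101.5,1200,IDX
2020-01-06,100.8,1100,IDX
"""

# Parse the date column and use it as the DatetimeIndex
X = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Keep the ticker symbol for reference, then drop the non-numeric column
name = X["Name"].iloc[0]
X = X.drop(columns="Name")

print(name)      # IDX
print(X.dtypes)  # only numeric columns remain
```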
Before predicting the market direction, we first create the classification label. The market direction is determined by comparing tomorrow’s closing index with today’s. On a pandas DataFrame, X["Close"].pct_change() gives the percentage change from one day to the next, and its sign tells whether the market went up or down. Shifting this change back by one time step, so that each day is labeled with the next day’s direction, we can write:
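A sketch of the labeling step, assuming a DataFrame `X` with a `Close` column; the prices are made up. `shift(-1)` moves each percentage change up one row, so row t holds the change from day t to day t+1. Dropping the last row, which has no next day, is a choice made in this sketch.

```python
import pandas as pd

# Made-up closing values indexed by date
X = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 101.0, 103.0]},
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# pct_change() is the change from yesterday to today; shift(-1) moves it up
# one row so that each day holds the change from today to tomorrow
next_change = X["Close"].pct_change().shift(-1)

# 1 if tomorrow closes higher than today, otherwise 0 (an unchanged close gives 0)
X["Target"] = (next_change > 0).astype(int)

# The last day has no "tomorrow", so its label would be a spurious 0; drop it
X = X[next_change.notna()]
print(X["Target"].tolist())  # [1, 0, 0, 1]
```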
The code above calculates the percentage change of the closing index and aligns it with the previous day’s data. The change is then converted to 1 if it is positive and 0 otherwise.
Each of the five data files in the directory is read as a pandas DataFrame and stored in a Python dictionary:
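A sketch of the loading loop, assuming every data file is a CSV with `Date` and `Name` columns. To stay self-contained, it first writes two tiny made-up files (hypothetical tickers `AAA` and `BBB`) to a temporary directory; in practice `datadir` would point at the downloaded Dataset directory. Keying the dictionary by the `Name` column is one possible choice; the file name would work too.

```python
import os
import tempfile
import pandas as pd

def load_all(datadir):
    """Read every CSV in `datadir` into a dict of DataFrames keyed by index name."""
    data = {}
    for filename in sorted(os.listdir(datadir)):
        if not filename.endswith(".csv"):
            continue
        X = pd.read_csv(os.path.join(datadir, filename),
                        index_col="Date", parse_dates=True)
        name = X["Name"].iloc[0]             # ticker symbol of the market index
        data[name] = X.drop(columns="Name")  # keep only the numeric columns
    return data

# Self-contained demo: two made-up files in a temporary directory
with tempfile.TemporaryDirectory() as datadir:
    for ticker in ["AAA", "BBB"]:
        pd.DataFrame({
            "Date": ["2020-01-01", "2020-01-02"],
            "Close": [1.0, 2.0],
            "Name": [ticker, ticker],
        }).to_csv(os.path.join(datadir, f"{ticker}.csv"), index=False)
    data = load_all(datadir)

print(sorted(data))  # ['AAA', 'BBB']
```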
For each index, the column “Target” holds the classification label, and all other columns hold the input features. The features are also normalized with a standard scaler.
In time series problems, it is generally better to split the data at a cutoff point than to divide it randomly into training and test sets: data before the cutoff becomes the training set and data after it becomes the test set. A random split would let the model train on days that come after some of the test days, leaking future information. The scaler is fitted on the training set only and then applied to the entire dataset.
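A minimal sketch of the split and scaling, assuming a DataFrame with a `Target` column and a date index; the data and cutoff date are made up, and `split_and_scale` is a hypothetical helper. The scaling mirrors what scikit-learn’s `StandardScaler` does (subtract the mean, divide by the population standard deviation, leave zero-variance columns unscaled), written with pandas only, with statistics taken from the training rows alone.

```python
import numpy as np
import pandas as pd

def split_and_scale(df, cutoff, target="Target"):
    """Hypothetical helper: split at a date cutoff, standardize with training statistics."""
    features = df.drop(columns=target)
    train_mask = df.index < cutoff

    # Mean and standard deviation from the training period only
    mean = features[train_mask].mean()
    std = features[train_mask].std(ddof=0)  # population std, as in StandardScaler
    std = std.replace(0, 1.0)               # constant columns: avoid division by zero

    scaled = (features - mean) / std        # applied to the entire dataset
    scaled[target] = df[target]             # the 0/1 label is left unscaled
    return scaled[train_mask], scaled[~train_mask]

# Made-up data: 6 days, one feature
df = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "Target": [1, 1, 1, 1, 1, 0]},
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)
train, test = split_and_scale(df, cutoff=pd.Timestamp("2020-01-05"))
print(len(train), len(test))  # 4 2
```

The test rows are scaled with the training mean and standard deviation, so their values can fall outside the range seen during training, as they would for genuinely new data.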