Algorithmic trading with Keras

The goal of this article is to provide the necessary notions to perform time-series forecasting on financial data using the library Keras for Deep Learning.

Apr 01, 2025


In particular, we will use two models involving LSTM recurrent neural networks and 1-dimensional convolutions to develop an investment strategy for the S&P 500 index.

We will verify that, over a 4-year period that includes the 2008 crisis, these deep learning strategies performed far better than the buy-and-hold strategy (always stay in the market) and the moving average strategy (stay in the market when the current price is greater than the moving average of the past 12 months, and sell when it falls below it). To quantify these performances, we will compute the gross and the net yield, where the net yield accounts for the tax on capital gains and the broker's fee on each transaction.

import pandas as pd
import numpy as np
import datetime
import time
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr

import keras
from keras.models import Sequential
from keras.optimizers import RMSprop,Adam
from keras.layers import Dense,Dropout,BatchNormalization,Conv1D,Flatten,MaxPooling1D,LSTM
from keras.callbacks import EarlyStopping,ModelCheckpoint,TensorBoard,ReduceLROnPlateau
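# keras.wrappers.scikit_learn exists only in older Keras releases; with recent versions
# the scikeras package offers a replacement (from scikeras.wrappers import KerasRegressor)
from keras.wrappers.scikit_learn import KerasRegressor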
from keras.models import load_model
from sklearn.preprocessing import MinMaxScaler

This block imports the packages used in the article. pandas and numpy handle the data, datetime handles dates, matplotlib draws the plots, and pandas_datareader downloads the financial data. From Keras it imports:

- the Sequential model and the RMSprop and Adam optimizers;
- the layers used to build the networks (Dense, Dropout, BatchNormalization, Conv1D, Flatten, MaxPooling1D, LSTM);
- the training callbacks;
- the scikit-learn wrapper KerasRegressor;
- load_model, to reload saved models.

MinMaxScaler from scikit-learn normalizes the data before it is fed to the neural networks.

Download Financial Data

We download the data of the S&P 500 index from Yahoo Finance. Our analysis is monthly: all decisions are made on the first trading day of the month. For this reason, we adopt the convention that *start_date* is always the first day of a month and *end_date* is always the last day of a month.

Our analysis starts 24 months after the month of *start_date*, since the first 24 months are used to compute the 2-year moving average.

start_date=datetime.datetime(1973, 1, 1)
end_date=datetime.datetime(2011,3,31)

These two lines fix the period of analysis, from January 1, 1973 to March 31, 2011. They follow the convention above: start_date is the first day of a month and end_date is the last day of a month.

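# daily data of the S&P 500 index (Yahoo ticker ^GSPC)
# note: Yahoo Finance has changed its API over time, so this reader may fail with recent versions
df = pdr.get_data_yahoo('^GSPC', start=start_date, end=end_date)
df.drop("Adj Close",axis=1,inplace=True)  # not used by the strategy
print(df.tail())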

This downloads the daily data of the S&P 500 index (Yahoo ticker ^GSPC) between start_date and end_date into the DataFrame df, which is indexed by date. The Adj Close column is dropped because the strategy does not use it. The last rows are printed to check that the download worked.

We need the list of the first trading day of each month, so we compute it:

start_year=start_date.year
start_month=start_date.month
end_year=end_date.year
end_month=end_date.month

first_days=[]
# df.loc with a partial date string selects all the rows of that month
# (the older df["YYYY-M"] row selection was removed in pandas 2.0)
# First year
for month in range(start_month,13):
    first_days.append(df.loc[f"{start_year}-{month:02d}"].index.min())
# Other years
for year in range(start_year+1,end_year):
    for month in range(1,13):
        first_days.append(df.loc[f"{year}-{month:02d}"].index.min())
# Last year
for month in range(1,end_month+1):
    first_days.append(df.loc[f"{end_year}-{month:02d}"].index.min())

The code reads the year and month of start_date and end_date. It then builds first_days, the list of the first trading day of every month in the period, including the month of end_date. For each month, the first trading day is the earliest date of that month present in df. This list lets us split the data into monthly periods.
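A shorter, vectorized way to get the same list is to group the dates by calendar month and take the earliest date of each group. This is a sketch, not the author's code. To keep it self-contained, it runs on df_demo, a hypothetical stand-in for df built on business days. Because df_demo uses business days, it ignores market holidays, which the real Yahoo data would reflect.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data (business days only) standing in for the Yahoo download
idx = pd.bdate_range("1973-01-01", "1973-06-30")
df_demo = pd.DataFrame({"Open": np.linspace(100.0, 110.0, len(idx))}, index=idx)

# Group the dates by calendar month and keep the earliest date of each month
dates = df_demo.index.to_series()
first_days_demo = list(dates.groupby(dates.index.to_period("M")).min())

print(first_days_demo)
```

Applied to the real df, the same two lines should return the same dates as the loops above.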

For each month we need the monthly means of the daily data, the first trading day of the current month (and its open price) and the first trading day of the next month (and its open price): our models will make their predictions based on these data.

The feature *quot* is the quotient between the open price of the first trading day of the next month and the open price of the first trading day of the current month. It is useful because it gives the relative change in value, over the current month, of a portfolio invested in the index.

Finally, we add the columns with the moving averages over 1 and 2 years (12 and 24 months).

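def monthly_df(df):

    # monthly means of the daily data ("M" = month end; pandas >= 2.2 prefers the alias "ME")
    dfm=df.resample("M").mean()
    dfm=dfm[:-1].copy() # drop the month of end_date: it has no next month, so quot cannot be computed
    
    # first trading day of the current month (cm) and of the next month (nm), with their open prices
    # (first_days is the global list computed above)
    dfm["fd_cm"]=first_days[:-1]
    dfm["fd_nm"]=first_days[1:]
    dfm["fd_cm_open"]=np.array(df.loc[first_days[:-1],"Open"])
    dfm["fd_nm_open"]=np.array(df.loc[first_days[1:],"Open"])
    # growth factor of the index over the current month
    dfm["quot"]=dfm["fd_nm_open"].divide(dfm["fd_cm_open"])
    
    # means of the monthly Open over the previous 12 and 24 months (shift(1) excludes the current month)
    dfm["mv_avg_12"]= dfm["Open"].rolling(window=12).mean().shift(1)
    dfm["mv_avg_24"]= dfm["Open"].rolling(window=24).mean().shift(1)
    
    dfm=dfm.iloc[24:,:] # we remove the first 24 months, since they do not have the 2-year moving average
    
    return dfm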

monthly_df performs the following steps:

1. It resamples the daily data to monthly means and drops the last month (the month of end_date), which has no following month.
2. It adds the first trading day of the current month (fd_cm) and of the next month (fd_nm), together with their open prices.
3. It adds the column quot, obtained by dividing the next month's first open by the current month's first open.
4. It adds the columns mv_avg_12 and mv_avg_24, which contain the averages of the monthly mean Open over the previous 12 and 24 months. Because of shift(1), the current month is excluded, so each value uses only information available at the start of the month.
5. It removes the first 24 months, because their 24-month moving average is not defined.
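The computation of quot and the effect of shift(1) on the rolling mean can be checked on a tiny example. This is only a sketch: dfm_demo is a hypothetical five-month table with made-up prices, and it uses a 3-month window instead of 12 to keep it short.

```python
import pandas as pd

# Hypothetical monthly data standing in for dfm: mean open and first-day opens
dfm_demo = pd.DataFrame(
    {"Open": [10.0, 11.0, 12.0, 13.0, 14.0],
     "fd_cm_open": [100.0, 105.0, 99.75, 102.0, 110.0],
     "fd_nm_open": [105.0, 99.75, 102.0, 110.0, 108.0]},
    index=pd.period_range("2000-01", periods=5, freq="M"),
)

# quot: growth factor from the first trading day of the month to that of the next month
dfm_demo["quot"] = dfm_demo["fd_nm_open"].divide(dfm_demo["fd_cm_open"])

# 3-month moving average of the previous months only: shift(1) moves each value
# one row down, so a month's own Open never enters its moving average
dfm_demo["mv_avg_3"] = dfm_demo["Open"].rolling(window=3).mean().shift(1)

print(dfm_demo[["quot", "mv_avg_3"]])
```

The moving average for April 2000 is the mean of January to March (10, 11, 12). The first three months have no value.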

dfm=monthly_df(df)

print(dfm.head())
print(dfm.tail())

#each month of dfm contains the moving averages of the previous 12 and 24 months (excluding the current month)
print(dfm.loc["1980-03","mv_avg_12"])
print(dfm.loc["1979-03":"1980-02","Open"])
print(dfm.loc["1979-03":"1980-02","Open"].mean())

After building dfm, the code prints its first and last rows. It then checks that the moving averages involve only past months. The value of mv_avg_12 for March 1980 should coincide with the mean of the monthly Open values from March 1979 to February 1980, and both are printed.
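The two benchmark strategies from the introduction can be written as 0/1 vectors with one entry per month: 1 means "stay in the market", 0 means "stay out". The sketch below is an assumption, since the article does not show how the benchmarks are built. It assumes that "current price" means the open of the first trading day of the month (fd_cm_open), compared with mv_avg_12. dfm_demo is a hypothetical table with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data standing in for dfm (only the two columns used here)
dfm_demo = pd.DataFrame({
    "fd_cm_open": [100.0, 104.0, 98.0, 101.0, 107.0],
    "mv_avg_12":  [99.0, 101.0, 102.0, 100.0, 103.0],
})

# Buy and hold: always in the market
v_bh = np.ones(len(dfm_demo), dtype=int)

# Moving-average strategy (assumption: compare the first open of the month
# with the 12-month moving average available at that date)
v_ma = (dfm_demo["fd_cm_open"] > dfm_demo["mv_avg_12"]).astype(int).to_numpy()

print(v_bh)
print(v_ma)
```

Vectors of this kind are what the yield functions below take as their argument v.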

Define Function To Compute Gross And Net Yield

Notice that the gross yield can be computed very easily using the feature *quot* of the dataframe.

In the following function the vector *v* selects which months we are going to stay in the market.

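def yield_gross(df,v):
    # monthly growth factor: quot when in the market (v=1), 1 when out of it (v=0)
    prod=(v*df["quot"]+1-v).prod()
    n_years=len(v)/12
    # total and annualized gross yield, in percent
    return (prod-1)*100,((prod**(1/n_years))-1)*100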

For each month, the factor v*quot + 1 - v equals quot when v = 1 (we are in the market) and 1 when v = 0 (we are out and the capital does not change). The product of these factors, prod, is the overall growth factor of the capital. The function returns two values, both in percent:

- the total gross yield, (prod - 1)*100;
- the annualized gross yield, (prod^(1/n_years) - 1)*100, where n_years = len(v)/12.
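Here is a small worked example of yield_gross. The function is repeated so the block runs on its own. The data are hypothetical: 12 months in which the index gains 2% every month except for a 10% drop in month 6. The example compares buy and hold with a strategy that skips only that month.

```python
import numpy as np
import pandas as pd

def yield_gross(df, v):
    # same logic as above: monthly factor is quot when v=1 and 1 when v=0
    prod = (v * df["quot"] + 1 - v).prod()
    n_years = len(v) / 12
    return (prod - 1) * 100, ((prod ** (1 / n_years)) - 1) * 100

# Hypothetical data: +2% every month, except a 10% drop in month 6
quot = np.full(12, 1.02)
quot[5] = 0.90
df_demo = pd.DataFrame({"quot": quot})

v_hold = np.ones(12)   # buy and hold: always in the market
v_skip = np.ones(12)
v_skip[5] = 0          # stay out of the market in month 6 only

print(yield_gross(df_demo, v_hold))
print(yield_gross(df_demo, v_skip))
```

With 12 months, n_years is 1, so the total and annualized yields coincide. In this example, skipping the bad month raises the gross yield from about 11.9% to about 24.3%.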

We now define a function to compute the net yield. It applies a 26% tax on capital gains (as required by Italian law) and a 0.10% broker commission on each transaction. These values can be changed to adapt the function to the tax system of other countries.

tax_cg=0.26
comm_bk=0.001

tax_cg is the capital-gain tax rate (26%) and comm_bk is the broker's commission per transaction (0.10%), both expressed as fractions. They are defined as global parameters, so they can easily be changed.
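To see how the two parameters act, consider a single holding period: one purchase and one sale, with gross growth factor A. If A > 1, the gain A - 1 is taxed, giving 1 + (A - 1)(1 - tax_cg). Otherwise there is no tax. In both cases, the two transactions cost a factor (1 - comm_bk)^2. net_factor is a hypothetical helper written only for this illustration.

```python
tax_cg = 0.26    # capital-gain tax rate (Italian value used in the article)
comm_bk = 0.001  # broker commission per transaction (0.10%)

def net_factor(A):
    """Hypothetical helper: after-tax, after-commission growth factor of one
    holding period (one purchase and one sale) with gross growth factor A."""
    taxed = 1 + (A - 1) * (1 - tax_cg) if A > 1 else A  # tax only on gains
    return taxed * (1 - comm_bk) ** 2                    # two transactions

print(net_factor(1.10))  # a 10% gross gain
print(net_factor(0.95))  # a 5% gross loss: no tax, only commissions
```

A 10% gross gain thus becomes roughly a 7.2% net gain (1.074 * 0.998001).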

The following helper function will be used to compute the net yield.

Given any vector of zeros and ones as input, *separate_ones* returns two things:

- a matrix with one row per group of adjacent ones, where each row is a copy of the input that keeps only that group;
- a scalar equal to the number of such groups.

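def separate_ones(u):
    
    # pad with zeros so that every group of ones has a start and an end
    u_ = np.r_[0,u,0]
    # positions where the value changes: alternately the start and the (exclusive) end of each group
    i = np.flatnonzero(u_[:-1] != u_[1:])
    v,w = i[::2],i[1::2]
    if len(v)==0:
        # no groups of ones: empty matrix with len(u) columns, so that yield_net still works
        return np.zeros((0,len(u)),dtype=int),0
    
    n,m = len(v),len(u)
    o = np.zeros(n*m,dtype=int)

    # row k of the flattened n x m output starts at offset k*m; mark each group's start with +1
    r = np.arange(n)*m
    o[v+r] = 1

    # mark the position right after each group with -1 (not needed if the last group reaches the end of u)
    if w[-1] == m:
        o[w[:-1]+r[:-1]] = -1
    else:
        o[w+r] -= 1

    # the cumulative sum turns each +1/-1 pair into a block of ones
    out = o.cumsum().reshape(n,-1)
    return out,n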

The function pads u with a zero at both ends and finds the positions where consecutive elements differ. These positions are, alternately, the starts (v) and the exclusive ends (w) of the groups of ones. If u contains no ones, the function returns early with n = 0.

Otherwise, it builds a flattened n-by-m array of zeros, where n is the number of groups and m = len(u). In row k, it puts a 1 at the start of the k-th group and a -1 at the position right after the group. The -1 is omitted when the group reaches the end of u. The cumulative sum turns each +1/-1 pair into a block of ones, so after reshaping, row k contains only the k-th group of ones. The function returns this matrix together with n.
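The vectorized logic is easier to follow next to a plain loop that produces the same matrix. The sketch below is for illustration only: separate_ones_loop is a hypothetical name, it is slower than the original, and it is not meant to replace it. When the input contains no ones, it returns an empty matrix with len(u) columns.

```python
import numpy as np

def separate_ones_loop(u):
    """Hypothetical, readable equivalent of separate_ones: one row per group of adjacent ones."""
    u = np.asarray(u, dtype=int)
    rows = []
    start = None
    for i, x in enumerate(u):
        if x == 1 and start is None:
            start = i                      # a new group begins
        if x == 0 and start is not None:
            row = np.zeros(len(u), dtype=int)
            row[start:i] = 1               # the group occupies positions start..i-1
            rows.append(row)
            start = None
    if start is not None:                  # a group that reaches the end of u
        row = np.zeros(len(u), dtype=int)
        row[start:] = 1
        rows.append(row)
    out = np.array(rows, dtype=int).reshape(len(rows), len(u))
    return out, len(rows)

out, n = separate_ones_loop([0, 1, 1, 0, 1, 1, 1, 0, 1])
print(n)
print(out)
```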

Let us clarify the behaviour of this function with an example:

u=np.array([0,1,1,0,1,1,1,0,1])

Here u represents a strategy that stays in the market in months 2-3, 5-7 and 9, that is, in three groups of adjacent months.

separate_ones(u)

The call returns the number of groups, 3, and a matrix with one row per group of adjacent ones. Its rows are:

- [0,1,1,0,0,0,0,0,0]
- [0,0,0,0,1,1,1,0,0]
- [0,0,0,0,0,0,0,0,1]

The following function is the one which we will use to compute the net yield.

Again, the vector v selects which months we are going to stay in the market.

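def yield_net(df,v):
    n_years=len(v)/12
    
    # one row of w per period in the market (group of adjacent ones of v), n periods in total
    w,n=separate_ones(v)
    A=(w*np.array(df["quot"])+(1-w)).prod(axis=1)  # A[k]: growth factor of the k-th period (product of quot over its months)
    A1p=np.maximum(0,np.sign(A-1)) # 1 where the corresponding element of A is > 1, 0 elsewhere
    Ap=A*A1p # elements of A > 1 (periods with a capital gain), 0 elsewhere
    Am=A-Ap # elements of A <= 1 (no gain, no tax), 0 elsewhere
    An=Am+(Ap-A1p)*(1-tax_cg)+A1p # after-tax factor: 1+(A-1)*(1-tax_cg) if A > 1, A otherwise
    prod=An.prod()*((1-comm_bk)**(2*n)) # each period costs two commissions (one buy, one sell)
    
    # total and annualized net yield, in percent
    return (prod-1)*100,((prod**(1/n_years))-1)*100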

The function takes the monthly dataframe and the 0/1 vector v and works in three steps:

1. separate_ones splits v into its n groups of adjacent ones, that is, into the n periods during which we stay in the market. Each period starts with a purchase and ends with a sale.
2. For each period k, A[k] is the product of quot over its months, that is, the gross growth factor of that period. If A[k] > 1, the gain A[k] - 1 is taxed at rate tax_cg, so the after-tax factor is 1 + (A[k] - 1)(1 - tax_cg). If A[k] <= 1, there is no gain and no tax. Note that the losses of one period are not offset against the gains of later periods.
3. The product of the after-tax factors is multiplied by (1 - comm_bk)^(2n) for the 2n transactions. This gives the overall net growth factor prod.

As in yield_gross, the function returns the total net yield and the annualized net yield, both in percent.
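For reference, the two yield functions can be summarized in mathematical notation. The formulas restate the code above, and the symbols are introduced here:

- q_t is quot in month t;
- v_t in {0,1} is the t-th entry of v;
- N = len(v);
- G_k is the set of months in the k-th group of adjacent ones, with n groups in total;
- tau = tax_cg and c = comm_bk.

```latex
\begin{aligned}
P_{\text{gross}} &= \prod_{t=1}^{N} \bigl(v_t\,q_t + 1 - v_t\bigr), \qquad A_k = \prod_{t \in G_k} q_t,\\
P_{\text{net}} &= (1-c)^{2n} \prod_{k=1}^{n} \Bigl(1 + (1-\tau)\max(A_k-1,\,0) + \min(A_k-1,\,0)\Bigr),\\
Y_{\text{tot}} &= 100\,(P-1), \qquad Y_{\text{ann}} = 100\,\bigl(P^{12/N}-1\bigr).
\end{aligned}
```

Here P stands for either P_gross or P_net. The exponent 12/N is 1/n_years in the code.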
