Using Deep Learning To Predict US Treasury Yields
An encoder-decoder architecture with an attention mechanism, built on multivariate multistep LSTMs.
Table of Contents
Motivation
Prerequisites
Data & Exploration
Modelling
Results
Future Predictions
Next Steps & Conclusion
Motivation
Shifts in the shape and slope of the yield curve are thought to reflect investor expectations for the economy and interest rates. In normal times, short-term interest rates are lower than long-term rates. It is hypothesized, though not established, that this relationship flattens or even reverses during recessions and other adverse events: as short-term rates rise, investors rush to buy bonds with longer maturities, pushing long-term yields down.
Knowing how the curve evolves over time is therefore always valuable: many economists believe US Treasury yields can predict the movements of other financial markets such as stocks, futures, and options, and trading in US Treasury issuances also impacts economies worldwide.
Most multidimensional time-series prediction models have focused only on stock price prediction. The objective here is to apply such models to the term structure of US Treasury yields, which can also be viewed as a multidimensional time series.
To that end, I used an encoder-decoder network based on Long Short-Term Memory (LSTM) with an attention mechanism to predict term structure changes over the next 10 days from the rates of the previous 30 days.
Prerequisites
Recurrent neural networks can learn relationships across time, but because the gradient of the loss function decays exponentially through the timesteps, they struggle to maintain long-term temporal dependencies. LSTMs use special gating units that allow them to remember such long-term dependencies.
Encoder-decoder architectures are typically used for sequence-to-sequence modelling in Natural Language Processing tasks such as Question Answering, Machine Translation, and Language Modeling. Since multivariate multistep time-series forecasting is also a sequence-to-sequence task, the same architecture applies here.
After the encoding step, recurrent-network-based encoder-decoder architectures use only the final hidden state as the feature representation vector. The attention mechanism instead computes a weighted sum of the hidden states from every timestep.
It is beyond the scope of this article to explain the concepts behind the methodology in any further detail.
Data & Exploration
You can download the dataset from the US Treasury website.
It has 13 columns, as shown below: the first is the date, followed by the yields for various maturity periods.
(Dataset preview images)
The 2-Month rate is excluded from the analysis because it has very few observations, the result of a long pause in the issuance of 2-Month Treasury Bills.
Below is a sample plot of the yield curve for 06/29/1992.
The Treasury rates also contain missing values where no data is available. Where there was no issuance, the missing values are replaced by the corresponding mean rates; the rest are interpolated from the adjacent term rates.
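As a sketch of this imputation in pandas (the author's exact code isn't shown; the column names and values below are illustrative):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the yield table; column names and values are illustrative.
df = pd.DataFrame({
    "1 Mo": [0.10, 0.20],
    "3 Mo": [np.nan, 0.25],
    "6 Mo": [0.30, 0.40],
    "1 Yr": [0.50, np.nan],
})

# Fill gaps from the adjacent term rates (linear interpolation across each
# row of the curve), then fall back to the column's mean rate for anything
# still missing (e.g. dates with no issuance at all for that maturity).
df = df.interpolate(axis=1)
df = df.fillna(df.mean())
```

Interpolating along `axis=1` moves across the term structure for a single date, which matches the idea of filling a gap from its neighbouring maturities.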
The analysis horizon is 1990-2022. This timeline includes several adverse events: the Dot-Com Bubble of 2000, the Great Financial Crisis of 2008, and the Covid Pandemic of 2020. During highly uncertain periods, yield curves deviate from their normal upward slope.
Here is a visualization showing how interest rates moved around these events. During bad times, short-term interest rates can climb above long-term rates, flattening or inverting the curve. However, since many other variables also affect the term structure, this should not be treated as a reliable trend.
Modelling
Data Transformation:
The first step in building the model is creating the dataset: to predict the next 10 days' rates, we need the previous 30 days' data. The look_back and look_ahead variables can be modified to change the timeframes considered.
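A minimal sketch of such a sliding-window transform (the function name make_windows is my own; the original code is not reproduced here):

```python
import numpy as np

def make_windows(series, look_back=30, look_ahead=10):
    """Slice a (T, n_features) array into (X, y) pairs: X holds the past
    look_back days, y the following look_ahead days."""
    X, y = [], []
    for i in range(len(series) - look_back - look_ahead + 1):
        X.append(series[i : i + look_back])
        y.append(series[i + look_back : i + look_back + look_ahead])
    return np.array(X), np.array(y)

# Toy input: 100 days of the 11 maturity rates
rates = np.random.rand(100, 11)
X, y = make_windows(rates)
print(X.shape, y.shape)  # (61, 30, 11) (61, 10, 11)
```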
Model Architecture:
Based on this snapshot, I designed the Encoder-Decoder model with Attention.
The encoder and decoder networks share the same structure, with 300 hidden units and 4 layers each, except for the attention mechanism at the end of the encoder network. The encoder's input dimension is 11, matching the decoder's output dimension.
I built the models with PyTorch and trained them on Google Colab GPUs for parallel processing and faster training times.
Encoder:
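The encoder code itself is not reproduced here; a minimal PyTorch sketch consistent with the description above (11 input features, 300 hidden units, 4 LSTM layers) might look like this. The class and variable names are my assumptions, not the author's code.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Dimensions follow the article: 11 input features (one per maturity),
    # 300 hidden units, 4 stacked LSTM layers.
    def __init__(self, input_dim=11, hidden_dim=300, n_layers=4):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, n_layers, batch_first=True)

    def forward(self, x):
        # x: (batch, look_back, input_dim)
        outputs, (h, c) = self.lstm(x)
        # outputs holds every timestep's hidden state -- this is what the
        # attention mechanism later weights, per the description above.
        return outputs, (h, c)

enc = Encoder()
outputs, (h, c) = enc(torch.randn(2, 30, 11))
print(outputs.shape)  # torch.Size([2, 30, 300])
```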
Decoder:
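The decoder code is likewise not shown; below is a hedged sketch that mirrors the encoder (300 hidden units, 4 layers) and applies a simple dot-product attention over the encoder's per-timestep states, one common way to realize the attention described earlier. The attn_combine and out layers are my assumptions.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    # Mirrors the encoder (300 hidden units, 4 layers); the output dimension
    # of 11 matches the encoder's input dimension, as stated in the article.
    def __init__(self, output_dim=11, hidden_dim=300, n_layers=4):
        super().__init__()
        self.lstm = nn.LSTM(output_dim, hidden_dim, n_layers, batch_first=True)
        self.attn_combine = nn.Linear(hidden_dim * 2, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, y_prev, hidden, encoder_outputs):
        # y_prev: (batch, 1, output_dim), the previous observed/predicted rates
        dec_out, hidden = self.lstm(y_prev, hidden)                   # (B, 1, H)
        # Dot-product attention: weight every encoder timestep's hidden state
        scores = torch.bmm(dec_out, encoder_outputs.transpose(1, 2))  # (B, 1, T)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, encoder_outputs)                 # (B, 1, H)
        combined = torch.cat([dec_out, context], dim=-1)
        out = self.out(torch.tanh(self.attn_combine(combined)))      # (B, 1, 11)
        return out, hidden

dec = Decoder()
hidden = (torch.zeros(4, 2, 300), torch.zeros(4, 2, 300))
enc_outputs = torch.randn(2, 30, 300)
y, hidden = dec(torch.randn(2, 1, 11), hidden, enc_outputs)
print(y.shape)  # torch.Size([2, 1, 11])
```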
Encoder-Decoder Wrapper:
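A sketch of the wrapper that ties the two networks together: it encodes the 30-day window, then unrolls the decoder for look_ahead steps, feeding each prediction back in as the next input. The TinyEncoder/TinyDecoder stand-ins exist only so the example runs end to end; they are not the article's modules.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Wires an encoder and a decoder together and rolls the decoder forward
    # look_ahead steps, feeding each prediction back in as the next input.
    def __init__(self, encoder, decoder, look_ahead=10):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.look_ahead = look_ahead

    def forward(self, x):
        enc_outputs, hidden = self.encoder(x)
        y_prev = x[:, -1:, :]                 # seed with the last observed day
        preds = []
        for _ in range(self.look_ahead):
            y_prev, hidden = self.decoder(y_prev, hidden, enc_outputs)
            preds.append(y_prev)
        return torch.cat(preds, dim=1)        # (batch, look_ahead, n_features)

# Small stand-in modules so the wrapper can be exercised; the real encoder
# and decoder are the attention-based networks described in the article.
class TinyEncoder(nn.Module):
    def __init__(self, d=11, h=32):
        super().__init__()
        self.lstm = nn.LSTM(d, h, batch_first=True)
    def forward(self, x):
        return self.lstm(x)

class TinyDecoder(nn.Module):
    def __init__(self, d=11, h=32):
        super().__init__()
        self.lstm = nn.LSTM(d, h, batch_first=True)
        self.out = nn.Linear(h, d)
    def forward(self, y_prev, hidden, enc_outputs):
        o, hidden = self.lstm(y_prev, hidden)
        return self.out(o), hidden

model = Seq2Seq(TinyEncoder(), TinyDecoder())
pred = model(torch.randn(4, 30, 11))
print(pred.shape)  # torch.Size([4, 10, 11])
```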
Model Parameters:
To build the model, we used the following hyperparameters after tuning.
In training, Huber Loss is used as the objective function: it is less sensitive to outliers than squared error, and its delta parameter controls where it switches between regimes. It combines the L1 and L2 losses as follows: for an error e, the loss is ½e² when |e| ≤ δ, and δ(|e| − ½δ) otherwise.
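As a quick numeric check of Huber's piecewise behaviour (a sketch; PyTorch's built-in torch.nn.HuberLoss computes the same quantity):

```python
import torch

def huber(pred, target, delta=1.0):
    # Quadratic (0.5 * e^2) for |e| <= delta, linear (delta * (|e| - delta/2))
    # beyond it -- i.e. L2 behaviour near zero, L1 behaviour in the tails.
    err = pred - target
    abs_err = err.abs()
    loss = torch.where(abs_err <= delta,
                       0.5 * err ** 2,
                       delta * (abs_err - 0.5 * delta))
    return loss.mean()

pred = torch.tensor([0.0, 3.0])
target = torch.tensor([0.5, 0.0])
# |e|=0.5 -> 0.5*0.25 = 0.125 (quadratic); |e|=3 -> 1*(3-0.5) = 2.5 (linear)
print(huber(pred, target))  # tensor(1.3125)
```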
Backpropagation of the loss and weight updates were performed with the Adam optimiser at a learning rate of 1e-5; Adam is the de facto default optimiser for training deep neural networks today.
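A single training step under these settings might look like the following sketch; the stand-in model and random tensors are illustrative, not the article's actual training loop.

```python
import torch
import torch.nn as nn

# Stand-in model and data; the real model is the Encoder-Decoder network above.
model = nn.LSTM(11, 300, num_layers=4, batch_first=True)
criterion = nn.HuberLoss()                                 # delta defaults to 1.0
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # lr from the article

x = torch.randn(8, 30, 11)         # a batch of 30-day input windows
target = torch.randn(8, 30, 300)   # illustrative target for the stand-in model

optimizer.zero_grad()
output, _ = model(x)
loss = criterion(output, target)
loss.backward()    # backpropagate the Huber loss
optimizer.step()   # Adam weight update
```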
The source code is available only for paid subscribers to download and edit. If you want the source code and dataset, please become a subscriber.