A comparison of XGBoost, RNN, and LSTM networks for improving stock price predictions
Accurately predicting future stock prices is often described as a holy grail of finance.
Experts and enthusiasts alike have applied a myriad of techniques in attempts to crack the seemingly cryptic code of the stock market.
One notable project in this area, spearheaded by Priyaank, combines several machine learning techniques to predict adjusted closing prices. It starts with XGBoost regression and hyper-parameter tuning.
It then incorporates Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, reaching a final Root Mean Square Error (RMSE) of 33.59 and a Mean Absolute Percentage Error (MAPE) of 1.552%. This article gives an overview of the project's approach to stock price prediction and explains how these methods work.
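For readers unfamiliar with the two error metrics quoted above, here is a minimal sketch of how RMSE and MAPE are commonly computed. The function names and the sample numbers are illustrative assumptions and are not taken from the project's code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error, in the same units as the prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (assumes no zero prices)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Made-up prices, purely for illustration
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]
print("RMSE:", rmse(actual, predicted))
print("MAPE (%):", mape(actual, predicted))
```

RMSE is expressed in price units, so it depends on the price level of the stock, while MAPE is a relative error and is easier to compare across stocks.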
Let’s Start Coding
import math
import matplotlib
import numpy as np
import pandas as pd
import seaborn as sns
import time
import pandas_datareader.data as web
from datetime import date, datetime, timedelta
# datetime.time is aliased so that it does not shadow the time module imported above
from datetime import time as dt_time
from matplotlib import pyplot as plt
from pylab import rcParams
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from tqdm import tqdm_notebook
# Jupyter magic: display plots inline below the cell (works in notebooks only)
%matplotlib inline
test_size = 0.2 # proportion of dataset to be used as test set
cv_size = 0.2 # proportion of dataset to be used as cross-validation set
Nmax = 30 # for feature at day t, we use lags from t-1, t-2, ..., t-N as features
# Nmax is the maximum N we are going to test
fontsize = 14 # font size used in plots
ticklabelsize = 14 # tick label size used in plots
This cell imports the libraries used for data analysis and visualization: math, matplotlib, numpy, pandas, seaborn, time, and pandas_datareader. It then imports specific classes and functions from several other modules.
From the datetime module it imports the date, datetime, time, and timedelta classes. From scikit-learn it imports the LinearRegression class (sklearn.linear_model) and the mean_squared_error and r2_score functions (sklearn.metrics). It also imports rcParams from pylab for configuring plots and tqdm_notebook for progress bars. Several variables are defined after the imports.
test_size is the proportion of the dataset held out as the test set, and cv_size is the proportion used as the cross-validation set. Nmax is the largest number of lagged features that will be tried. To predict day t, the model uses the values from days t-1 through t-N as features, and N is varied up to Nmax (30 here).
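The split proportions and lag features described above can be sketched as follows. This is only an illustration under stated assumptions: the column name adj_close, the helper names, the synthetic prices, and the choice N = 3 are all hypothetical and do not come from the project's code.

```python
import numpy as np
import pandas as pd

test_size = 0.2  # proportion of rows for the test set
cv_size = 0.2    # proportion of rows for the cross-validation set
N = 3            # hypothetical number of lags (the article tests N up to Nmax)

def add_lags(df, col, n_lags):
    """Add columns col_lag_1 ... col_lag_n holding the values from days t-1 ... t-n."""
    out = df.copy()
    for lag in range(1, n_lags + 1):
        out[f"{col}_lag_{lag}"] = out[col].shift(lag)
    # The first n_lags rows have no complete history, so drop them
    return out.dropna().reset_index(drop=True)

def split_chronologically(df, cv_size, test_size):
    """Split into train / cv / test in time order, without shuffling."""
    num_cv = int(cv_size * len(df))
    num_test = int(test_size * len(df))
    num_train = len(df) - num_cv - num_test
    train = df.iloc[:num_train]
    cv = df.iloc[num_train:num_train + num_cv]
    test = df.iloc[num_train + num_cv:]
    return train, cv, test

# Synthetic prices standing in for real adjusted closing prices
prices = pd.DataFrame({"adj_close": np.arange(100.0, 120.0)})
lagged = add_lags(prices, "adj_close", N)
train, cv, test = split_chronologically(lagged, cv_size, test_size)
print(len(train), len(cv), len(test))
```

Keeping the rows in time order means the cross-validation and test sets always come after the training data, which avoids leaking future prices into the features.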
The fontsize and ticklabelsize variables hold the font size and tick label size to use in plots. %matplotlib inline is a Jupyter magic command that displays plots directly below the code cell that produces them.
Download the source code by checking the link below: