Forecasting Stock Prices With XGBoost
Predicting stock prices with precision is a critical challenge in financial analytics.
This article explores an advanced approach using the XGBoost algorithm to forecast next-day stock prices based on historical data. Our model is trained on three years of stock data, segmented into training (60%), development (20%), and test (20%) sets.
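As a rough illustration of the 60/20/20 split described above, the sketch below computes the three set sizes and slices the rows in time order. The helper name `split_sizes` and the row count of 750 are assumptions made for this example, not values taken from the dataset.

```python
def split_sizes(n_rows, cv_size=0.2, test_size=0.2):
    """Return (n_train, n_cv, n_test) for a chronological split."""
    n_cv = int(cv_size * n_rows)
    n_test = int(test_size * n_rows)
    n_train = n_rows - n_cv - n_test  # whatever remains goes to training
    return n_train, n_cv, n_test


# Hypothetical row count: roughly three years of trading days.
n_rows = 750
n_train, n_cv, n_test = split_sizes(n_rows)

# Slice in time order (no shuffling), so each later set only holds later dates.
rows = list(range(n_rows))  # stand-in for the dated price rows
train = rows[:n_train]
cv = rows[n_train:n_train + n_cv]
test = rows[n_train + n_cv:]
```

Keeping the rows in chronological order means the model is always evaluated on dates that come after the ones it was trained on.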
The core of our methodology is data normalization and adaptive scaling. Initially, we standardize the training set to a mean of 0 and a variance of 1 and apply the same transformation to the development and test sets. We then refine this by scaling the past N days’ data for each sample in the development set, so that predictions are based on appropriately normalized inputs.
The most significant change in our approach is adaptive scaling for the development and test sets. Rather than reusing a single mean and variance for every sample, we scale each sample using the mean and variance of its preceding N days’ data. This keeps the model's inputs on a comparable scale when recent prices move away from the levels seen during training, which is intended to improve next-day predictions.
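A minimal sketch of the adaptive scaling idea, assuming each sample is a window of the past N closing prices. The helper names `scale_window` and `unscale`, the use of the population standard deviation, and the example prices are assumptions made for illustration, not the exact code used later.

```python
import numpy as np


def scale_window(window):
    """Standardize one window of past prices to mean 0 and variance 1.

    Returns the scaled window plus the mean and standard deviation used,
    so a prediction made in scaled space can be mapped back to a price.
    """
    window = np.asarray(window, dtype=float)
    mu = window.mean()
    sigma = window.std()  # population standard deviation (assumption)
    if sigma == 0:        # flat window: avoid dividing by zero
        sigma = 1.0
    return (window - mu) / sigma, mu, sigma


def unscale(pred_scaled, mu, sigma):
    """Map a value from scaled space back to the original price scale."""
    return pred_scaled * sigma + mu


prices = [100.0, 102.0, 104.0]  # hypothetical closes for the last N = 3 days
scaled, mu, sigma = scale_window(prices)
restored = unscale(scaled, mu, sigma)
```

Because the mean and standard deviation come from each window, the same statistics have to be kept alongside the sample so that the model's scaled output can be converted back into a price.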
This article presents a detailed exploration of this sophisticated predictive model, demonstrating how machine learning can be leveraged for more accurate financial forecasting.
Let’s start coding:
import math
import matplotlib
import numpy as np
import pandas as pd
import seaborn as sns
import time
from datetime import date
from matplotlib import pyplot as plt
from pylab import rcParams
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from tqdm.notebook import tqdm as tqdm_notebook  # tqdm.tqdm_notebook is deprecated
from xgboost import XGBRegressor
%matplotlib inline
#### Input params ##################
stk_path = "./data/VTI.csv"
test_size = 0.2 # proportion of dataset to be used as test set
cv_size = 0.2 # proportion of dataset to be used as cross-validation set
N = 3 # for feature at day t, we use lags from t-1, t-2, ..., t-N as features
n_estimators = 100 # Number of boosted trees to fit. default = 100
max_depth = 3 # Maximum tree depth for base learners. default = 3
learning_rate = 0.1 # Boosting learning rate (xgb’s “eta”). default = 0.1
min_child_weight = 1 # Minimum sum of instance weight(hessian) needed in a child. default = 1
subsample = 1 # Subsample ratio of the training instance. default = 1
colsample_bytree = 1 # Subsample ratio of columns when constructing each tree. default = 1
colsample_bylevel = 1 # Subsample ratio of columns for each split, in each level. default = 1
gamma = 0 # Minimum loss reduction required to make a further partition on a leaf node of the tree. default=0
model_seed = 100
fontsize = 14
ticklabelsize = 14
####################################
This setup cell imports the libraries used throughout the article (math, matplotlib, numpy, pandas, seaborn, scikit-learn, tqdm, and XGBoost) and defines the input parameters: the path to the stock data, the proportions of the test and cross-validation sets, the number of lagged days N used as features, and the XGBoost hyperparameters, including the number of trees, maximum tree depth, learning rate, and minimum child weight. It also fixes a random seed for reproducible results and sets the font sizes used in charts.
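To make the role of N concrete, here is a small sketch of how lag features could be built from a price series. The helper name `make_lag_features` and the sample prices are assumptions for illustration; the feature code used later may differ.

```python
import numpy as np


def make_lag_features(series, n_lags):
    """Build a feature matrix of lagged values and the matching targets.

    Row i of X holds [x[t-1], x[t-2], ..., x[t-n_lags]] for t = i + n_lags,
    and y[i] is x[t], the value to predict.
    """
    series = np.asarray(series, dtype=float)
    columns = [
        series[n_lags - k: len(series) - k]  # the series shifted back by k days
        for k in range(1, n_lags + 1)
    ]
    X = np.column_stack(columns)
    y = series[n_lags:]
    return X, y


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical closing prices
X, y = make_lag_features(prices, n_lags=3)
# X is [[12, 11, 10], [13, 12, 11]] and y is [13, 14]
```

The first N days have no complete set of lags, so they appear only as features and never as targets.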
There is no notebook to download; the entire code is included in the article itself.