Quant Trading Strategies: Exploiting Mean Reversion in Gold Miner ETFs EP7/365

From Parameter Optimization to Factor Correlation: Evaluating Statistical Arbitrage Performance Against Fama-French Benchmarks.

Onepagecode · Jan 14, 2026

Download the source code using the URL at the end of this article!

In the seventh installment of our 365-day quantitative trading series, we shift our focus to the highly liquid and interconnected world of Gold Miner ETFs. While many retail traders approach gold through simple “long-only” positions, institutional algo trading frequently utilizes the relative price movements between similar funds to capture alpha. This article provides an end-to-end framework for executing a statistical arbitrage strategy between the iShares MSCI Global Gold Miners ETF (RING) and the VanEck Gold Miners ETF (GDX), demonstrating how a systematic approach can turn subtle market inefficiencies into consistent returns.

The core of this strategy lies in the principle of mean reversion. Because these two ETFs track overlapping sets of underlying companies, their price returns are fundamentally “tethered” to one another. When the return spread between them diverges beyond historical norms, it creates a high-probability opportunity for a convergence trade. We move beyond basic price-action analysis by constructing a robust backtesting engine that utilizes rolling-window return dynamics, allowing the algorithm to adapt to short-term market shifts while maintaining a strictly disciplined entry and exit regime.

Readers can expect a deep dive into the practical realities of quant trading, including a sophisticated method for liquidity-adjusted position sizing. Rather than assuming infinite market depth, our model calculates a capacity proxy ($N_t$) based on the rolling median dollar volume of both legs. This ensures that the strategy remains realistic for larger capital allocations and accounts for the “thinnest” part of the trade. We also implement a month-end forced exit policy to eliminate the risks associated with overnight gaps and liquidity dry-spells between trading cycles.

Beyond the trade signals, we subject our strategy to rigorous performance evaluation. We conduct an extensive parameter sweep across varying rolling window lengths and threshold sensitivities to identify the “sweet spot” where profitability meets stability. The article concludes with a high-level comparison of our strategy’s returns against the Fama-French three-factor model. By analyzing our correlation with Market, Size, and Value factors, we can determine whether our gains truly stem from idiosyncratic spread dynamics or are simply a byproduct of broader market betas.

Whether you are an aspiring quant or an experienced trader looking to systematize your workflow, this study provides the Python implementation and the theoretical foundation needed to master spread dynamics. By the end of this guide, you will understand how to build, optimize, and benchmark a pairs-trading strategy that seeks to remain resilient even during periods of high market volatility.

%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import pandas_datareader as pdr

import matplotlib.pyplot as plt
plt.rcParams.update({"figure.figsize": (15, 8)})

pd.options.display.float_format = '{:,.4f}'.format

import nasdaqdatalink

import seaborn as sns
_mako_cmap = sns.color_palette("mako", as_cmap=True)

This block is the environment- and presentation-focused setup that prepares the notebook for the simple spread trading workflow that follows. It begins by configuring the interactive notebook to render plots inline so that every chart produced by matplotlib will appear directly in the notebook output; that makes it easy to visually inspect time series, spreads, and diagnostic charts as you iterate on pair selection and strategy logic. Immediately after, the global warnings filter is set to ignore non-critical warnings; this keeps the notebook output uncluttered so the important visual and tabular results stand out during exploratory analysis.

The next set of statements brings in the core numerical, tabular and data-access libraries needed for spread trading: numpy for vectorized numerical operations and array math, pandas for time-series and DataFrame manipulation (which is central to aligning and computing spreads), and pandas_datareader plus nasdaqdatalink as the data sources for fetching historical market prices and related datasets. These libraries together handle the workflow of fetching tick- or daily-price series, aligning timestamps between instruments, computing ratios or differences, and producing the inputs for trading signals.

Plotting and visual styling are configured with matplotlib and seaborn. The global matplotlib rcParams are updated to set a large default figure size, improving readability of multi-line time-series plots and correlation/heatmap visualizations that are typical when inspecting candidate spreads. Pandas’ floating-point display format is set to four decimal places so printed DataFrames (for example prices, normalized spreads, or statistical summaries) appear consistently and compactly in the notebook output. Finally, a seaborn color palette (“mako”) is created and stored in _mako_cmap so subsequent plotting calls can use a coherent, visually distinct colormap for heatmaps and line color selection; this helps maintain consistent aesthetics across the many diagnostic charts used in evaluating pair behavior and stationarity for the spread trading strategy.

Together, these steps don’t perform any market calculations themselves but establish a reproducible, readable environment optimized for fetching price data, computing and visualizing spreads, and iterating on the simple spread trading analysis.

def executeStrategy(M, g, j, s, plot=True, ylim=None):
    '''
    Execute strategy for rolling window M with thresholds g (enter), j (exit), and s (stop loss).
    Optionally plot results via plotStrategy.
    '''
    df_work = df.copy()
    df_work['RollRetRING_GDX'] = df_work['RINGminusGDXRet'].rolling(M).mean()

    # initialize bookkeeping columns (as floats so later P&L assignments keep the dtype)
    for col in ['action', 'sharesRING', 'sharesGDX', 'positionValue', 'dayPnL', 'positionPnL', 'cumPnL', 'grossCash']:
        df_work[col] = 0.0

    buy_dates = []
    sell_dates = []
    stop_dates = []
    month_ends = []

    for year_prefix in ('2020-', '2021-'):
        for month in range(1, 13):
            period_key = f"{year_prefix}{month}"

            # first trading day of the month
            first_day = df_work.loc[period_key].iloc[0].name
            prev_idx = df_work.index.get_loc(first_day) - 1
            df_work.loc[first_day, 'cumPnL'] = df_work.iloc[prev_idx]['cumPnL']

            first_roll = df_work.loc[first_day, 'RollRetRING_GDX']
            if first_roll > g:
                df_work.loc[first_day, 'action'] = -1
                sell_dates.append(first_day)
            elif first_roll < -g:
                df_work.loc[first_day, 'action'] = 1
                buy_dates.append(first_day)

            if df_work.loc[first_day, 'action'] != 0:
                vol = df_work.loc[first_day, 'Nt'] / 100
                df_work.loc[first_day, 'sharesRING'] = int(round(df_work.loc[first_day, 'action'] * (vol / df_work.loc[first_day, 'RINGadj_close']), 0))
                df_work.loc[first_day, 'sharesGDX'] = int(round(-df_work.loc[first_day, 'action'] * (vol / df_work.loc[first_day, 'GDXadj_close']), 0))
                df_work.loc[first_day, 'positionValue'] = (
                    df_work.loc[first_day, 'sharesRING'] * df_work.loc[first_day, 'RINGadj_close']
                    + df_work.loc[first_day, 'sharesGDX'] * df_work.loc[first_day, 'GDXadj_close']
                )
                df_work.loc[first_day, 'grossCash'] = (
                    abs(df_work.loc[first_day, 'sharesRING'] * df_work.loc[first_day, 'RINGadj_close'])
                    + abs(df_work.loc[first_day, 'sharesGDX'] * df_work.loc[first_day, 'GDXadj_close'])
                )
                # dayPnL and positionPnL remain zero on entry day

            # middle days of the month (from second to second-last)
            for idx, _ in df_work.loc[period_key].iloc[1:-1].iterrows():
                prev_pos = df_work.iloc[df_work.index.get_loc(idx) - 1]
                prev_action = prev_pos['action']

                # no outstanding position
                if prev_action == 0:
                    roll_val = df_work.loc[idx, 'RollRetRING_GDX']
                    if roll_val > g:
                        df_work.loc[idx, 'action'] = -1
                        sell_dates.append(idx)
                    elif roll_val < -g:
                        df_work.loc[idx, 'action'] = 1
                        buy_dates.append(idx)
                    else:
                        df_work.loc[idx, 'cumPnL'] = prev_pos['cumPnL']
                        continue

                    vol = df_work.loc[idx, 'Nt'] / 100
                    df_work.loc[idx, 'sharesRING'] = int(round(df_work.loc[idx, 'action'] * (vol / df_work.loc[idx, 'RINGadj_close']), 0))
                    df_work.loc[idx, 'sharesGDX'] = int(round(-df_work.loc[idx, 'action'] * (vol / df_work.loc[idx, 'GDXadj_close']), 0))
                    df_work.loc[idx, 'positionValue'] = (
                        df_work.loc[idx, 'sharesRING'] * df_work.loc[idx, 'RINGadj_close']
                        + df_work.loc[idx, 'sharesGDX'] * df_work.loc[idx, 'GDXadj_close']
                    )
                    df_work.loc[idx, 'grossCash'] = (
                        abs(df_work.loc[idx, 'sharesRING'] * df_work.loc[idx, 'RINGadj_close'])
                        + abs(df_work.loc[idx, 'sharesGDX'] * df_work.loc[idx, 'GDXadj_close'])
                    )
                    df_work.loc[idx, 'cumPnL'] = prev_pos['cumPnL']

                # outstanding short RING / long GDX (action == -1)
                elif prev_action == -1:
                    df_work.loc[idx, 'positionValue'] = (
                        prev_pos['sharesRING'] * df_work.loc[idx, 'RINGadj_close']
                        + prev_pos['sharesGDX'] * df_work.loc[idx, 'GDXadj_close']
                    )
                    df_work.loc[idx, 'dayPnL'] = df_work.loc[idx, 'positionValue'] - prev_pos['positionValue']
                    df_work.loc[idx, 'cumPnL'] = prev_pos['cumPnL'] + df_work.loc[idx, 'dayPnL']
                    df_work.loc[idx, 'positionPnL'] = prev_pos['positionPnL'] + df_work.loc[idx, 'dayPnL']
                    df_work.loc[idx, 'grossCash'] = prev_pos['grossCash']

                    # stop loss
                    if df_work.loc[idx, 'positionPnL'] / df_work.loc[idx, 'grossCash'] < s:
                        df_work.loc[idx, 'action'] = 0
                        df_work.loc[idx, 'sharesRING'] = 0
                        df_work.loc[idx, 'sharesGDX'] = 0
                        df_work.loc[idx, 'grossCash'] = 0
                        df_work.loc[idx, 'positionPnL'] = 0
                        df_work.loc[idx:, 'cumPnL'] = df_work.loc[idx, 'cumPnL']
                        df_work.loc[idx, 'positionValue'] = 0
                        stop_dates.append(idx)
                        break

                    # reverse to opposite position
                    roll_val = df_work.loc[idx, 'RollRetRING_GDX']
                    if roll_val < -g:
                        df_work.loc[idx, 'action'] = 1
                        vol = df_work.loc[idx, 'Nt'] / 100
                        df_work.loc[idx, 'sharesRING'] = int(round(df_work.loc[idx, 'action'] * (vol / df_work.loc[idx, 'RINGadj_close']), 0))
                        df_work.loc[idx, 'sharesGDX'] = int(round(-df_work.loc[idx, 'action'] * (vol / df_work.loc[idx, 'GDXadj_close']), 0))
                        df_work.loc[idx, 'grossCash'] = (
                            abs(df_work.loc[idx, 'sharesRING'] * df_work.loc[idx, 'RINGadj_close'])
                            + abs(df_work.loc[idx, 'sharesGDX'] * df_work.loc[idx, 'GDXadj_close'])
                        )
                        df_work.loc[idx, 'positionPnL'] = 0
                        df_work.loc[idx, 'positionValue'] = (
                            df_work.loc[idx, 'sharesRING'] * df_work.loc[idx, 'RINGadj_close']
                            + df_work.loc[idx, 'sharesGDX'] * df_work.loc[idx, 'GDXadj_close']
                        )
                        buy_dates.append(idx)

                    # close position
                    elif roll_val < j:
                        df_work.loc[idx, 'action'] = 0
                        df_work.loc[idx, 'sharesRING'] = 0
                        df_work.loc[idx, 'sharesGDX'] = 0
                        df_work.loc[idx, 'grossCash'] = 0
                        df_work.loc[idx, 'positionPnL'] = 0
                        df_work.loc[idx, 'positionValue'] = 0
                        buy_dates.append(idx)

                    else:
                        df_work.loc[idx, 'action'] = prev_action
                        df_work.loc[idx, 'sharesRING'] = prev_pos['sharesRING']
                        df_work.loc[idx, 'sharesGDX'] = prev_pos['sharesGDX']

                # outstanding short GDX / long RING (action == 1)
                elif prev_action == 1:
                    df_work.loc[idx, 'positionValue'] = (
                        prev_pos['sharesRING'] * df_work.loc[idx, 'RINGadj_close']
                        + prev_pos['sharesGDX'] * df_work.loc[idx, 'GDXadj_close']
                    )
                    df_work.loc[idx, 'dayPnL'] = df_work.loc[idx, 'positionValue'] - prev_pos['positionValue']
                    df_work.loc[idx, 'cumPnL'] = prev_pos['cumPnL'] + df_work.loc[idx, 'dayPnL']
                    df_work.loc[idx, 'positionPnL'] = prev_pos['positionPnL'] + df_work.loc[idx, 'dayPnL']
                    df_work.loc[idx, 'grossCash'] = prev_pos['grossCash']

                    # stop loss
                    if df_work.loc[idx, 'positionPnL'] / df_work.loc[idx, 'grossCash'] < s:
                        df_work.loc[idx, 'action'] = 0
                        df_work.loc[idx, 'sharesRING'] = 0
                        df_work.loc[idx, 'sharesGDX'] = 0
                        df_work.loc[idx, 'grossCash'] = 0
                        df_work.loc[idx, 'positionPnL'] = 0
                        df_work.loc[idx:, 'cumPnL'] = df_work.loc[idx, 'cumPnL']
                        df_work.loc[idx, 'positionValue'] = 0
                        stop_dates.append(idx)
                        break

                    # reverse to opposite short
                    roll_val = df_work.loc[idx, 'RollRetRING_GDX']
                    if roll_val > g:
                        df_work.loc[idx, 'action'] = -1
                        vol = df_work.loc[idx, 'Nt'] / 100
                        df_work.loc[idx, 'sharesRING'] = int(round(df_work.loc[idx, 'action'] * (vol / df_work.loc[idx, 'RINGadj_close']), 0))
                        df_work.loc[idx, 'sharesGDX'] = int(round(-df_work.loc[idx, 'action'] * (vol / df_work.loc[idx, 'GDXadj_close']), 0))
                        df_work.loc[idx, 'grossCash'] = (
                            abs(df_work.loc[idx, 'sharesRING'] * df_work.loc[idx, 'RINGadj_close'])
                            + abs(df_work.loc[idx, 'sharesGDX'] * df_work.loc[idx, 'GDXadj_close'])
                        )
                        df_work.loc[idx, 'positionPnL'] = 0
                        df_work.loc[idx, 'positionValue'] = (
                            df_work.loc[idx, 'sharesRING'] * df_work.loc[idx, 'RINGadj_close']
                            + df_work.loc[idx, 'sharesGDX'] * df_work.loc[idx, 'GDXadj_close']
                        )
                        sell_dates.append(idx)

                    # close position
                    elif roll_val > j:
                        df_work.loc[idx, 'action'] = 0
                        df_work.loc[idx, 'sharesRING'] = 0
                        df_work.loc[idx, 'sharesGDX'] = 0
                        df_work.loc[idx, 'grossCash'] = 0
                        df_work.loc[idx, 'positionPnL'] = 0
                        df_work.loc[idx, 'positionValue'] = 0
                        sell_dates.append(idx)

                    else:
                        df_work.loc[idx, 'action'] = prev_action
                        df_work.loc[idx, 'sharesRING'] = prev_pos['sharesRING']
                        df_work.loc[idx, 'sharesGDX'] = prev_pos['sharesGDX']

            # handle last trading day of the month
            last_day = df_work.loc[period_key].iloc[-1].name
            prev_last = df_work.iloc[df_work.index.get_loc(last_day) - 1]
            if prev_last['action'] != 0:
                df_work.loc[last_day, 'positionValue'] = (
                    prev_last['sharesRING'] * df_work.loc[last_day, 'RINGadj_close']
                    + prev_last['sharesGDX'] * df_work.loc[last_day, 'GDXadj_close']
                )
                df_work.loc[last_day, 'dayPnL'] = df_work.loc[last_day, 'positionValue'] - prev_last['positionValue']
                df_work.loc[last_day, 'cumPnL'] = prev_last['cumPnL'] + df_work.loc[last_day, 'dayPnL']

                df_work.loc[last_day, 'action'] = 0
                df_work.loc[last_day, 'sharesRING'] = 0
                df_work.loc[last_day, 'sharesGDX'] = 0
                df_work.loc[last_day, 'grossCash'] = 0
                df_work.loc[last_day, 'positionPnL'] = 0
                df_work.loc[last_day, 'positionValue'] = 0
                month_ends.append(last_day)
            else:
                df_work.loc[last_day, 'cumPnL'] = prev_last['cumPnL']

    profit = df_work['cumPnL'].iloc[-1]
    capital = 2 * df_work['Nt'].max() / 100
    roc = 100 * profit / capital

    if plot:
        if ylim:
            plotStrategy(df_work, M, g, j, buy_dates, sell_dates, month_ends, stop_dates, ylim)
        else:
            plotStrategy(df_work, M, g, j, buy_dates, sell_dates, month_ends, stop_dates)

    return (df_work, roc)

This function implements a simple month-by-month spread-trading engine between two instruments (RING and GDX) whose relative return is summarized by a rolling mean. The overall goal is to open a dollar-balanced, two-legged spread when the smoothed spread signal is large enough, hold or reverse the spread while monitoring realized/unrealized P&L, impose a stop-loss evaluated at each daily close and measured as a fraction of gross exposure, and forcibly close any position at each month end. The three parameters control the trading behavior: M is the rolling window size used to smooth the RING–GDX return spread, g is the entry threshold (how large the smoothed spread must be to take a position), j is the exit threshold (a smaller threshold to take profits / close a position), and s is the stop-loss level expressed as a fractional drawdown of gross exposure.

Data preparation: the code makes a working copy of the global df and computes the M-day rolling mean of the spread (RollRetRING_GDX). It then initializes a set of bookkeeping columns (action, share counts for each leg, position and day P&L, cumulative P&L, gross cash exposure) and prepares lists to record trade events (buy_dates, sell_dates, stop_dates, month_ends). The convention for action is numeric: 0 = no position, 1 = long RING / short GDX, -1 = short RING / long GDX. Positions are sized so that each leg carries the same absolute dollar exposure equal to vol = Nt / 100; shares per leg are computed by dividing that dollar exposure by the current adjusted close and rounding to an integer. grossCash is the sum of absolute dollar exposures across both legs and is used to normalize position P&L for stop-loss decisions.
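
To make that sizing rule concrete, here is a minimal, self-contained sketch of the dollar-balanced leg sizing; the price and Nt figures are invented for illustration, and the variable names only mirror, rather than reuse, the columns consumed by executeStrategy.

# Minimal sketch of the dollar-balanced sizing rule (all values invented for illustration).
nt = 2_500_000.0     # capacity proxy: rolling median dollar volume of the thinner leg
ring_price = 28.40   # hypothetical RING adjusted close
gdx_price = 31.15    # hypothetical GDX adjusted close
action = -1          # -1 = short RING / long GDX, +1 = long RING / short GDX

vol = nt / 100                                      # target dollar exposure per leg
shares_ring = int(round(action * vol / ring_price))   # -880 shares (short leg)
shares_gdx = int(round(-action * vol / gdx_price))    # 803 shares (long leg)

position_value = shares_ring * ring_price + shares_gdx * gdx_price        # near zero at entry
gross_cash = abs(shares_ring * ring_price) + abs(shares_gdx * gdx_price)  # roughly 2 * vol

print(shares_ring, shares_gdx, round(position_value, 2), round(gross_cash, 2))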

The main loop processes the data month by month for the specified years. For each month it treats three phases: the first trading day, the interior trading days (second through second-last), and the last trading day. On the first trading day the code carries forward the prior day’s cumulative P&L, inspects the current rolling spread, and applies the entry rule: if the smoothed spread exceeds +g it establishes a short-RING / long-GDX position (action = -1); if it is below -g it establishes the opposite (action = 1). Entry sets shares, computes positionValue (net mark-to-market of both legs) and grossCash, but leaves dayPnL and positionPnL at zero because the trade is considered opened at that day’s close.

For interior days the logic depends on whether there is an outstanding position from the prior day. If there is no position, the function again checks the rolling spread against g to potentially open a new position; otherwise it simply copies forward the cumulative P&L from the previous day. If there is an existing position, the function first marks the position to the current prices: it updates positionValue using the previous shares, computes dayPnL as the change in positionValue from the prior day, updates cumulative P&L and the running positionPnL (which is P&L since that position was opened or last reset), and leaves grossCash unchanged. After marking to market, the code applies risk and trade-management rules in a prioritized order:

- Stop-loss: it computes positionPnL / grossCash and, if that falls below s, it immediately closes the position by zeroing action and shares, resetting positionPnL and grossCash, setting positionValue to zero, recording the stop date, and copying the current cumulative P&L forward for all remaining dates (df_work.loc[idx:, ‘cumPnL’] = current cumPnL). The stop triggers a break out of the current month’s inner loop, preventing further trading that month.

- Reversal: if the rolling spread has moved sufficiently in the opposite direction beyond the entry threshold (i.e., it crosses -g when previously short RING, or crosses +g when previously long RING), the code flips the action to the opposite sign, re-sizes the legs using the same vol rule, resets positionPnL to zero (new position basis), recomputes grossCash and positionValue, and records the date as an entry of the new direction (appending to buy_dates or sell_dates as appropriate).

- Exit toward zero: if the rolling spread has moved past the smaller exit threshold j toward the neutral zone (e.g., a prior short position sees roll_val < j), the code closes the position (action = 0, zero shares, zero grossCash and positionPnL, zero positionValue) and records the date as an exit. If none of these conditions apply, the function simply carries forward the previous action and share counts.
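
To summarize the priority order just described, the following compact helper is a hypothetical sketch (not part of the article's engine) that returns the action an open position should carry at today's close; it mirrors the stop-first, then-reverse, then-exit ordering of executeStrategy without the bookkeeping.

def manage_open_position(prev_action, position_pnl, gross_cash, roll_val, g, j, s):
    # Hypothetical helper: which action an open position carries at today's close.
    if position_pnl / gross_cash < s:      # 1) stop-loss is checked first
        return 0
    if prev_action == -1:                  # currently short RING / long GDX
        if roll_val < -g:
            return 1                       # 2) reverse into the opposite spread
        if roll_val < j:
            return 0                       # 3) take profit / close toward neutral
    elif prev_action == 1:                 # currently long RING / short GDX
        if roll_val > g:
            return -1
        if roll_val > j:
            return 0
    return prev_action                     # 4) otherwise keep holding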

The last trading day of the month is treated as a forced close: if there was an active position the previous day, it is marked to market for that last day (dayPnL and cumulative P&L updated) and then explicitly closed (action and shares set to zero, grossCash and positionPnL zeroed) and the date appended to month_ends. This enforces the stated policy to not hold positions overnight between months.

Throughout this process the code builds up two complementary P&L measures. dayPnL is the day-over-day change in the net position value (a mark-to-market movement). positionPnL accumulates dayPnL across the life of the current position and is reset to zero whenever a new position is opened or a reversal occurs; it is the metric used in the stop-loss test. cumPnL accumulates realized and unrealized changes permanently and is carried forward even when positions close or stop triggers occur.
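
A small worked example, with made-up mark-to-market values, shows how the three measures evolve over a few days for a single position:

# Hypothetical mark-to-market path for one position, entered on day 0 at a net value of 0.0.
position_values = [0.0, 120.0, -40.0, 75.0]   # net value of both legs at each daily close

position_pnl = 0.0     # reset whenever a position is opened or reversed
cum_pnl = 310.0        # carried over from earlier trades, never reset

for today, yesterday in zip(position_values[1:], position_values[:-1]):
    day_pnl = today - yesterday    # day-over-day mark-to-market move
    position_pnl += day_pnl        # P&L since this position was opened
    cum_pnl += day_pnl             # strategy-level running total
    print(day_pnl, position_pnl, cum_pnl)

# prints: 120.0 120.0 430.0 / -160.0 -40.0 270.0 / 115.0 75.0 385.0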

At the end, the function reports overall performance: profit is the final cumulative P&L value; capital is computed as 2 * (maximum Nt / 100), reflecting total two‑leg gross exposure at the largest notional used; and roc is the return-on-capital expressed as a percentage (100 * profit / capital). If plotting is requested, the function calls plotStrategy and passes the trade event lists and configuration for visualization. The function returns the annotated dataframe (df_work) and the computed roc.

import matplotlib.pyplot as plt

def plotStrategy(data, M, g, j, buySpread, sellSpread, monthEnd, stopLoss, ylim=None):
    # plot rolling spread series and mark trade-related points
    series = data['RollRetRING_GDX']
    plt.plot(series, label=f"{M} day rolling spread returns")

    buy_y = [series.loc[idx] for idx in buySpread]
    sell_y = [series.loc[idx] for idx in sellSpread]
    monthend_y = [series.loc[idx] for idx in monthEnd]
    stoploss_y = [series.loc[idx] for idx in stopLoss]

    plt.scatter(buySpread, buy_y, color='g', label='Buy Spread')
    plt.scatter(sellSpread, sell_y, color='r', label='Sell Spread')
    plt.scatter(monthEnd, monthend_y, color='c', label='Month End Close')
    plt.scatter(stopLoss, stoploss_y, color='m', label='Stop Loss')

    plt.axhline(y=g, color='b', linestyle='-', label='g = ' + str(g))
    plt.axhline(y=-g, color='b', linestyle='-')

    plt.axhline(y=j, color='y', linestyle='-', label='j = ' + str(j))
    plt.axhline(y=-j, color='y', linestyle='-')

    plt.ylabel("Spread", fontsize=12)
    plt.xlabel("Date", fontsize=12)

    profit = data['cumPnL'].iloc[-1]
    capital = 2 * data['Nt'].max() / 100
    roc = 100 * profit / capital

    n_buy = len(buySpread)
    n_sell = len(sellSpread)
    n_stop = len(stopLoss)
    n_month = len(monthEnd)

    total_trades = 2 * ((n_buy + n_sell + n_stop + n_month) // 2)
    plt.title(f"Strategy Profit = {round(profit, 0)} \nReturn on Capital = {round(roc, 2)}%, Total Trades = {total_trades}", fontsize=16)

    if ylim:
        plt.ylim(ylim)

    plt.legend()
    plt.show()

This function builds a diagnostic plot for a simple spread trading strategy and annotates it with the key trade events and performance numbers so you can visually validate how the rules behaved over time. At the start it selects the time series that represents the rolling spread returns (named RollRetRING_GDX in the data frame) and draws that series as the baseline plot; the label includes M so you can remember the rolling-window length used to produce that series. The plotted x-axis uses the series index (typically dates), so every subsequent event marker is placed directly on the same time coordinate system.

Next the code converts the event index lists — buySpread, sellSpread, monthEnd, stopLoss — into y-values by looking them up in the series with series.loc[idx]. This is done so the scatter markers are vertically aligned with the actual spread value at the moment each event occurred. Those markers are then plotted as colored scatter points: green for buys, red for sells, cyan for month-end forced closes, and magenta for stop-loss hits. Plotting all four types makes it easy to see where entries, exits and forced/stop exits happened relative to the spread path.

The two threshold pairs (g and j) are drawn as horizontal lines at both the positive and negative values. In a simple spread strategy you typically use symmetric thresholds to trigger long vs short spread entries and sometimes separate exit thresholds; plotting ±g and ±j makes those entry/exit bands visually explicit. The labels on the horizontal lines include the numeric g and j values so you can immediately see the thresholds in the figure.

After the visualization elements, the function computes the basic performance metrics used in the figure title. It pulls cumulative profit from the last entry of data['cumPnL'], which is treated as the strategy P&L to date. It computes a capital figure as 2 * data['Nt'].max() / 100, i.e., it scales the maximum observed Nt value by 2 and divides by 100 to produce the denominator for return normalization, and then computes return on capital (roc) as 100 * profit / capital, expressing performance as a percentage. These numbers are formatted and displayed in the title along with a count of total trades.

The total trade count is calculated from the counts of the four event lists. The code sums the lengths of buy, sell, stop-loss and month-end events, integer-divides that sum by two, and then multiplies by two again. That produces the largest even integer not greater than the raw event count, which reflects counting full round-trip trades (an even number of trade-related events) rather than raw event points. The plot is optionally constrained vertically if ylim is provided, the legend is added to explain the colors and lines, and finally the plot is shown. Overall, the figure gives you a time-series view of the spread, the precise points where trading rules fired (entries, exits, stops, month-end closes), and the headline performance metrics so you can judge how the simple spread rules performed across the sample.

import pandas as pd
import numpy as np

def summaryStats(df):
    """Return summary statistics for each column in the DataFrame."""
    annual_mean = df.mean() * 12
    annual_vol = df.std() * np.sqrt(12)
    stats_map = {
        'Mean': annual_mean,
        'Volatility': annual_vol,
        'Sharpe Ratio': annual_mean / annual_vol,
        'Skewness': df.skew(),
        'min return': df.min(),
        '95% VaR': df.quantile(0.05),
        'Excess Kurtosis': df.kurt()
    }
    return pd.DataFrame(stats_map)

This function ingests a DataFrame of return series (one column per instrument or spread) and produces a compact table of per-column performance and risk statistics that are ready for comparison. The code treats the input returns as periodic observations at a monthly frequency: it takes the column-wise mean and scales it by 12 to produce an annualized mean, and it takes the column-wise standard deviation and scales it by sqrt(12) to produce an annualized volatility. (Note that these factors assume monthly observations; for the daily return series built above, 252 and sqrt(252) would be the conventional annualization factors.) These two annualized numbers are then used to compute a simple Sharpe ratio as annual_mean / annual_vol, i.e., a risk-adjusted return metric without an explicit risk-free adjustment, so you get a single scalar that summarizes return per unit of annualized risk for each series.

Beyond mean, volatility, and Sharpe, the function computes a set of distributional and tail-risk statistics directly from the raw series. Skewness is computed per column to capture asymmetry in the return distribution (positive skew indicates larger right tails, negative skew indicates the opposite). The minimum observed return provides the worst realized single-period outcome for each series. The “95% VaR” is implemented as the 5th percentile (df.quantile(0.05)), which corresponds to the value-at-risk at a 95% confidence level — i.e., the loss that is exceeded only 5% of the time. Finally, “Excess Kurtosis” is computed via df.kurt(), giving the excess kurtosis (kurtosis relative to the normal distribution) for each column, which describes tail heaviness beyond skew.

All of these column-wise operations are vectorized and return pandas Series indexed by the original DataFrame columns. The function collects those Series into a dictionary keyed by human-readable metric names and then converts that dictionary into a DataFrame. The resulting DataFrame has the original return series as its index (rows) and the metric names as its columns, so each row is a compact summary profile of mean, volatility, Sharpe, skew, min, VaR, and excess kurtosis.

In the context of simple spread trading, the table returned by this function gives you the essential, comparable signals you need to rank spreads: annualized returns and volatilities for scale and risk budgeting, a Sharpe-style ratio for simple risk-adjusted ranking, and skewness/VaR/kurtosis to assess downside and tail behavior of candidate spreads before you act.
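
A minimal usage sketch, assuming two hypothetical return columns and random data (neither comes from the article) purely to exercise the function:

import numpy as np
import pandas as pd

# Two made-up monthly return series, just to exercise summaryStats.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'spreadA': rng.normal(0.002, 0.010, 36),
    'spreadB': rng.normal(0.001, 0.020, 36),
})

summaryStats(demo)
# One row per series; columns are Mean, Volatility, Sharpe Ratio, Skewness,
# min return, 95% VaR and Excess Kurtosis.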

import pandas as pd
from itertools import product

def rocSummary(rollingWindowDays, enterThreshold, exitThreshold, stopLoss=-0.01):
    cols = ['Rolling Window', 'Enter Threshold', 'Exit Threshold', 'Return on Capital']
    results_df = pd.DataFrame(columns=cols)
    results_df.index.name = 'Strategy #'
    idx = 1
    for win, enter, exit_ in product(rollingWindowDays, enterThreshold, exitThreshold):
        _, roc = executeStrategy(win, enter, exit_, stopLoss, plot=False)
        results_df.loc[idx] = [win, enter, exit_, roc]
        idx += 1
    # coerce to numeric dtypes so downstream ranking (nlargest) and pivots work cleanly
    return results_df.apply(pd.to_numeric)

This function is a grid-runner that evaluates a simple spread-trading strategy across combinations of lookback windows and entry/exit threshold settings, and collects the resulting return-on-capital (ROC) for each configuration. The parameters you pass in represent the sets to sweep: rollingWindowDays is a collection of lookback lengths used by the strategy to compute its rolling-mean spread signal; enterThreshold and exitThreshold are the numeric trigger levels the strategy uses to open and close positions. The signature also includes a stopLoss argument (default -0.01), which is forwarded to every backtest so the whole sweep shares a single stop-loss level.

Internally the function creates an empty tabular container with four columns — Rolling Window, Enter Threshold, Exit Threshold, and Return on Capital — and gives each tested combination a simple sequential identifier called “Strategy #”. It then enumerates the full Cartesian product of the three input sets (rolling windows × entry thresholds × exit thresholds) so every possible parameter triple is tested. For each triple it calls executeStrategy with the lookback (win), entry level (enter), exit level (exit_), the supplied stopLoss, and plot=False; executeStrategy returns the annotated DataFrame and the ROC, and this code discards the first and keeps the second (roc) as the performance metric of interest.

Each loop iteration appends a row to the results table containing the window, entry threshold, exit threshold, and the ROC returned by the simulation, and advances the strategy index. When all combinations have been evaluated the function returns the populated DataFrame. In short, this routine runs a parameter sweep for a simple spread-trading rule and summarizes each configuration’s return-on-capital so you can compare how different lookback lengths and threshold settings influence strategy performance.
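
As a hedged usage sketch, a deliberately tiny grid can be swept and ranked directly; the values below are illustrative only, and the call assumes that the price DataFrame df constructed later in the article is already in scope.

# Illustrative sweep over a tiny grid (the article's real grids appear further below).
demo_results = rocSummary(
    rollingWindowDays=[5, 10],
    enterThreshold=[5e-4, 1e-3],
    exitThreshold=[0.0, 2e-4],
    stopLoss=-0.01,
)
demo_results.sort_values('Return on Capital', ascending=False).head()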

Importing ETF data via Nasdaq Data Link

import nasdaqdatalink as ndl

setattr(ndl.ApiConfig, "api_key", YourAPIkey)

This code imports the Nasdaq Data Link client library and then configures its global API credentials so subsequent data requests are authenticated. The import brings the nasdaqdatalink module into the runtime so you can call its data-access functions; immediately after, the code assigns your API key onto the library’s ApiConfig object by calling setattr(ndl.ApiConfig, “api_key”, YourAPIkey). In effect this attaches the supplied key as the api_key attribute on the ApiConfig class, making the credential available to the library’s internal request machinery.

The reason this assignment happens up front is to ensure every later call into the nasdaqdatalink client will include the proper authentication token without having to pass the key repeatedly. When you later request time series or quote data, the library reads ndl.ApiConfig.api_key and injects it into HTTP headers or query parameters as required by the Nasdaq Data Link API. Because the attribute is set at the class (module) level, that configuration is global for the process and will be used by any client code that relies on the nasdaqdatalink module.

For the simple spread trading workflow this enables, the authenticated data access is the prerequisite step: once the api_key is set, downstream code can fetch historical price series for the instruments you want to combine, compute spreads, run signal logic, or backtest strategies. In short, these two lines make the Nasdaq Data Link service available to the rest of your spread-trading pipeline by providing the credentials the client library needs to retrieve market data.
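
If you prefer not to keep the key in the notebook, one common alternative (a sketch, not part of the original code) is to read it from an environment variable; the variable name NASDAQ_DATA_LINK_API_KEY is an arbitrary choice for this example.

import os
import nasdaqdatalink as ndl

# Read the key from the environment so it never lands in the notebook itself.
api_key = os.environ.get('NASDAQ_DATA_LINK_API_KEY')
if api_key is None:
    raise RuntimeError('Set NASDAQ_DATA_LINK_API_KEY before fetching data.')

ndl.ApiConfig.api_key = api_key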

import nasdaqdatalink
import pandas as pd

raw_table = nasdaqdatalink.get_table(
    'QUOTEMEDIA/PRICES',
    ticker=['RING', 'GDX'],
    qopts={'columns': ['ticker', 'date', 'adj_close', 'adj_volume']},
    date={'gte': '2019-12-02', 'lte': '2021-12-31'}
)

reversed_indexed = raw_table.iloc[::-1].set_index('date')

symbols = ('RING', 'GDX')
frames = [reversed_indexed[reversed_indexed['ticker'] == s].add_prefix(s) for s in symbols]

df = pd.concat(frames, axis=1)
df = df.drop(columns=[f'{s}ticker' for s in symbols])

The block starts by pulling raw price rows for two tickers (RING and GDX) from the QUOTEMEDIA/PRICES table via nasdaqdatalink.get_table. The query narrows the payload to only the columns we need — ticker, date, adj_close, adj_volume — and restricts rows to the inclusive date window from 2019–12–02 through 2021–12–31. This produces a single stacked table containing interleaved records for both securities across the requested date range.

Next, the code reverses the row order with iloc[::-1] and sets date as the DataFrame index. Reversing makes the rows chronological from oldest to newest (many APIs return newest-first), which is important because subsequent time-series work and spread calculations typically assume ascending time. Setting the date index turns the time column into the alignment key for later joins so that rows for the same calendar date line up across instruments.

After normalizing order and indexing, the script isolates each symbol into its own frame. For each symbol in (‘RING’, ‘GDX’) it filters rows where ticker equals that symbol, producing per-symbol time series. It then calls add_prefix(s) on each filtered frame so that every column name is prefixed with the symbol (e.g., “RINGadj_close”, “GDXadj_volume”). Prefixing prevents naming collisions when the two instruments have identically named fields and makes it explicit which columns belong to which ticker.

Those per-symbol frames are concatenated horizontally (axis=1), which aligns them by the date index; this produces a wide table where each date row contains both instruments’ adjusted close and volume side by side. Because the original per-symbol frames still include a prefixed ticker column, the final step removes those redundant ticker columns with df.drop, leaving only the prefixed price and volume fields.

The end result is a single, chronologically ordered DataFrame indexed by date with distinct, symbol-prefixed columns for adj_close and adj_volume for each instrument. That layout is exactly what you need for simple spread trading: aligned time-series for both legs so you can compute spreads, ratios, or paired signals on a per-date basis.
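
For comparison, the same wide layout can be produced with a single pivot; this is a sketch assuming the raw_table columns requested above, not the code the article actually runs.

# Alternative reshape: pivot the stacked table straight into symbol-prefixed wide columns.
wide = (
    raw_table
    .pivot(index='date', columns='ticker', values=['adj_close', 'adj_volume'])
    .sort_index()
)
# Flatten the (field, ticker) column MultiIndex into names like 'RINGadj_close'.
wide.columns = [f'{ticker}{field}' for field, ticker in wide.columns]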

ring_product = df["RINGadj_close"] * df["RINGadj_volume"]
gdx_product = df["GDXadj_close"] * df["GDXadj_volume"]

df["RINGRollingMedian"] = ring_product.rolling(15).median()
df["GDXRollingMedian"] = gdx_product.rolling(15).median()

df["Nt"] = df[["RINGRollingMedian", "GDXRollingMedian"]].min(axis=1)

df["RINGRet"] = df["RINGadj_close"].pct_change()
df["GDXRet"] = df["GDXadj_close"].pct_change()

df["RINGminusGDXRet"] = df["RINGRet"] - df["GDXRet"]

df.drop(columns=["RINGadj_volume", "GDXadj_volume", "RINGRollingMedian", "GDXRollingMedian"], inplace=True)

This block transforms raw price and volume fields into a compact set of features used to drive a simple spread-trading workflow. At the start, the code multiplies adjusted close by adjusted volume for each ticker to produce a dollar-volume-like series for RING and GDX. The intent is to capture recent traded size in currency units rather than just price or raw share counts, because dollar volume better reflects the market capacity and liquidity that will constrain position sizing and execution.

Next, each dollar-volume series is smoothed with a 15-period rolling median, producing RINGRollingMedian and GDXRollingMedian. Using a median over a short rolling window gives a stable, short-term typical dollar-volume level while resisting the influence of single large spikes; this produces a more robust, recent liquidity estimate than a raw or mean-based average would.

The code then computes Nt as the row-wise minimum of those two rolling medians. Conceptually, Nt represents the more constrained liquidity between the two legs of the spread on each date — the smaller dollar-volume tends to limit how large a spread position you can practically take without moving the market, so using the minimum creates a conservative normalization or capacity proxy for downstream sizing and scaling.
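
In symbols, writing $P_t$ and $V_t$ for each ETF's adjusted close and adjusted volume, the capacity proxy computed above is

$N_t = \min\big(\mathrm{median}_{15}(P_t^{RING} V_t^{RING}),\ \mathrm{median}_{15}(P_t^{GDX} V_t^{GDX})\big)$

where each median is taken over the trailing 15 trading days.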

Separately, the script converts prices into period returns with pct_change for each ticker (RINGRet and GDXRet). Turning prices into returns removes level effects and makes the two series comparable on a relative basis; returns are the standard input when modeling pairwise or spread relationships because the strategy cares about relative performance rather than absolute price levels.

RINGminusGDXRet is then computed as the difference between the two return series, producing a time series of spread returns (RING minus GDX). This series is the primary signal / P&L driver for a simple spread trade: positive values mean RING outperformed GDX over the period, negative values mean the opposite, and that relative movement is what a spread strategy will attempt to capture or hedge.

Finally, the code drops the raw volume columns and the intermediate rolling-median columns to leave the dataframe focused on the adjusted prices, the two return series, the RING–GDX return spread, and Nt. Those remaining fields are the inputs you would use in subsequent steps of the simple spread-trading process (signal calculation, position sizing using Nt, and backtesting or execution).

first_eight = df.loc['2020-01'].iloc[:8]
first_eight

This code pulls out a small, labeled slice of the DataFrame to give a first look at the prepared inputs. df.loc['2020-01'] uses pandas partial string indexing on the DatetimeIndex to select every row falling in January 2020, returning a DataFrame with the same symbol-prefixed columns. .iloc[:8] then takes the first eight of those rows by integer position, i.e., the first eight trading days of the month in chronological order. The assignment stores that slice in first_eight, and the bare name on the final line causes it to be displayed in an interactive session. In the context of simple spread trading, this isolates a compact, time-specific window that lets you sanity-check the adjusted closes, returns, the RING–GDX return spread, and the Nt capacity proxy before the strategy logic consumes them.

Basic Data Analysis

fig, axis = plt.subplots()
ring_series = df["RINGadj_close"]
gdx_series = df["GDXadj_close"]
difference_series = gdx_series - ring_series

for series, label in [
    (ring_series, "iShares MSCI Global Gold Miners ETF"),
    (gdx_series, "VanEck Gold Miners ETF"),
    (difference_series, "Difference GDX - RING"),
]:
    axis.plot(series, label=label)

axis.legend(fontsize=14)
axis.set_xlabel("Date", fontsize=14)
axis.set_ylabel("Price", fontsize=14)
axis.set_title("GOLD Miners ETF Series Prices", fontsize=16)
plt.show()

The block begins by preparing a plotting canvas (a Matplotlib figure and a single axes) and then pulls three time series out of the data frame: the adjusted close for the iShares MSCI Global Gold Miners ETF (RING), the adjusted close for the VanEck Gold Miners ETF (GDX), and a computed spread defined as GDX minus RING. Because these are pandas Series keyed by date, plotting them directly will place the date index on the x‑axis and the numeric price or spread values on the y‑axis.

Next, the code iterates over a small list of (series, label) pairs and calls axis.plot for each pair. This loop places all three lines on the same axes so you can visually compare them: the two absolute price series and their difference (the spread). Plotting the difference series alongside the raw prices lets you immediately see not only the individual price trajectories but also the magnitude and direction of the spread at each point in time, which is central to simple spread trading where trades are based on divergence and convergence between two instruments.

After the lines are drawn, a legend is attached to the axes so each line can be identified by the provided human‑readable labels; the font size is set to keep the labels legible. The x and y axes are labeled “Date” and “Price” respectively, and the chart is given a title that frames the visualization as showing gold‑miners ETF prices. Finally, plt.show() renders the composed figure. Altogether, the code flows from extracting the relevant inputs, computing the spread metric used for pair/spread analysis, plotting all three series together for direct visual comparison, and annotating the plot so a trader can interpret divergence and convergence patterns useful for simple spread trading.

Series Returns

We plot the returns of the two series to better highlight their differences.

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(16, 8))

plots = [
    ("RINGRet", "iShares MSCI Global Gold Miners ETF Returns", "iShares MSCI Global Gold Miners ETF"),
    ("GDXRet", "VanEck Gold Miners ETF Returns", "VanEck Gold Miners ETF"),
]

for ax, (col_name, label_text, title_text) in zip(axes, plots):
    ax.plot(df[col_name], label=label_text)
    ax.set_title(title_text)
    ax.tick_params(labelrotation=45)
    ax.set_ylabel("Daily Return")

plt.show()
summaryStats(df[['RINGRet', 'GDXRet']])

The code builds a small diagnostic figure and then prints numeric summaries of the two return series so you can visually and quantitatively inspect the inputs used for a simple spread trading strategy. It begins by allocating a 1x2 Matplotlib canvas (two subplot axes side-by-side) sized to give ample room for time-series detail; this creates two Axes objects that will each host one return series so you can compare them directly.

Next, a short metadata list pairs each DataFrame column name with a human-readable label and a subplot title. This list drives the plotting loop: for each axis and its associated tuple the code draws the column of returns from df onto that axis, passing the descriptive label into plot (so the line carries a legend label internally), then sets the subplot title to the full ETF name. The x-axis tick labels are rotated 45 degrees to keep date/time labels readable when dense, and the y-axis is explicitly labeled “Daily Return” to make clear the plotted quantity. Doing this for both RINGRet and GDXRet side-by-side lets you quickly assess relative magnitudes, timing of spikes, and co-movement behavior — all visual cues relevant to deciding whether and how to form a spread between these two instruments.

plt.show() renders the figure to the screen (or notebook output), completing the visual inspection step. Finally, summaryStats is called with the two return columns extracted from df; that function (as used here) produces numeric summaries of those series — for example central tendency, dispersion, and higher-order moments or other diagnostics — which complements the visual check by quantifying features like mean return, volatility, skewness/kurtosis or other summary measures. Together, the plots plus the summary statistics give you the basic empirical picture needed when constructing a simple spread trade: how the two return streams behave individually and relative to one another.

import seaborn as sns
import numpy as np

plot_obj = sns.jointplot(x=df['RINGRet'], y=df['GDXRet'], kind='reg')
figure = plot_obj.fig
figure.set_size_inches(12, 8)
plot_obj.set_axis_labels(
    'iShares MSCI Global Gold Miners ETF returns',
    'VanEck Gold Miners ETF returns',
    fontsize=14
)
r2_val = round(np.square(df['RINGRet'].corr(df['GDXRet'])), 4)
figure.suptitle(f"Relationship between the GOLD Index Series \n R-Squared = {r2_val}", fontsize=16)
figure.tight_layout()

This block is centered on visually and numerically characterizing the linear relationship between two return series — RINGRet (iShares MSCI Global Gold Miners ETF returns) and GDXRet (VanEck Gold Miners ETF returns) — which is a core input for simple spread trading decisions. The code first hands those two series to seaborn.jointplot with kind=’reg’, which produces a combined visualization: a scatterplot of paired return observations in the joint axes, a fitted linear regression line with its confidence band, and marginal distributions for each series. Choosing the regression kind both exposes the point-by-point co-movement and overlays the linear fit that you would use when considering a simple spread or hedge relationship between the two instruments.

After creating the jointplot, the code pulls out the underlying Matplotlib Figure (plot_obj.fig) so layout and presentation parameters can be adjusted. The figure size is explicitly set to 12x8 inches to ensure the joint axes, marginal plots, and annotations render at a readable scale for interpretation. Axis labels are then applied directly to the jointplot, giving descriptive, human-readable names for each axis and increasing the font size so the labels are clear when the figure is displayed or saved.

Separately from the plotted regression, the code calculates an R-squared value to quantify the strength of the linear relationship. It does this by taking the Pearson correlation between the two return series and squaring it (r²), which, for a simple linear fit, represents the proportion of variance in one series explained by the other. The result is rounded to four decimal places to produce a compact numeric summary suitable for display alongside the figure.

Finally, that R-squared is injected into the figure-level title (suptitle) so the visual and numeric diagnostics are presented together, and tight_layout() is called to compact and adjust spacing so labels, title, and plot elements do not overlap. Altogether, the block produces a clear graphical-plus-numeric snapshot of how tightly these two miner-ETF return series track each other — the kind of diagnostic you would rely on when evaluating candidate pairs for a simple spread trading strategy.
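
To pair the plot with numbers, the sketch below recomputes the same R-squared and adds an ordinary-least-squares hedge ratio via numpy; the hedge-ratio step goes slightly beyond what the article does and is illustrative only.

import numpy as np

returns = df[['RINGRet', 'GDXRet']].dropna()

# Pearson correlation and its square (the R-squared shown in the figure title).
corr = returns['RINGRet'].corr(returns['GDXRet'])
r_squared = corr ** 2

# Least-squares slope of GDX returns on RING returns: a simple hedge ratio.
beta, intercept = np.polyfit(returns['RINGRet'], returns['GDXRet'], deg=1)

print(f'corr = {corr:.4f}, R^2 = {r_squared:.4f}, hedge ratio = {beta:.4f}')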

Trading Strategies

We will experiment with the parameters M, g, j, and s to identify combinations that produce effective trading strategies. This methodology carries a look-back (data-snooping) bias: it favors parameter settings that happened to work over the historical sample, which can make an overfit configuration look like a well-functioning strategy. We therefore filter for strategies that perform robustly without relying heavily on gains from a few idiosyncratic data points.

Arbitrary starting strategy

We begin with an arbitrary strategy that uses a 15-day rolling mean of returns and threshold-based trade signals: enter a trade when the spread exceeds 0.05% and exit when it falls below 0.02%. These trial parameter values are chosen to demonstrate the strategy in action.

params = dict(M=15, g=0.0005, j=0.0002, s=-0.0005)
strategy_result = executeStrategy(**params)

The two-line block is a focused configuration and execution step for the simple spread-trading routine. First we assemble a parameter set into a single structure named params: M=15, g=0.0005, j=0.0002, and s=-0.0005. In the context of spread trading, these values are the knobs that control how the algorithm measures the spread and how aggressively it reacts. M is the temporal context: the number of observations in the rolling window used to smooth the RING–GDX return spread. The two small positive coefficients, g and j, are decision thresholds expressed in return units: g is the entry threshold that the smoothed spread must exceed in absolute value before a position is opened, and j is the tighter exit threshold at which an open position is closed, so the strategy only trades when deviations are meaningfully outside short-term noise. The small negative value s is the stop-loss level: if an open position's cumulative P&L falls below s as a fraction of gross exposure, the position is closed immediately.

After the parameters are collected, executeStrategy(**params) is called to run the trading logic with those named settings. Conceptually, executeStrategy ingests the prepared price data, computes the RING–GDX return spread, and uses the M-day window to form the smoothed signal. The algorithm then compares that signal against the thresholds g and j: those comparisons determine whether the deviation is large enough to open a position, whether to reverse it into the opposite spread, or whether to close it. The positive thresholds (g, j) filter out transient noise and control aggressiveness, while s defines the loss, relative to gross exposure, at which a position is cut regardless of the signal.

Finally, executeStrategy returns a tuple, captured in strategy_result, containing the annotated trade DataFrame (positions, daily and cumulative P&L) and the return on capital. In short, this code centralizes the configuration for a simple mean-reversion-style spread trader (M defines the lookback used to smooth the spread; g, j, and s encode the entry, exit, and stop-loss rules), then runs the strategy with those exact settings and yields the resulting performance and trade data.

Stop Loss Strategy

Previously the strategy used a very tight stop-loss (-0.05% of gross exposure). Here we widen it to -1% to evaluate whether a broader stop-loss can improve profitability.

params = {
    "M": 15,
    "g": 0.0005,
    "j": 0.0002,
    "s": -0.01,
    "plot": True,
    "ylim": [-0.0025, 0.0025],
}
strategy = executeStrategy(**params)

The two-line block first builds a single configuration object and then passes it into the strategy runner; the dictionary groups the parameters that control every decision the trading logic will make, and executeStrategy is invoked with those named parameters to produce the concrete strategy run (returned into the variable strategy).

M = 15 is the temporal horizon the strategy uses to form its short-term view of the spread. In a simple spread-trading setup this value is used as the lookback length for rolling estimators — for example a rolling mean and/or volatility of the spread — so that the strategy makes decisions based on the recent 15 observations rather than long-term history. Choosing a modest M emphasizes short-lived deviations and makes the strategy react to recent mean-reverting behavior.

g and j are the numerical thresholds that the strategy uses to translate a measured deviation into trading action. In the usual pattern g is the entry threshold (how far the spread must deviate before you open a position) and j is the exit or tighter threshold (how close to mean you require before closing). Both are small positive numbers here (0.0005 and 0.0002), which means the code is configured to look for very small, frequent deviations and to require a smaller gap to trigger exits than entries. The relative sizes of g and j implement the core trading decision: wait for deviation > g to open a position, and wait for reversion inside j to close it, capturing small reversion moves.

s = -0.01 is the stop-loss threshold: once an open position's running P&L falls below -1% of its gross dollar exposure, the engine closes the position immediately and, per the implementation above, stops trading for the remainder of that month. Widening this level relative to the earlier -0.0005 setting gives losing positions more room to revert before being cut, at the cost of larger worst-case drawdowns on any single trade.

plot = True and ylim = [-0.0025, 0.0025] control visualization output from the run. With plot enabled the strategy will produce a diagnostic chart (spread, thresholds, and/or trade markers), and ylim constrains the vertical axis to the specified window so the plot focuses on the small-magnitude spread movements that matter for this parameterization. Finally, executeStrategy(**params) unpacks this configuration into the strategy function: the function consumes M, g, j, s, plot and ylim to compute rolling statistics, apply the threshold/offset decision logic described above, generate trades and performance metrics, and optionally render the focused plot; the result of that computation is returned into the variable strategy.
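
A quick, hedged way to compare the two stop-loss settings head-to-head is simply to re-run executeStrategy with plotting disabled and print the return on capital for each; this sketch assumes the functions and data defined above are already in scope.

# Re-run the same configuration with the tight and the wide stop-loss, plotting disabled.
for stop in (-0.0005, -0.01):
    _, roc = executeStrategy(M=15, g=0.0005, j=0.0002, s=stop, plot=False)
    print(f'stop-loss {stop:+.4f} -> return on capital {roc:.2f}%')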

Finding the best strategy parameters

To identify the best trading strategy, we evaluate different values of `M` (rolling window size), `g` (`enterThreshold`), and `j` (`exitThreshold`). We then list the top 25 parameter combinations for the strategy.

window_lengths = [2, 5, 10, 21]
entry_thresholds = [3e-4, 5e-4, 7e-4, 1e-3]
exit_thresholds = [0.0, 1e-4, 2e-4, 3e-4]

rocs = rocSummary(window_lengths, entry_thresholds, exit_thresholds)

The three lists set up the parameter space for the strategy sweep and then hand that space to a summarization routine. First, window_lengths = [2, 5, 10, 21] enumerates the rolling-window lookbacks used to form the strategy's short-term reference for the spread. Those four values capture a short to medium set of time scales — two days for very fast moves, five and ten for short-term structure, and 21 to approximate roughly a month of daily bars — so the sweep can test how signal responsiveness and persistence change with horizon.

Next, entry_thresholds = [3e-4, 5e-4, 7e-4, 1e-3] supplies the magnitudes of spread deviation that must be exceeded to open a position. These are small fractional thresholds because spread trading typically targets small mean-reverting deviations rather than large directional moves; each value corresponds to the sensitivity at which we consider a deviation large enough to trade. exit_thresholds = [0.0, 1e-4, 2e-4, 3e-4] similarly defines the conditions for closing a trade: small or zero exit values mean we intend to exit essentially at full mean reversion (0.0) or allow only a small residual deviation before closing.

The call rocSummary(window_lengths, entry_thresholds, exit_thresholds) is the handoff from parameter definition to analysis. For each combination of window length, entry threshold, and exit threshold, the routine runs the spread strategy with those settings, records the trades it generates, and aggregates the outcomes into higher-level performance metrics. The returned rocs table captures those summaries, most importantly the Return on Capital (ROC) for each parameter set alongside the parameters that produced it, so you can compare how different windows and threshold pairs behave.

Putting it back in the overall context of simple spread trading: the code defines a small experimental grid of time horizons and sensitivity thresholds, then runs a systematic evaluation that tells you which rolling window and entry/exit parameterization produces the desired behavior (quick mean-reversion entries, conservative exits, or whatever performance profile you are targeting). The flow is: declare horizons and rules, pass them to a summarizer that backtests each combination, and collect the consolidated metrics in rocs for further analysis.
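For reference, a grid sweep like the one rocSummary performs can be expressed in a handful of lines. The sketch below is an assumption about its shape rather than the notebook's actual implementation: it reuses executeStrategy, assumes that function returns a (results DataFrame, Return on Capital) pair (as the later use of strategy[1] and df_result implies), and uses the column names that appear in the top-25 table.

from itertools import product
import pandas as pd

def sketch_roc_summary(window_lengths, entry_thresholds, exit_thresholds, stop_loss=-0.01):
    """Illustrative grid sweep; the notebook's actual rocSummary may differ in details."""
    rows = []
    for M, g, j in product(window_lengths, entry_thresholds, exit_thresholds):
        # Re-run the strategy for each combination; index [1] is assumed to be Return on Capital
        _, roc = executeStrategy(M=M, g=g, j=j, s=stop_loss, plot=False)
        rows.append({"Rolling Window": M, "Enter Threshold": g, "Exit Threshold": j,
                     "Stop Loss": stop_loss, "Return on Capital": roc})
    return pd.DataFrame(rows)

# e.g. rocs = sketch_roc_summary(window_lengths, entry_thresholds, exit_thresholds)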

Best-Performing Trading Strategies

# Top 25 trading strategies by Return on Capital
roc_df = rocs
top_25 = roc_df.nlargest(25, 'Return on Capital')
top_25

This small block starts with a precomputed table of per-strategy performance metrics (the variable rocs) and gives it a local name roc_df so the code that follows operates on a clearly labeled DataFrame representing “Return on Capital” statistics for each candidate spread or trading strategy. The intention here is to identify the best-performing strategies by the Return on Capital metric — i.e., to rank strategies by how efficiently they convert deployed capital into returns, which is directly relevant when deciding which spreads to prioritize in a simple spread-trading workflow.

The core operation uses pandas’ nlargest to extract the top 25 rows by the ‘Return on Capital’ column. nlargest performs a partial sort to return the highest values for that column, and the call roc_df.nlargest(25, ‘Return on Capital’) produces a DataFrame containing those 25 strategy rows in descending order of Return on Capital. Because the selection is keyed to this single metric, the output isolates the strategies that deliver the most return per unit of capital — a useful shortlist when you want to concentrate execution, monitoring, or further risk checks on the most capital-efficient spreads.

Finally, the expression top_25 is left as the last line so that, in an interactive environment, the resulting DataFrame is emitted for inspection. That DataFrame retains the original row indices and all other columns from roc_df, but only includes the 25 strategies ranked highest by Return on Capital, ready to be used by downstream steps in the simple spread-trading pipeline (for example, allocation, backtesting, or live execution).

Observations

Of the 25 trading strategies, the top 17 (best-performing) are executed using a two-day rolling window.

Next, we will analyze one of the top strategies.

rocs.pipe(lambda df: df.sort_values('Return on Capital', ascending=False)).groupby('Rolling Window').head(3)

This one-line chain takes the rocs DataFrame and turns it into a compact list of the highest-performing candidates per evaluation period. First, the DataFrame is passed into a lambda via pipe that sorts every row by the numeric column ‘Return on Capital’ in descending order, so the records with the largest capital returns appear first. Because we sort globally by that metric before grouping, each group’s rows will also appear in descending ROC when they are encountered in the frame. Next, the code groups the now-ordered rows by the ‘Rolling Window’ column and calls head(3) on those groups, which yields the first three rows for each rolling-window group as they appear in the sorted DataFrame (i.e., up to three instruments with the highest ROC for that window). The net result is a DataFrame filtered down to the top three ROC-ranked instruments within every rolling window — a concise set of candidates you can use when constructing simple spread trades based on recent return-on-capital performance.

Shorter rolling-window strategies outperform longer-window strategies. Moreover, the top-performing approaches typically use closely related values of g and j, so manual filtering of these parameter combinations may be necessary to generate realistic trading ideas.
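One way to make that manual filtering systematic is to require a minimum gap between the entry and exit thresholds before treating a combination as realistic. The snippet below is illustrative: the 2e-4 gap is an arbitrary choice, and it assumes the summary table carries 'Enter Threshold' and 'Exit Threshold' columns alongside 'Rolling Window' and 'Return on Capital'.

# Drop parameter combinations whose exit band sits too close to the entry band
min_gap = 2e-4
realistic = rocs[(rocs['Enter Threshold'] - rocs['Exit Threshold']) >= min_gap]
realistic.nlargest(10, 'Return on Capital')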

Evaluating the Best-Performing Strategies

Two-Day Rolling Window Average Strategies

We begin with the best-performing approach. It uses a two-day rolling average and applies relatively low thresholds: g = 0.0003 and j = 0.

_params = dict(M=2, g=0.0003, j=0, s=-0.01, plot=True, ylim=[-0.01,0.01])
strategy_result = executeStrategy(**_params)

This code builds a small configuration for a simple spread-trading strategy and immediately runs that strategy with those settings. The dict packs the strategy controls — how the spread is measured and turned into trade signals, an optional visualization flag, and the vertical bounds for that visualization — and then executeStrategy is invoked with these parameters unpacked as keyword arguments. The return value, strategy_result, captures whatever the strategy routine produces (trade list, P&L, diagnostics, and/or plots) for later inspection.

Breaking down the parameters and why they matter for simple spread trading: M=2 sets the lookback used to form the strategy's short-term reference for the spread. A very small M biases the logic toward very recent changes rather than long-term structure, which is appropriate when you want quick mean-reversion trades on transient deviations. g=0.0003 is the entry threshold: the spread's deviation must exceed it before a position is opened, and such a low value makes entries frequent. j=0 is the exit threshold: with it set to zero, positions are closed as soon as the spread fully reverts to its short-term reference. s=-0.01 is the stop-loss level, closing a losing trade once its adverse move exceeds 1%. plot=True instructs the strategy to produce a visual output of the spread, signals, and trades, and ylim=[-0.01,0.01] constrains the vertical plotting range so the chart focuses tightly on the small-magnitude spread dynamics relevant to these parameter values.

Flow inside executeStrategy (how these parameters drive decisions): the routine ingests price data for the two legs, computes the spread series, and forms the short-term reference using M (here, a two-observation window). It then compares the current deviation of the spread from that reference against the entry threshold g to decide whether to open a position, against the exit threshold j to decide whether to close it, and against the stop-loss s to cut any trade whose adverse move exceeds 1%. Because M is small and g is low, the logic reacts quickly to small excursions of the spread and trades frequently. If plot is True, the routine renders the spread and annotated signals/trades within the provided ylim so the plotted activity concentrates on the ±0.01 range of interest.

Finally, strategy_result captures the outcome: the executed trades, position time series, and performance metrics produced by executeStrategy under these inputs (and, because plot=True, the routine will also have produced the corresponding visualization constrained by ylim). The two-line block thus defines a short-window, low-threshold spread-trading run and then executes it, returning the run's results for analysis.

Next, we test intermediate thresholds, setting g to 0.002 and j to 0.0005.

_parameters = {"M": 2, "g": 0.002, "j": 0.0005, "s": -0.01, "plot": True, "ylim": [-0.01, 0.01]}
strategy = executeStrategy(**_parameters)

This code builds a concise configuration for a simple spread-trading run and then hands that configuration to the strategy runner. First, a single dictionary _parameters collects all the knobs that control the trading logic and the output of the run; those entries are then expanded as keyword arguments into executeStrategy, and its return value is stored in strategy so the rest of the program can inspect positions, P&L and diagnostics.

Each parameter in the dictionary directly maps to a specific role in the spread-trading workflow. M=2 keeps the two-day rolling window used to form the spread's short-term reference, so the strategy still reacts to very recent behavior. g=0.002 is the entry threshold: the spread must deviate from its reference by at least 0.2% before a position is opened, which is far stricter than the 0.0003 used in the previous run and therefore filters out the smallest, noisiest signals. j=0.0005 is the exit threshold: positions are closed once the deviation has shrunk back inside this band, so trades are held until most of the reversion has played out. s=-0.01 is the stop-loss level, closing any trade whose spread moves adversely by more than 1%. plot=True turns on post-run visualization so you can inspect the spread, positions and P&L, and ylim=[-0.01, 0.01] constrains the vertical range of those plots to ±1% so the visual focus stays centered on the spread deviations that matter for this strategy.

When executeStrategy receives these keyword arguments it uses them to configure its internal pipeline: the rolling estimator that computes the spread reference (driven by M), the entry rule that compares the deviation to g, the exit rule that compares it to j, and the stop-loss check against s. As the strategy runs through market data, it repeatedly computes the measured spread, opens positions when the deviation exceeds g, closes them when it falls back inside j or breaches the stop s, and (because plot is True) produces plots whose vertical axes are limited by ylim. The returned strategy object therefore encapsulates the run configured by those parameters: the realized trades, state evolution, and diagnostic plots for the simple spread-trading experiment.

To further reduce trade frequency, we test substantially higher entry and exit thresholds for the two-day rolling-average strategy.

m_count = 2
growth = 0.004
jump = 0.002
shift = -0.01
do_plot = True
y_bounds = [-0.01, 0.01]

params = dict(M=m_count, g=growth, j=jump, s=shift, plot=do_plot, ylim=y_bounds)
strategy = executeStrategy(**params)

This snippet is the orchestration piece that configures and launches a compact spread-trading run. At the top you set the trading hyperparameters under descriptive local names before mapping them to the strategy's expected keywords: m_count = 2 is the rolling-window length (M), keeping the fast two-day reference for the spread. growth = 0.004 and jump = 0.002 are the entry and exit thresholds (g and j); both are substantially higher than in the previous runs, so only large deviations open a trade and a sizeable amount of reversion is required before it is closed, which directly reduces trade frequency. shift = -0.01 is the stop-loss level (s), cutting any position that moves adversely by more than 1%. do_plot = True enables on-run visualization, and y_bounds = [-0.01, 0.01] defines the vertical limits for those plots so the visual output focuses on the relevant range of spread values.

Those values are then consolidated into a single parameter mapping (params) keyed with the names the strategy expects: M, g, j, s, plot, and ylim. Passing them as a grouped set keeps the handoff explicit and makes it clear which conceptual parameter each numeric value corresponds to. executeStrategy(**params) is the runtime invocation: the function uses M to compute the rolling reference for the spread, compares deviations against the entry threshold g and exit threshold j when generating signals, and applies s as the stop-loss when a trade moves against the position. If plotting is enabled, the function renders the strategy behavior with the vertical scale constrained by ylim so the visualization highlights the actionable range.

Finally, the result of that run is assigned to the variable strategy, which serves as the handle to whatever executeStrategy returns — typically an object or structure containing the generated signals, executed trades, P&L traces, and any diagnostic information or plots. In summary, the snippet builds a small, explicit configuration for a focused spread-trading experiment (a two-day window with wide entry and exit thresholds, a 1% stop loss, and visualization constraints) and immediately executes it, capturing the run output for further inspection.

Create a DataFrame to track plausible strategies.

import pandas as pd

plausableStrategies = pd.DataFrame(
    columns=['Rolling Window', 'Enter Threshold', 'Exit Threshold', 'Stop Loss', 'Return on Capital']
)
plausableStrategies.index.name = 'Strategy #'

row = dict(zip(plausableStrategies.columns, [2, 0.004, 0.002, -0.01, strategy[1]]))
plausableStrategies.loc['2_Day_Strategy'] = pd.Series(row)

The code begins by constructing an empty DataFrame whose columns explicitly capture the parameter set needed to define a single spread-trading strategy: a rolling window size, entry and exit thresholds, a stop-loss threshold, and a “Return on Capital” metric. These columns represent the core knobs for a simple spread strategy — the window controls the lookback used to compute spread statistics, the thresholds govern when the system opens and closes positions, the stop loss bounds downside risk, and the return metric records performance for comparison across strategies.

Immediately after creating the frame, the index is given the name “Strategy #”. This is a labeling decision so that each row can be identified as a distinct named strategy; the DataFrame is thus structured to hold multiple strategy configurations indexed by a human-readable key.

Next, a single strategy configuration is assembled by pairing the column names with concrete values using zip, producing a dictionary that maps each column to the intended value for this strategy. The numeric values encode the trading rules: a 2-day rolling window (so short-term spread behavior is used), an enter threshold of 0.004 and an exit threshold of 0.002 which define the band for opening and closing trades (enter at larger deviation, exit when the deviation has reduced), and a stop loss at -0.01 to limit losses if the spread moves adversely. The final value, strategy[1], is taken from surrounding context and placed into the “Return on Capital” column so that the strategy row carries its realized or expected performance metric alongside its rule set.

Finally, that dictionary is turned into a pandas Series and assigned into the DataFrame under the index label ‘2_Day_Strategy’. This inserts a labeled row representing the full configuration and its return metric into the strategies table, so downstream code can iterate over the DataFrame to backtest, compare, or deploy the named strategy within the simple spread-trading workflow.

5-Day Rolling-Window Average Strategies

params = dict(M=5, g=0.0005, j=0.0002, s=-0.01, plot=True, ylim=[-0.01, 0.01])
strategy = executeStrategy(**params)

The first line constructs a single configuration object — params — that captures every knob the spread strategy needs to run. Treat this dictionary as the strategy’s runtime specification: M, g, j, and s are numeric hyperparameters that control how the strategy measures and reacts to deviations in the spread, plot is a boolean that turns on visual diagnostics, and ylim provides the y‑axis limits used for those diagnostics. Packaging everything into one dict makes the subsequent call concise and makes it explicit which values are being passed into the trading routine.

When executeStrategy is invoked with **params, each key in the dictionary is passed as a named argument into that function. M = 5 sets the lookback window for the strategy's short-term reference; in a simple spread-trading context this is the horizon used to form the moving reference (for example a rolling mean of the spread), which smooths noise and produces a stable baseline against which deviations are judged. The two small positive scalars g = 0.0005 and j = 0.0002 are the entry and exit thresholds: the spread must deviate from the reference by more than g before a position is opened, and the position is closed once the deviation has shrunk back inside j, so the strategy does not react to micro-noise but only to economically meaningful moves. s = -0.01 is the stop-loss level, expressed in return terms: any open trade whose spread moves adversely by more than 1% is closed immediately.

Flow-wise, executeStrategy consumes these values and runs the classic spread-trading sequence: compute the current spread and its short-term reference using M, measure the deviation of the spread from that reference, open a position when the deviation exceeds g, close it when the deviation reverts inside j, and cut it early if the trade's loss breaches the stop s. The small magnitudes of g and j reflect the small scale of daily return spreads, while s caps the damage from trades that fail to revert. If plot is True, the routine also produces visual diagnostics showing the spread, reference, thresholds and trades, and ylim = [-0.01, 0.01] constrains the vertical range of those plots so the visualization is centered on the decision-relevant band around ±1%.

Finally, executeStrategy returns the concrete strategy result (trades, P&L series, metrics and any plots), which is stored in the variable strategy for later inspection or analysis. In short: this code block builds a tuned configuration for a short-window, threshold-driven spread trader, runs it, and captures the resulting strategy object and visual diagnostics.

_params = {"M": 5, "g": 0.002, "j": 0.0003, "s": -0.01, "plot": True, "ylim": [-0.005, 0.005]}
_result = executeStrategy(**_params)
strategy = _result

The two-line block is assembling the strategy hyperparameters and then running the strategy with those parameters; the final returned object is stored as strategy for downstream inspection. Conceptually, this is the orchestration layer: we pick the numeric knobs that govern signal construction, execution behavior, and visualization, hand them to executeStrategy, and keep the full result for analysis.

M = 5 sets the temporal window used inside the strategy to form the spread statistic or its smoother. In a simple spread-trading design you rarely trade on raw tick-by-tick divergence, so a small lookback like 5 samples is used to average or filter the instantaneous spread, reducing noise and ensuring decisions are based on short-term persistent dislocations rather than transient jitter. That smoothing directly affects signal latency and the frequency of trades.

g = 0.002 and j = 0.0003 are the control thresholds that drive the trade decision logic. The larger value (g) serves as the entry trigger magnitude: when the processed spread moves beyond ±g the strategy interprets that as a sufficiently large mispricing to open a position. The smaller value (j) serves as the exit or reversion threshold: positions are closed when the spread has returned within ±j. Using two different thresholds implements hysteresis — it prevents rapid flip-flopping by requiring a larger move to enter than to exit, which stabilizes the trade flow and reduces churning.

s = -0.01 is the stop-loss level. Rather than shifting the signal, it bounds the downside of any single trade: if an open position's spread moves against it by more than 1%, the position is closed immediately instead of waiting for reversion. A relatively loose stop like this reflects confidence that the spread will usually revert, while still protecting against the occasional persistent divergence.

plot = True and ylim = [-0.005, 0.005] control visualization. With plot enabled, executeStrategy produces graphical output that overlays the filtered spread, the thresholds, and trade markers so you can visually validate when and why trades were taken. ylim constrains the vertical range on that plot to the specified band around zero, focusing the view on the region where spread dynamics and trade signals are concentrated and making small but meaningful deviations easy to inspect.

When executeStrategy(**_params) is called, those parameters are passed as keyword arguments. Inside that function (as expected for a simple spread trading implementation), the data flow is: compute or receive the raw spread series, apply the M-sample smoothing to produce the signal series, compare the deviation against the thresholds g (for entry) and j (for exit) while enforcing the stop-loss s on open trades, simulate execution and track P&L over time, and, since plotting is enabled, render the spread and trades using the given ylim. The function returns a comprehensive result object capturing the time series, generated signals, executed trades, and performance metrics; that returned structure is first assigned to _result and then aliased to strategy for use by subsequent analysis or reporting.

_row_payload = (5, 0.002, 0.0003, -0.01, strategy[1])
plausableStrategies.loc['5_Day_Strategy'] = list(_row_payload)

This block builds and registers a concrete parameter set for a named trading strategy into the shared strategies table used by the spread-trading system. First, _row_payload is created as an ordered collection of five values that mirror the table's columns: the first value (5) is the rolling-window length (hence the label '5_Day_Strategy'), the next two decimals (0.002 and 0.0003) are the entry and exit thresholds, the fourth value (-0.01) is the stop-loss level capping an adverse move at 1%, and the final element strategy[1] is the Return on Capital produced by the run just executed, so the row carries its performance metric alongside its rule set.

Next, that payload is converted to a list and assigned into the plausableStrategies DataFrame at the index label '5_Day_Strategy' via .loc. Converting to a list ensures the sequence is supplied in a form pandas expects for row assignment, so each element maps to the DataFrame's columns in order. Using the string key '5_Day_Strategy' registers this configuration under a human-readable identifier, so downstream components can look up the row by name, extract the five columns in the expected order, and apply them when generating signals and enforcing exits for the simple spread-trading strategy. The net effect is that this code both defines the numeric behavior for a particular 5-day spread strategy and stores it in the central table where the trading engine will retrieve it.

10-Day Rolling Window Average Strategies

options = {
    “M”: 5 * 2,
    “g”: 5e-4,
    “j”: 1e-4,
    “s”: -1e-2,
    “plot”: True,
    “ylim”: [-0.004, 0.004],
}
strategy = executeStrategy(**options)

This snippet builds a small configuration dictionary and then launches the trading routine with it. The dictionary entries are the knobs that control how the simple spread-trading strategy runs and what it outputs; those knobs are then expanded into keyword arguments when calling executeStrategy.

First, the numeric parameters. M is set to 5 * 2, which evaluates to 10, and is the rolling-window length used to estimate the spread's short-term reference, so the estimator tracks roughly two weeks of daily bars rather than long-term averages. The two small positive values g = 5e-4 (0.0005) and j = 1e-4 (0.0001) are again the entry and exit thresholds: a deviation of at least 0.05% from the rolling reference is required to open a trade, and the trade is closed once the deviation has shrunk back inside 0.01%. Keeping both values small configures the strategy to chase frequent, small reversion moves over this longer window.

The parameter s = -1e-2 (-0.01) is again the stop-loss level: if the spread moves against an open position by more than 1%, the position is closed rather than held in the hope of reversion. In the usual simple spread-trading pattern, the algorithm computes a spread (the residual between the two legs' returns), opens a trade when that spread deviates beyond the entry threshold, expects it to revert back toward its rolling reference, and uses s purely as the escape hatch when that expectation fails.

The plotting controls are straightforward: “plot”: True tells executeStrategy to produce a visual output for inspection, and “ylim”: [-0.004, 0.004] constrains the y-axis range of that plot so the visualization is focused on very small spread movements. Showing the plot with a narrow vertical range helps the trader visually assess the small deviations the strategy is trying to capture and whether the spread dynamics stay within the target band.

Finally, when the code calls strategy = executeStrategy(**options), the options dict is unpacked into executeStrategy's parameters. Internally executeStrategy will take M, g, j, s, plot and ylim, initialize any internal state (rolling buffers of length M, position and P&L trackers), iterate over price data to compute the spread at each time step, generate entry signals when the deviation exceeds g and exit signals when it falls back inside j or breaches the stop s, record performance, and, because plot is True, render the spread and markers for trades using the specified ylim. The returned object assigned to strategy thus encapsulates the run's results (trade log, performance metrics and figures) for this particular simple spread-trading configuration.

To limit the number of trades, we widen the entry and exit thresholds for the 10-day window.

params = {"M": 10, "g": 0.0015, "j": 0.0005, "s": -0.01, "plot": True, "ylim": [-0.004, 0.004]}
strategy = executeStrategy(**params)

These two lines assemble a set of configuration parameters for a spread-trading run and then execute the trading logic with those parameters, capturing the returned strategy object. The first line builds a dictionary named params that encodes the runtime settings the trading function will use: “M” is set to 10, “g” to 0.0015, “j” to 0.0005, “s” to -0.01, “plot” to True, and “ylim” to [-0.004, 0.004]. The second line calls executeStrategy(**params), which unpacks that dictionary into keyword arguments and hands them to the strategy routine; the result is assigned to the variable strategy for downstream inspection or use.

Conceptually, these parameters drive the behavior of a simple spread-trading strategy. M = 10 selects the length of the lookback window used when estimating the current spread statistic; a moderate M smooths more noise than the two- and five-day variants while still responding to recent changes in the leg relationship. The two numeric thresholds follow the usual mean-reversion convention: g = 0.0015 is the trigger level for opening positions (a larger deviation is required to initiate a trade), while j = 0.0005 is the finer threshold used for closing them (the band within which the spread is considered to have reverted). s = -0.01 is the stop-loss, closing any trade whose adverse move exceeds 1%. The boolean plot = True tells the routine to produce visual output of spreads, signals, and positions, and ylim = [-0.004, 0.004] constrains the y-axis of those plots so the visualization is focused on the small range where meaningful spread movements occur.

Inside executeStrategy, these inputs flow into the standard spread-trading pipeline: the function uses M to compute rolling statistics for the spread, compares the current deviation against the entry threshold g and exit threshold j to decide whether to enter, exit, or hold a position, applies the stop-loss s to any trade that moves adversely, and then simulates the resulting trades to produce position traces, cash/P&L time series, and summary performance metrics. Because plot is True, the routine also generates plots of the spread and the trade overlays, and it applies ylim to keep the visual range focused on the small-magnitude spread dynamics. The returned strategy object therefore encapsulates the signals, executed positions, performance results, and any plotted artifacts, giving you a complete record of this particular parameterized run of the simple spread-trading approach.

entry = [10, 0.0015, 0.0005, -0.01, strategy[1]]
df_ref = plausableStrategies
df_ref.loc['10_Day_Strategy'] = entry

This block constructs a single strategy specification and registers it in the shared collection of plausible strategies so downstream trading routines can pick it up for evaluation.

First, the list assigned to entry is a compact encoding of the strategy's parameters: the first element (10) is the rolling-window length that gives the strategy its name, the next two numeric values (0.0015 and 0.0005) are the entry and exit thresholds used to open and close the spread trade, the negative value (-0.01) is the stop-loss bound on adverse spread moves, and the final element strategy[1] is the Return on Capital produced by the 10-day run just executed. Together these five values fully describe the strategy's behavior and performance for the simple spread-trading logic elsewhere in the system.

Next, df_ref = plausableStrategies creates a local reference to the DataFrame that holds all candidate strategies. The subsequent assignment df_ref.loc[‘10_Day_Strategy’] = entry inserts that list as a row labeled ‘10_Day_Strategy’ into the DataFrame. In practice this adds (or overwrites) an index entry with that name and populates the DataFrame’s columns in order with the five parameter values; because df_ref is a reference to the original plausableStrategies object, the original collection is updated in place so any later code that iterates over plausableStrategies will see the newly registered 10_Day_Strategy and can run backtests or live decision logic using these specific parameters.

21-Day Rolling Window Average Strategies

Finally, we test a substantially longer 21-day rolling window, keeping the widened thresholds from the 10-day run.

params = {
    "M": 21,
    "g": 0.0015,
    "j": 0.0005,
    "s": -0.01,
    "plot": True,
    "ylim": [-0.004, 0.004],
}
strategy_result = executeStrategy(**params)
strategy = strategy_result

This code block configures and launches a simple spread trading strategy by assembling a small set of hyperparameters, handing them into the strategy runner, and capturing the result for downstream use. The central object is the params mapping: it encodes the numerical and presentation parameters that control how the spread is measured, when the system trades, and how the output is visualized. M = 21 establishes the look-back horizon used to form the baseline for the spread (roughly one month of daily bars), which smooths short-term noise and creates a stable expectation against which deviations are judged. The three numeric parameters g (0.0015), j (0.0005) and s (-0.01) are the trading controls the algorithm uses to make discrete decisions: g is the entry threshold that determines when a deviation is large enough to open a trade, j is the exit threshold that determines when the spread has reverted enough to close it, and s is the stop-loss level that closes a trade once its adverse move exceeds 1%. The thresholds are deliberately small because return spreads are tiny relative to raw price levels; their purpose is to prevent acting on routine noise and to codify when mean reversion is economically meaningful. The plot flag indicates that the strategy runner should generate diagnostic visual output, and the ylim array constrains the vertical range of those plots so that plotted spread residuals, thresholds, and signals are presented on a consistent scale ([-0.004, 0.004]).

When executeStrategy(**params) is invoked, the params mapping is unpacked into keyword arguments and handed into the strategy implementation. The strategy function then takes the provided look-back and thresholds and applies them to the spread time series: it computes the reference statistic over the last M periods, measures the current deviation from that reference, compares that deviation against g and j to decide whether to open or close positions, and uses s as the stop-loss exit when a trade moves too far the wrong way. Throughout this flow the function also assembles performance and state information (for example positions, realized and unrealized P&L, timestamps of signals, and intermediate diagnostic series like the rolling mean, spread residuals, and threshold crossings) so that the behavior and outcomes of the strategy can be inspected.

Because plot is True, the strategy routine additionally produces plots of the spread and related series; those plots are constrained vertically by ylim so that the spread residuals, threshold lines, and trade markers remain within a focused visual window. Finally, the return value of executeStrategy — the strategy’s output bundle containing the performance metrics, signal history, and any plot objects or auxiliary series — is stored in strategy_result and then aliased to strategy. From this point forward the caller has a single reference (strategy) to access the run’s results for evaluation, reporting, or downstream processing in the context of simple spread trading.

# narrow spread configuration
conf = dict(M=21, g=0.0005, j=0.0001, s=-0.01, plot=True, ylim=[-0.004, 0.004])
narrow_spread_strategy = executeStrategy(**conf)

This block defines a compact configuration for a “narrow spread” trading run and then executes the strategy with those settings. The first line builds a configuration dictionary that codifies the parameters the strategy engine will use to measure the spread, generate signals, and visualize the outcome; the second line unpacks that dictionary into executeStrategy and captures whatever the execution returns into the variable narrow_spread_strategy.

Sequence and data flow: when executeStrategy is called with these named parameters, the strategy engine reads them and uses them to process the incoming price/spread time series. A lookback length M=21 is provided to compute reference statistics (for example a running mean of the spread) over a medium-term window; this anchors the spread's baseline so subsequent decisions are comparable across time. The two small numeric parameters g=0.0005 and j=0.0001 are again the entry and exit thresholds. Because spreads are typically very small in absolute terms, these tiny values mean the engine will open trades on very modest deviations and close them almost as soon as any reversion appears, producing frequent, fine-grained trades. The parameter s=-0.01 remains the stop-loss level, closing a position once its adverse move exceeds 1%.

The plot=True flag tells the strategy engine to produce visual output during or after the execution: the engine will render charts showing the spread, the reference line(s) computed with M, and mark the points where g/j/s caused trade signals. The ylim=[-0.004, 0.004] parameter constrains the vertical range of those plots to a tight band around zero; that explicit limit focuses the visualization on the narrow-magnitude fluctuations that matter for a narrow-spread approach, making entry/exit events visible relative to the small-scale spread movements rather than dominated by larger multi-day swings.

The final result of executeStrategy(**conf), assigned to narrow_spread_strategy, is the strategy's output object (trade signals, executed trades, P&L series, and any diagnostic/plot artifacts depending on the implementation). In short: this code packages a narrow-threshold configuration (a 21-day lookback, very low entry and exit thresholds, and a 1% stop loss) and runs the strategy engine with plotting enabled and a tightly focused y-axis so that the produced signals and performance reflect the intended “capture small mean-reversion moves in the spread” objective.

Conclusions

- Limiting the number of trades to approximately 70 over a two-year trading period yields an ROC of roughly 12–13% across the evaluated rolling-window sizes (2, 5, and 10 days).

- A 2-day rolling window is highly volatile and therefore requires much higher thresholds; despite this, it remains profitable because mean-reversion opportunities can produce large returns.

- A 10-day rolling window is less volatile, allowing profitable trading with lower thresholds.

- With very low trading costs, returns can increase substantially by taking more trades with a shorter window; for example, a 2-day window with narrow trading bands can raise ROC to around 17–18%.

- With very high trading costs, using longer windows and higher thresholds can still deliver acceptable returns (around 12% ROC for a 10-day window).

- Strategies using substantially longer windows (around 21 days) show poor returns, since the series tend to mean-revert over shorter horizons.

We therefore consider the three strategies recorded in plausableStrategies above (the 2-day, 5-day, and 10-day configurations) to be good trading strategies.

Comparison with Fama–French Style Factors

Next, we compare the returns of the three shortlisted trading strategies with the Fama–French style factors. We begin by importing the Fama–French three-factor returns and computing each strategy’s returns from cumulative P&L.

from pandas_datareader import data as pdr

_dataset = "F-F_Research_Data_Factors_daily"
_source = "famafrench"
_date = "2019-12-02"

_bundle = pdr.DataReader(_dataset, _source, _date)
ff_factors = _bundle.get(0).div(100)
ff_factors.tail()

This block fetches the Fama–French daily factor table, selects the numeric factor panel, converts those numbers into decimal return units, and then displays the last few rows so you can confirm the loaded values. Concretely, the call to the data reader with the specified dataset name, source, and start date asks the famafrench data service for the daily factor dataset beginning on 2019-12-02; DataReader returns a small “bundle” (a dict-like container) that packages the multiple pieces of the published dataset. The code then calls get(0) on that bundle to extract the primary table of time series — the DataFrame whose rows are trading dates and whose columns are the published factor series (for the standard Fama–French daily set these are Mkt-RF, SMB, HML, and RF). Dividing the DataFrame by 100 transforms the published values from percentage points into decimal fractions (e.g., 1.23 -> 0.0123), which aligns the factors with the usual numeric convention for returns and makes them directly comparable with the strategy returns computed below. Finally, tail() is invoked to show the most recent rows, letting you visually verify the date index and the converted decimal values before these series are used in the factor comparison.

strategyReturn = ff_factors.copy(deep=True)

This line creates an independent working dataset for the rest of the strategy code by making a deep copy of ff_factors. In practice ff_factors is the input table of factor returns, and by assigning strategyReturn = ff_factors.copy(deep=True) the code produces a new DataFrame that contains the same values, index and dtypes as the original but stored separately in memory. The “how” is pandas' deep copy semantics: the copy duplicates the underlying data buffers as well as the axis labels, so subsequent in-place changes to strategyReturn (such as adding one return column per shortlisted strategy) will not alter ff_factors.

The “why” is to preserve the canonical input while giving the strategy code a mutable workspace. For a simple spread trading flow, you typically need to derive transformed series (normalized leg returns, spread returns, position signals, and accumulated strategy returns) from the raw returns; making a deep copy up front ensures those derivations can be applied destructively and iteratively on strategyReturn without risking accidental modification of the source data that might be reused for diagnostics, benchmarking, or further analysis. In short, this line marks the start of the processing pipeline by producing an isolated dataset that subsequent spread construction and P&L computations operate on.

for idx, row in plausableStrategies.iterrows():
    window_size = int(row['Rolling Window'])
    enter_thresh = row['Enter Threshold']
    exit_thresh = row['Exit Threshold']
    stop_loss = row['Stop Loss']

    df_result, _ = executeStrategy(window_size, enter_thresh, exit_thresh, stop_loss, plot=False)
    strategyReturn[idx] = df_result.cumPnL.div(df_result.cumPnL.shift()) - 1

strategyReturn = strategyReturn['2020-1':]

This loop iterates over each candidate parameter set in plausableStrategies and runs a full simulation for that set, then converts the simulation’s cumulative profit-and-loss into a period-by-period return series and stores it under the corresponding strategy index. For each row, the code reads the four configuration values that define a single simple spread trading strategy: the rolling window length that governs how the spread statistic is computed, the entry and exit thresholds that determine when to open and close positions, and the stop-loss level that caps downside exposure. Those parameters are passed into executeStrategy so the simulator executes exactly the defined trading rules for that parameter combination.

executeStrategy is invoked with the window size, entry/exit thresholds, and stop loss; it returns a results DataFrame (df_result) and a second value that this code discards. The df_result holds the time series produced by the simulation, and the code uses its cumPnL column — the running cumulative profit-and-loss — as the baseline performance measure for the strategy. Converting cumulative PnL into period returns makes the performance comparable across strategies and time: the line df_result.cumPnL.div(df_result.cumPnL.shift()) - 1 computes the growth factor from one timestamp to the next (current cumPnL divided by the prior cumPnL) and subtracts one to express it as a percentage change for each period.

That resulting return series is assigned into strategyReturn keyed by the current strategy index (idx), so after the loop completes strategyReturn contains one return time series per tested parameter set. Finally, the code restricts strategyReturn to timestamps from January 2020 onward (strategyReturn = strategyReturn['2020-1':]), focusing subsequent analysis on the post-2020 portion of each strategy's return history. The overall flow, therefore, is: take each parameter combination, simulate trading to produce cumulative PnL, transform that cumulative series into period-over-period returns, store those returns by strategy, and then limit the dataset to the 2020+ window for downstream comparison or aggregation.
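As a side note, the ratio-and-subtract construction above is exactly what pandas' built-in pct_change computes, so an equivalent and arguably more readable form of the return line would be the following; this is just an illustrative alternative, not code from the notebook.

# Equivalent way to turn cumulative P&L into period-over-period returns
strategyReturn[idx] = df_result.cumPnL.pct_change()   # same as cumPnL / cumPnL.shift() - 1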

import seaborn as sns
import matplotlib.pyplot as plt

corr_matrix = strategyReturn.corr()
ax = sns.heatmap(corr_matrix, annot=True, cmap='Blues')
ax.set_title("Correlation between Fama-French factors and Quant Trading strategy returns", fontsize=16)
plt.show()

The snippet begins by taking a DataFrame named strategyReturn and computing its pairwise correlation matrix with strategyReturn.corr(). This produces a square matrix where each cell is the Pearson correlation coefficient between two columns (for example, between each Fama–French factor and the strategy return). The diagonal entries are 1.0 (each series perfectly correlates with itself), and the off‑diagonal entries quantify linear relationships, which is important for assessing how much the simple spread trading strategy is exposed to common factor movements.

That correlation matrix is passed into seaborn’s heatmap function to create a visual representation. seaborn maps the numeric correlation values to colors so you can immediately see which pairs are strongly positively correlated, strongly negatively correlated, or near zero; the spatial layout (rows and columns corresponding to the original series) makes it easy to spot clusters of related variables. The annot=True argument overlays the actual numeric correlation values on each cell so the chart provides both visual and precise quantitative information.

Using cmap=’Blues’ chooses a single-hue color palette where intensity encodes magnitude; this makes the interpretation consistent (darker blue = stronger correlation magnitude in the positive direction for this palette) and keeps the visual emphasis on relative strengths across the matrix. The returned Axes object is captured in ax so the code can set a descriptive title on the same plot, linking the visualization back to the business context: “Correlation between Fama-French factors and Quant Trading strategy returns.”

Finally, plt.show() triggers rendering of the composed figure so the heatmap appears in the active display. Together, these steps turn the raw correlation numbers into an annotated, labeled visual that helps you evaluate how the spread trading strategy co-moves with Fama–French factors and with other series in strategyReturn, supporting interpretation of factor exposures and relationships relevant to the simple spread trading objective.

The three plausible strategies are also highly correlated with one another. Therefore, for the remainder of the analysis we consider only the five-day strategy for comparison with the Fama–French factors.

Comparison Over Time — Quarterly Time Periods

import matplotlib.pyplot as plt
import seaborn as sns

subset = strategyReturn[['Mkt-RF', 'SMB', 'HML', '5_Day_Strategy']]

quarters = [
    ('2020-01', '2020-03'),
    ('2020-04', '2020-06'),
    ('2020-07', '2020-09'),
    ('2020-10', '2020-12'),
    ('2021-01', '2021-03'),
    ('2021-04', '2021-06'),
    ('2021-07', '2021-09'),
    ('2021-10', '2021-12'),
]

fig, axes = plt.subplots(4, 2, figsize=(15, 15))

for ax, (start, end) in zip(axes.flatten(), quarters):
    sns.heatmap(subset[start:end].corr(), ax=ax, annot=True, cmap='Blues')
    ax.set_title(f"Quarter: {start} to {end}")
    fig.tight_layout()

This block is building a compact, quarter-by-quarter diagnostic that reveals how the 5-day spread strategy co-moves with common risk factors, so you can judge whether the strategy is driven by market, size, or value exposures (or behaves more idiosyncratically) as the business cycles through the specified quarters. It starts by selecting the four series of interest from the wider returns table: market excess return (“Mkt-RF”), the size factor (“SMB”), the value factor (“HML”), and the strategy’s 5-day returns. Those four columns form the universe for the pairwise correlation analysis that follows.

The code then defines eight quarter intervals as start/end strings. Those strings are intended to be used for pandas time-based slicing of the returns index, so each (start, end) tuple isolates the rows in that quarter. For each quarter slice the code computes the correlation matrix over just those rows — by calling .corr() on the sliced DataFrame — which yields a 4×4 Pearson correlation matrix describing how each series co-varies with each other during that quarter.

A 4×2 grid of subplots is created to hold the eight quarter heatmaps; flattening the axes array and zipping it with the list of quarters pairs each subplot with its corresponding time window. Inside the loop, seaborn.heatmap is called with annot=True to print the numeric correlation coefficients inside each cell, and cmap=’Blues’ to visually encode magnitude via shades of blue. Each subplot gets a descriptive title showing the quarter range. The figure’s layout is tightened after adding each subplot to reduce overlap between labels and titles, keeping the small multiples readable.

Taken together, the sequence produces one heatmap per quarter that lets you visually and numerically compare exposures over time: strong positive or negative cells indicate systematic exposure of the strategy to that factor during the quarter, while weak correlations suggest more independent spread behavior. Viewing these side-by-side across the eight quarters helps you assess whether the strategy’s correlations with Mkt-RF, SMB, and HML are persistent, time-varying, or episodic, which is directly relevant to understanding the drivers of returns for simple spread trading.
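If you want a single number per quarter rather than a full heatmap, you can pull out just the strategy-versus-market correlation for each window. The small helper below is illustrative (not from the notebook) and reuses the subset and quarters objects defined above.

import pandas as pd

# Correlation of the 5-day strategy with the market factor, one value per quarter
quarterly_corr = pd.Series({
    start: subset[start:end]['5_Day_Strategy'].corr(subset[start:end]['Mkt-RF'])
    for start, end in quarters
})
print(quarterly_corr)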

Conclusion

- Classify trading strategies and select appropriate parameters according to constraints such as trading costs and trading frequency.

- For highly correlated, quickly mean-reverting ETF pairs, use a short rolling window of returns (up to about five days) to capture spread profitability.

- To reduce trade frequency, raise entry thresholds.

- When confidence in mean reversion is high — as in this case — set stop-loss limits more leniently to allow trades to reach profit.

- During periods of lower volatility, spread-reversion strategies are much less correlated with the market and common Fama–French factors, making them attractive for diversification.

- In contrast, during episodes of extreme market volatility (for example, March 2020), these strategies have exhibited strong negative correlation with the market risk premium.

Use the button below to download source code
