Algorithmic Trading Frameworks: Implementing Quantile-Based Signal Logic EP8/365
Building a high-conviction long-short framework that scales across 200+ U.S. equities.
Use the button at the end to download the source code.
This evaluation has two primary objectives. The first objective is to build a quantamental framework that uses quantitative techniques to evaluate the core financial strength of firms. The process begins with a broad set of companies drawn from multiple sectors. Firms from the automotive, finance, and insurance sectors are excluded because they rely on different evaluation standards and have shown distinct pricing behavior in recent periods. Additional screening is applied to remove companies with missing data or financial indicators that do not accurately represent their operational condition. Once this filtering is complete, the framework provides a consistent and reliable base for further analysis.
After establishing this foundation the next step is to design an investment approach based on quantiles. This approach uses insights derived from the quantamental framework with the aim of generating returns through systematic selection. The strategy relies on ranking firms using a chosen financial indicator such as the price to earnings ratio. Companies are ordered from lowest to highest values where lower values suggest relative undervaluation and higher values suggest relative overvaluation. Investment positions are taken only at the extremes of the ranking such as the lowest group for buying and the highest group for selling short. The focus is on relative comparison across firms rather than isolated absolute values which makes normalization techniques like standardization useful for improving comparability.
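As a rough sketch of that ranking step (not code from the article; the snapshot frame, the metric argument, and the ticker column are illustrative names), the cross-sectional standardization and extreme-quantile selection could look like this:
import pandas as pd

def rank_cross_section(snapshot: pd.DataFrame, metric: str, quantile: float = 0.10):
    # Standardize the chosen metric across firms so values are comparable
    z = (snapshot[metric] - snapshot[metric].mean()) / snapshot[metric].std()
    ordered = snapshot.assign(zscore=z).sort_values('zscore')
    n = max(1, int(round(len(ordered) * quantile)))
    longs = ordered['ticker'].head(n).tolist()    # lowest values: relatively undervalued
    shorts = ordered['ticker'].tail(n).tolist()   # highest values: relatively overvalued
    return longs, shorts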
The study illustrates how at least two hundred equities listed in the United States can be selected from a large corporate database using targeted screening rules. It then evaluates several quantile-based investment strategies built on different fundamental characteristics. The analysis period runs from January 2014 to January 2021, covering a total span of seven years.
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from collections import defaultdict
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (15, 6)
import pandas as pd
import pandas_datareader as pdr
import numpy as np
import nasdaqdatalink
import seaborn as sns
sns.color_palette("mako", as_cmap=True)
import plotnine as p9

It begins by enabling inline matplotlib rendering with %matplotlib inline, ensuring that plots generated during analysis — such as equity curves or quantile distributions — display directly within notebook cells for immediate inspection without disrupting the exploratory process. To maintain a clean execution flow, warnings are suppressed via warnings.filterwarnings('ignore'), which prevents extraneous output from libraries like pandas or numpy from cluttering the notebook, allowing focus on the core quantitative insights rather than diagnostic noise.
Next, essential data structures and visualization tools are imported to handle the heterogeneous financial datasets typical in quantamental modeling. The defaultdict from collections provides a convenient way to aggregate metrics, such as grouping asset returns by quantiles or tracking strategy performance across regimes, without explicit key checks that could complicate iterative computations. Matplotlib is imported as plt with figure dimensions set to (15, 6) via rcParams, optimizing the display of time-series plots like cumulative returns or volatility surfaces, which are crucial for validating the quantile-based entry and exit signals in our trading strategy.
Pandas, aliased as pd, serves as the backbone for data manipulation, enabling efficient handling of tabular financial data — such as loading historical prices, computing rolling quantiles for signal generation, or aligning multi-asset datasets for factor modeling in the quantamental framework. Complementing this, pandas_datareader (as pdr) facilitates seamless retrieval of market data from sources like Yahoo Finance, populating DataFrames with adjusted closes or volumes needed to backtest quantile thresholds against real-world price action. Numpy, imported as np, underpins numerical operations, such as vectorized calculations for percentile rankings or covariance matrices, ensuring high-performance array manipulations that scale to large portfolios in our strategy development.
For specialized data access, nasdaqdatalink is brought in to pull macroeconomic or alternative datasets — think Fama-French factors or economic indicators — that enrich the quantamental model by incorporating fundamental signals alongside price-based quantiles, allowing us to refine trading rules based on broader market contexts. Visualization is further enhanced with seaborn (sns), where color_palette(“mako”, as_cmap=True) establishes a professional, perceptually uniform color scheme for heatmaps of correlation matrices or quantile bin visualizations, improving interpretability of strategy sensitivities. Finally, plotnine (as p9) introduces a grammar-of-graphics approach for declarative plotting, ideal for creating layered visualizations of quantile strategy performance metrics, such as faceted plots comparing drawdowns across asset classes, thereby streamlining the communication of model results in our iterative development cycle. This setup collectively prepares the notebook to ingest, process, and visualize data flows from raw market inputs through to actionable trading insights.
def buildStrategy(initialCapital, rankCriteria, weightStrategy, quantile, holdingWindow=None):
    '''
    Builds a monthly quantile trading strategy based on:
    1. rankCriteria (generally a z-score measure),
    2. weightStrategy (equalWeighted, mCapWeighted, stratified)
    3. Quantile traded (around 10%)
    4. Holding Window (threshold to hold the stock in subsequent months)
    '''
    current_balance = initialCapital
    strategy_record = tracker.copy()
    # Universe size is taken from the first entry period
    sample_period_stocks = len(fullDataSet.loc[strategy_record.iloc[0]['EnterDate']])
    position_count = int(round(sample_period_stocks * quantile, 0))
    if holdingWindow is None:
        retention_cap = position_count
    else:
        retention_cap = int(round(sample_period_stocks * holdingWindow, 0))
    upward_securities, downward_securities = [], []
    upward_retentions, downward_retentions = [], []
    aggregate_positions = 0
    aggregate_retentions = 0
    index_list = list(strategy_record.index)
    counter = 0
    while counter < len(index_list):
        active_index = index_list[counter]
        strategy_record.loc[active_index, 'StartCapital'] = current_balance
        ingress_period = fullDataSet.loc[strategy_record.loc[active_index, 'EnterDate']]
        egress_period = fullDataSet.loc[strategy_record.loc[active_index, 'ExitDate']]
        # Each leg is sized at one twentieth (5%) of current capital
        strategy_record.loc[active_index, 'ShortPos'] = -current_balance / 20
        strategy_record.loc[active_index, 'LongPos'] = current_balance / 20
        ingress_time = strategy_record.loc[active_index, 'EnterDate']
        egress_time = strategy_record.loc[active_index, 'ExitDate']
        financing_rate = strategy_record.loc[active_index, 'RepoRate']
        interval_days = (egress_time - ingress_time).days
        # Financing cash flow on the short leg, prorated on a 360-day convention
        financing_yield = - (interval_days / 360) * financing_rate * strategy_record.loc[active_index, 'ShortPos'] / 100
        strategy_record.loc[active_index, 'RepoCash'] = financing_yield
        # Rank the entry-period universe by the chosen criterion (ascending)
        ranking_order = ingress_period.sort_values(by=rankCriteria)['ticker'].tolist()
        downward_retention_pool = ranking_order[:retention_cap]
        upward_retention_pool = [] if retention_cap == 0 else ranking_order[-retention_cap:]
        downward_retentions = list(set(downward_securities) & set(downward_retention_pool))
        upward_retentions = list(set(upward_securities) & set(upward_retention_pool))
        downward_securities = ranking_order[:position_count - len(downward_retentions)] + downward_retentions
        if position_count == len(upward_retentions):
            upward_securities = upward_retentions
        else:
            upward_securities = ranking_order[-position_count + len(upward_retentions):] + upward_retentions
        aggregate_positions += 2 * (len(downward_securities) + len(upward_securities))
        aggregate_retentions += 2 * (len(downward_retentions) + len(upward_retentions))
        downward_portion = computeWeights(downward_securities, weightStrategy, ingress_period)
        upward_portion = computeWeights(upward_securities, weightStrategy, ingress_period)
        # Net sector exposure of the short leg
        bear_tix = pd.Series(downward_securities)
        bear_amts = pd.Series(downward_portion) * strategy_record.loc[active_index, 'ShortPos']
        bear_aux = pd.DataFrame({'ticker': bear_tix, 'portion': bear_amts})
        bear_aux = bear_aux.merge(ingress_period[['ticker', 'zacks_sector_code']], on='ticker')
        bear_grouped = bear_aux.groupby('zacks_sector_code')['portion'].sum()
        # Net sector exposure of the long leg
        bull_tix = pd.Series(upward_securities)
        bull_amts = pd.Series(upward_portion) * strategy_record.loc[active_index, 'LongPos']
        bull_aux = pd.DataFrame({'ticker': bull_tix, 'portion': bull_amts})
        bull_aux = bull_aux.merge(ingress_period[['ticker', 'zacks_sector_code']], on='ticker')
        bull_grouped = bull_aux.groupby('zacks_sector_code')['portion'].sum()
        combined_grouped = bear_grouped.add(bull_grouped, fill_value=0)
        for cat, val in combined_grouped.items():
            strategy_record.loc[active_index, zacksSectors[cat]] = val
        # PnL of each leg: entry-to-exit price change, scaled by dollars invested per share
        downward_return = sum(
            (list(egress_period.set_index('ticker').loc[downward_securities, 'adj_close'])[j] - list(ingress_period.set_index('ticker').loc[downward_securities, 'adj_close'])[j]) *
            (strategy_record.loc[active_index, 'ShortPos'] * downward_portion[j] / list(ingress_period.set_index('ticker').loc[downward_securities, 'adj_close'])[j])
            for j in range(len(downward_securities))
        ) if downward_securities else 0
        upward_return = sum(
            (list(egress_period.set_index('ticker').loc[upward_securities, 'adj_close'])[j] - list(ingress_period.set_index('ticker').loc[upward_securities, 'adj_close'])[j]) *
            (strategy_record.loc[active_index, 'LongPos'] * upward_portion[j] / list(ingress_period.set_index('ticker').loc[upward_securities, 'adj_close'])[j])
            for j in range(len(upward_securities))
        ) if upward_securities else 0
        strategy_record.loc[active_index, 'ShortPnL'] = downward_return
        strategy_record.loc[active_index, 'LongPnL'] = upward_return
        strategy_record.loc[active_index, 'MonthPnL'] = downward_return + upward_return + financing_yield
        strategy_record.loc[active_index, 'EndCapital'] = current_balance + strategy_record.loc[active_index, 'MonthPnL']
        current_balance = strategy_record.loc[active_index, 'EndCapital']
        counter += 1
    strategy_record['CumPnL'] = strategy_record['MonthPnL'].cumsum()
    return strategy_record, aggregate_positions - aggregate_retentions

This approach enables a market-neutral, long-short portfolio that exploits quantile dispersions, with the overall goal of evaluating the strategy’s performance in a quantile trading framework. The function begins by initializing the current balance to the provided initial capital and copying a pre-defined tracker DataFrame to record strategy outcomes. It then determines the number of positions to take based on the quantile parameter (e.g., 10% of available stocks in the entry period), ensuring the portfolio size scales with the universe. If a holding window is specified, it sets a retention cap to carry over a subset of prior positions into the next month, promoting continuity in holdings to reduce turnover and transaction costs; otherwise the retention cap defaults to the position count itself, so nothing is retained beyond what the traded quantile would select anyway.
As the simulation progresses through each monthly period in the tracker’s index, the function updates the starting capital for that iteration and retrieves the ingress (entry) and egress (exit) period data from the full dataset, which contains adjusted close prices and other attributes for all securities. It allocates positions by setting the short position to negative one-twentieth of the current balance and the long to positive one-twentieth, creating a balanced exposure where each side represents 5% of capital, thus maintaining overall neutrality while leveraging the full capital efficiently across longs and shorts. To account for borrowing costs in the short leg, it calculates the financing yield using the repo rate for the period, adjusted for the number of days between entry and exit, and prorated over 360 days — this deducts the cost from short proceeds, reflecting real-world margin financing in quantile strategies.
The core selection logic ranks all securities in the ingress period by the rank criteria in ascending order, identifying the bottom quantile for shorts (downward securities) and the top for longs (upward securities). To incorporate retentions when a holding window is active, it first defines retention pools from the extremes of the ranking and intersects them with the previous period’s holdings, ensuring persistent positions only if they remain in the eligible extremes; this mechanism stabilizes the portfolio by retaining high-conviction signals across months. The new security lists are then built by adding fresh selections to fill the position count, prioritizing retentions to minimize churn. Aggregate counters track total positions and retentions across periods, providing insight into turnover dynamics central to strategy evaluation.
Weights are computed for the downward and upward securities using the specified strategy — such as equal, market-cap, or stratified weighting — to allocate the position sizes proportionally, which helps mitigate concentration risk and aligns with quantamental principles of diversified exposure within quantiles. Sector exposures are then derived by grouping the weighted positions by Zacks sector codes, merging with ingress data, and summing portions for both legs; this net exposure per sector is recorded in the tracker, allowing us to monitor and balance industry tilts that could arise from quantile selections. Returns are calculated by simulating the price changes from ingress to egress adjusted closes, weighted by each security’s allocation relative to its entry price, and scaled by the long or short position size — this yields the profit/loss (PnL) for each leg, capturing the strategy’s directional bets on quantile outperformers and underperformers.
Finally, the monthly PnL combines short PnL, long PnL, and financing yield, updating the ending capital by adding it to the starting balance, which compounds forward to the next period’s starting point. After processing all periods, cumulative PnL is computed as a running sum of monthly PnL, providing a complete performance trajectory. The function returns the enriched strategy record alongside the net number of new positions (total positions minus retentions), quantifying the strategy’s activity level over time in support of our quantile trading model assessments.
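A hypothetical call, assuming the notebook's tracker, fullDataSet and zacksSectors objects have already been prepared and that 'pe_z' is a z-scored ranking column in fullDataSet, might look like this:
# Hypothetical invocation; the globals and the 'pe_z' column are assumptions, not shown above.
record, net_new_positions = buildStrategy(
    initialCapital=1_000_000,
    rankCriteria='pe_z',
    weightStrategy='equalWeighted',
    quantile=0.10,         # trade the top and bottom deciles
    holdingWindow=0.15     # keep prior holdings while they stay inside the top/bottom 15%
)
print(record[['EnterDate', 'ExitDate', 'MonthPnL', 'CumPnL']].tail())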
def computeWeights(securities, weightStrategy, enterDay):
    '''
    This function calculates capital allocation weights for specified securities on the entry date using the provided strategy.
    '''
    if weightStrategy == 'equalWeighted':
        # Identical weight for every security
        count = len(securities)
        portions = [1.0 / count] * count
        return portions
    elif weightStrategy == 'mCapWeighted':
        # Weight proportional to market capitalization on the entry date
        values = []
        for ticker in securities:
            slice_data = enterDay[enterDay['ticker'] == ticker]['mcap']
            values.append(slice_data.iloc[0])
        aggregate = sum(values)
        ratios = []
        for value in values:
            ratios.append(value / aggregate)
        return ratios
    elif weightStrategy == 'stratified':
        # Rank-based taper: weight ~ max(0, 0.1 * (1/rank - 0.02)), then normalized
        count = len(securities)
        orders = tuple(range(1, count + 1))
        prelim_values = []
        for order in orders:
            temp = 0.1 * (1.0 / order - 0.02)
            prelim_values.append(max(0.0, temp))
        aggregate = 0.0
        for pv in prelim_values:
            aggregate += pv
        final_ratios = [pv / aggregate for pv in prelim_values]
        return final_ratios

This function takes three inputs: a list of securities (typically ticker symbols selected via quantile criteria), the weighting strategy as a string identifier, and the enterDay DataFrame containing market data for that entry date, including metrics like market capitalization. The logic begins by evaluating the weightStrategy parameter to branch into one of three distinct paths, each designed to compute a list of normalized weights that sum to 1.0, representing the proportional capital allocation for each security.
For the ‘equalWeighted’ strategy, the function promotes simplicity and diversification by assigning identical portions to all securities, regardless of their individual characteristics. It first determines the total number of securities, then generates a list where each weight is simply the reciprocal of that count — effectively dividing the portfolio equally. This approach is particularly useful in quantile trading when we want to avoid concentration risk and ensure balanced exposure across the selected universe, allowing the strategy to capture broad market movements without bias toward larger entities.
For the ‘mCapWeighted’ strategy, the function shifts its focus to market capitalization as a fundamental driver of influence, reflecting the real-world dynamics where larger companies often have greater impact on portfolio performance. Here, it iterates over each security in the list, slicing the enterDay DataFrame to extract the market cap (‘mcap’) value for that ticker on the entry date, collecting these into a values list. It then computes the aggregate sum of these market caps and derives ratios by dividing each individual market cap by this total, yielding weights proportional to size. This method integrates quantamental principles by leveraging fundamental data to weight allocations, which helps in quantile strategies to emphasize securities with stronger economic footprints while maintaining normalization for total capital deployment.
Finally, for the ‘stratified’ strategy, the function implements a more nuanced, order-based allocation to introduce a layered risk profile, ideal for quantile trading where securities might be ranked by performance metrics. It starts by establishing the count of securities and creating a tuple of sequential orders from 1 to that count, treating these as ranks (e.g., top quantile first). For each order, it calculates a preliminary value using the formula 0.1 times (1 divided by the order minus 0.02), ensuring a non-negative result via the max function to taper weights progressively for lower-ranked securities. These preliminary values are summed to form an aggregate, and the final ratios are obtained by normalizing each preliminary value against this aggregate. This design fosters a stratified exposure that diminishes allocation as rank decreases, promoting concentration in higher quantiles while still including broader participation, which aligns with our goal of balancing potential alpha generation with controlled diversification in the trading model.
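To make the taper concrete, a quick check with five hypothetical tickers (the enterDay argument is ignored by the stratified branch, so a placeholder frame suffices) shows the normalized weights shrinking with rank:
# Quick check of the stratified taper for five ranked tickers; dummy_day is a placeholder.
dummy_day = pd.DataFrame({'ticker': list('ABCDE'), 'mcap': [1] * 5})
weights = computeWeights(list('ABCDE'), 'stratified', dummy_day)
print([round(w, 3) for w in weights])   # roughly [0.449, 0.220, 0.144, 0.105, 0.082]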
def summaryStats(frame):
    """
    Computes summary statistics for the provided DataFrame.
    """
    indices = frame.columns
    # Annualize assuming monthly observations
    annualized_means = frame.mean() * 12
    annualized_stds = frame.std() * np.sqrt(12)
    sharpe_values = annualized_means / annualized_stds
    skew_values = frame.skew()
    kurt_values = frame.kurt()
    min_values = frame.min()
    stats_data = {
        'Mean Annualized Return': annualized_means,
        'Annualized Volatility': annualized_stds,
        'Sharpe Ratio': sharpe_values,
        'Skewness': skew_values,
        'Excess Kurtosis': kurt_values,
        'Minimum return': min_values
    }
    result_df = pd.DataFrame(stats_data, index=indices)
    return result_df

This allows teams to assess the risk-return profile of model components, informing decisions on how to allocate trades into quantiles based on factor exposures or predictive signals. The function begins by extracting the column names from the input DataFrame frame as indices, which typically represent the assets, factors, or portfolio slices being analyzed, ensuring these identifiers carry over to the output for easy traceability.
Next, it calculates the annualized mean returns by multiplying each column’s mean (computed across the monthly rows) by 12, assuming the input data consists of monthly observations; this annualization standardizes the metrics for comparability across different time frequencies and time horizons, a common practice in quantitative finance to evaluate long-term strategy viability. Similarly, the annualized standard deviations are computed by scaling the standard deviations with the square root of 12, reflecting the statistical property that volatility scales with the square root of time, providing a measure of annual risk that aligns with the mean for consistent performance analysis in quantile-based trading setups.
The function then derives the Sharpe ratios by dividing the annualized means by the annualized standard deviations, yielding a risk-adjusted return metric that highlights which factors or assets offer the best compensation per unit of volatility — essential for prioritizing quantiles in trading strategies where higher Sharpe values might signal more robust signals for position sizing. To capture non-normal return distributions, which are prevalent in financial data and can impact quantile model reliability, it computes skewness using the DataFrame’s skew method, indicating asymmetry in returns (e.g., positive skew for upside potential), and excess kurtosis via the kurt method, measuring tail heaviness beyond a normal distribution to flag outlier risks in strategy backtests.
Finally, the minimum returns are extracted directly from the DataFrame to quantify downside extremes, helping evaluate drawdown potential in quantile portfolios. These statistics are organized into a dictionary where each key labels a metric and the value is the corresponding pandas Series, preserving the original indices. This dictionary is then converted into a new DataFrame with the original column names as the index, resulting in a transposed, summary table that’s intuitive for review — rows for each asset or factor, columns for metrics — facilitating quick insights into how different quantiles or model elements contribute to overall trading strategy performance.
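As a quick illustration, the function can be exercised on simulated monthly return series directly in the notebook environment; the strategy names and return parameters below are made up for demonstration only:
# Illustrative call on simulated monthly return series for two hypothetical strategies.
rng = np.random.default_rng(0)
monthly = pd.DataFrame({
    'StrategyA': rng.normal(0.010, 0.03, 84),   # seven years of monthly observations
    'StrategyB': rng.normal(0.007, 0.02, 84),
})
display(summaryStats(monthly))   # one row per strategy, one column per statistic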
def analyzeStrategy(initialCapital, rankCriteria, quantile, holdingWindow=None):
    figure, axes = plt.subplots(1, 3, figsize=(18, 8))
    results_list = []
    weighting_methods = ['equalWeighted', 'mCapWeighted', 'stratified']
    for idx, method in enumerate(weighting_methods):
        # Run each weighting scheme with and without the holding window
        strategy_no_hold, num_trades_no_hold = buildStrategy(initialCapital, rankCriteria, method, quantile)
        strategy_with_hold, num_trades_with_hold = buildStrategy(initialCapital, rankCriteria, method, quantile, holdingWindow)
        results_list.append(strategy_no_hold)
        short_gains = strategy_no_hold.ShortPnL.sum()
        long_gains = strategy_no_hold.LongPnL.sum()
        repo_proceeds = strategy_no_hold.RepoCash.sum()
        overall_gains = short_gains + long_gains + repo_proceeds
        return_no_hold = round(strategy_no_hold.CumPnL.iloc[-1] / strategy_no_hold.StartCapital.iloc[0] * 100, 2)
        return_with_hold = round(strategy_with_hold.CumPnL.iloc[-1] / strategy_with_hold.StartCapital.iloc[0] * 100, 2)
        axes[idx].plot(strategy_no_hold['ExitDate'], strategy_no_hold['CumPnL'], label='without hold')
        axes[idx].plot(strategy_with_hold['ExitDate'], strategy_with_hold['CumPnL'], label='with hold')
        axes[idx].legend()
        axes[idx].set(title=f"{method} Strategy\nWithout Hold: Trades = {num_trades_no_hold}, ROC = {return_no_hold}%\nWith Hold: Trades = {num_trades_with_hold}, ROC = {return_with_hold}%\nProfit Share: Short: {round(100 * short_gains / overall_gains)}%, Long: {round(100 * long_gains / overall_gains)}%, Repo: {round(100 * repo_proceeds / overall_gains)}%")
    plt.suptitle(f'Cumulative Profit and Loss on {rankCriteria} Criteria', fontsize=24)
    plt.tight_layout()
    plt.show()
    # Sector exposure boxplot for the equal-weighted variant (sector columns start at column 12)
    first_strategy = results_list[0]
    box_plot = sns.boxplot(data=first_strategy.iloc[:, 12:])
    box_plot.set_xticklabels(box_plot.get_xticklabels(), rotation=30)
    box_plot.set_title("Sectoral Distribution - Equally weighted trading strategy", fontsize=16)
    box_plot.set_xlabel('Sector', fontsize=14)
    box_plot.set_ylabel('Net Invested position', fontsize=14)
    plt.show()
    # One column of monthly PnL per weighting method for the summary table
    monthly_pnls = [s.MonthPnL for s in results_list]
    combined_df = pd.concat(monthly_pnls, axis=1, ignore_index=True)
    stats_output = summaryStats(combined_df)
    stats_output.index = weighting_methods
    stats_output.index.name = 'Strategy'
    display(stats_output)
    return results_list

It takes initial capital, rank criteria, a quantile threshold for asset selection, and an optional holding window to assess how these parameters influence profitability across different weighting schemes. The function begins by initializing a subplot figure with three panels to facilitate side-by-side comparisons, ensuring we can visually track cumulative profit and loss (PnL) trajectories for each strategy variant. This setup allows us to quantify the impact of holding periods on trade frequency and returns, which is crucial for refining quantile-based selection in volatile markets.
The core logic iterates over three weighting methods — equal-weighted, market-cap-weighted, and stratified — to simulate diverse portfolio constructions that reflect real-world trading dynamics. For each method, the function first invokes buildStrategy twice: once without a holding window to capture immediate rebalancing scenarios, and once with the specified holding window to evaluate longer-term position retention. This dual simulation reveals how holding influences trade counts and overall returns, as shorter horizons might increase turnover but expose the strategy to higher transaction costs, while longer holds could enhance compounding in trending markets. The no-hold strategy is stored in a results list for later aggregation, enabling downstream analysis of multiple scenarios. Key performance metrics are then derived from the no-hold strategy: short gains from short positions, long gains from long positions, and repo proceeds from cash management, which together form overall gains. These components highlight the strategy’s reliance on directional bets and liquidity usage, tying directly to quantile trading’s goal of exploiting mispricings across asset ranks.
Returns on capital (ROC) are calculated for both strategy variants by dividing the final cumulative PnL by the initial capital and scaling to percentage, providing a standardized measure of efficiency that helps compare weighting methods’ effectiveness in capital allocation. Each subplot then visualizes the cumulative PnL over exit dates for the no-hold and with-hold cases, with the axis title embedding trade counts, ROC values, and a profit share breakdown (short, long, and repo percentages). This plotting approach narrates the strategy’s evolution, showing how weighting influences risk-adjusted performance — equal weighting democratizes exposure, market-cap weighting prioritizes scale, and stratified weighting balances segments — ultimately aiding in selecting robust configurations for our quantamental framework.
Following the loop, the figure is finalized with a suptitle referencing the rank criteria, layout adjustments for clarity, and display, offering an at-a-glance assessment of how quantile thresholds drive cumulative outcomes across weightings. To delve into sectoral nuances, particularly for the equal-weighted baseline, the function extracts the first strategy from the results list and generates a boxplot of net invested positions starting from column 12 onward, assuming these represent sectoral breakdowns. This visualization underscores diversification within the quantile-selected universe, illustrating how equal weighting distributes exposure across sectors to mitigate concentration risks inherent in ranked asset selection.
For a quantitative summary, the function collects monthly PnL series from all no-hold strategies in the results list, concatenating them column-wise into a single DataFrame, one column per weighting method, to enable cross-method comparisons. It then applies summaryStats to this combined DataFrame, renaming the index to the weighting methods for contextual labeling, and displays the output. This step computes aggregated statistics like means, volatilities, and Sharpe ratios on monthly PnL, revealing the stability and scalability of each weighting in the quantile strategy. Finally, the function returns the list of no-hold strategy DataFrames, allowing further integration into our broader model development pipeline for iterative refinement of trading rules.
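Under the same assumptions as the earlier buildStrategy example (prepared tracker and fullDataSet globals), the harness would then be driven once per fundamental characteristic; the ranking column names below are hypothetical:
# Hypothetical driver: compare several z-scored fundamentals side by side.
for criteria in ['pe_z', 'debt_equity_z', 'ret_invst_z']:
    analyzeStrategy(initialCapital=1_000_000, rankCriteria=criteria,
                    quantile=0.10, holdingWindow=0.15)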
def holdIllustration():
    '''
    Illustration of hold strategy in Quantile Trading
    '''
    plot_width, plot_height = 8, 6
    plt.figure(figsize=(plot_width, plot_height))
    plt.title('Hold Strategy Illustration', fontsize=20)
    plt.ylabel('Quantile', fontsize=16)
    plt.xticks([], [])
    line_configs = [
        (0.9, 'g', '-', 'Buy Threshold'),
        (0.75, 'b', 'dashdot', 'Hold Buy Threshold'),
        (0.1, 'r', '-', 'Sell Threshold'),
        (0.25, 'm', 'dashdot', 'Hold Sell Threshold')
    ]
    for threshold_val, line_color, style_type, label_text in line_configs:
        plt.axhline(y=threshold_val, color=line_color, linestyle=style_type, label=label_text)
    zone_configs = [
        ('Sell Zone', 0.02),
        ('Hold-Sell Zone', 0.18),
        ('Hold-Buy Zone', 0.82),
        ('Buy Zone', 0.92)
    ]
    text_font_size = 16
    x_position = 0.37
    for zone_name, y_position in zone_configs:
        plt.text(x_position, y_position, zone_name, fontsize=text_font_size)
    plt.legend()
    plt.show()

This illustration plots horizontal threshold lines and labeled zones on a y-axis representing quantile values from 0 to 1, abstracting the price or indicator momentum into probabilistic buckets to inform strategy logic without needing actual market data.
The function begins by initializing a matplotlib figure with specified dimensions for clarity and readability, then configures the plot’s title to “Hold Strategy Illustration” and the y-label to “Quantile,” while suppressing x-axis ticks to keep the focus on the vertical quantile spectrum. This setup creates a clean, vertical canvas that mirrors the one-dimensional nature of quantile thresholds in our trading model, where decisions hinge on where a current quantile falls relative to predefined levels, emphasizing the strategy’s reliance on statistical positioning rather than time-series progression.
Next, it defines a list of line configurations, each specifying a quantile threshold value, color, line style, and label for key decision points: a solid green line at 0.9 for the buy threshold, a dashdot blue line at 0.75 for the hold-buy threshold, a solid red line at 0.1 for the sell threshold, and a dashdot magenta line at 0.25 for the hold-sell threshold. These are plotted as horizontal lines across the figure using axhline, which efficiently draws reference levels that divide the quantile space into actionable regions. By using distinct colors and styles, the plot visually distinguishes aggressive actions (buy/sell) from conservative ones (hold variants), illustrating why our strategy incorporates buffer zones around core thresholds — to mitigate whipsaws in volatile quantile movements and promote position stability in the quantamental framework.
Following the lines, the function outlines zone configurations as pairs of descriptive names and corresponding y-positions near the thresholds, such as “Sell Zone” at 0.02 (below the sell threshold), “Hold-Sell Zone” at 0.18 (between sell and hold-sell), “Hold-Buy Zone” at 0.82 (between hold-buy and buy), and “Buy Zone” at 0.92 (above the buy threshold). These are annotated as text elements at a fixed x-position with a consistent font size, positioning the labels to hover within or adjacent to their respective zones. This textual overlay clarifies the strategy’s flow: when a quantile enters the buy zone, the model signals entry; in hold zones, it maintains positions to capture sustained trends; and in sell zones, it prompts exits, all tied to our goal of quantile-driven risk management that balances opportunity with caution.
Finally, the legend is added to map colors, styles, and labels for quick reference, and the plot is displayed. This culminates the illustration by providing a static, interpretable snapshot of how quantile thresholds orchestrate the hold strategy, reinforcing the quantamental approach where probabilistic signals from models like ours dictate trading discipline across market regimes.
Crafting Quantitative-Fundamental Frameworks
setattr(nasdaqdatalink.ApiConfig, 'api_key', 'YourAPIkey')

This single line of code plays a foundational role by configuring the authentication for the Nasdaq Data Link API, which provides the datasets needed for such quantitative analysis. Specifically, it uses the setattr function to dynamically assign the API key to the api_key attribute of the nasdaqdatalink.ApiConfig class. This approach works by treating the class as an object and setting its attribute at runtime, effectively updating the library’s configuration without needing to instantiate an object or modify source files. The “story” here is straightforward: before any data queries can be made — such as fetching stock prices, economic indicators, or sector data for quantile-based portfolio optimization — this setup ensures that subsequent API calls from the nasdaqdatalink library include the necessary credentials, enabling seamless integration of real-world market data into our model’s feature engineering and backtesting pipeline. By handling this configuration early, it allows the rest of the codebase to focus on data processing and strategy logic, maintaining a clean separation between setup and core computations.
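For readers who prefer the conventional form, the same configuration can be written as a plain attribute assignment; reading the key from an environment variable, as sketched below, is an illustrative choice rather than part of the original notebook:
import os
import nasdaqdatalink

# Equivalent plain attribute assignment; the environment-variable lookup is illustrative.
nasdaqdatalink.ApiConfig.api_key = os.environ.get('NASDAQ_DATA_LINK_API_KEY', 'YourAPIkey')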
"""
ZACKS/FC Fundamentals Summary:
Full collection includes 200 key metrics across more than 19,500 firms
"""
data_source = None
import os
if os.path.exists('fc.pkl'):
    data_source = pd.read_pickle('fc.pkl')
else:
    message = 'Downloading from NasdaqDataLink'
    print(message)
    selected_fields = [
        'ticker', 'per_end_date', 'per_type', 'zacks_sector_code', 'exchange',
        'basic_net_eps', 'diluted_net_eps', 'tot_lterm_debt', 'net_lterm_debt', 'filing_date'
    ]
    date_filter = dict(gte='2013-06-30', lte='2021-01-31')
    retrieval_params = dict(
        columns=selected_fields
    )
    data_source = nasdaqdatalink.get_table(
        'ZACKS/FC',
        qopts=retrieval_params,
        per_end_date=date_filter,
        paginate=True
    )
    data_source.to_pickle('fc.pkl')

The process begins by initializing a data_source variable to None, ensuring we have a clean starting point before attempting to load or fetch the dataset. It then checks for the existence of a local pickle file named 'fc.pkl', which serves as a cached version of the data to avoid redundant downloads and speed up iterative model development workflows.
If the pickle file is present, the code efficiently loads the entire dataset into the data_source using pandas’ read_pickle method, allowing seamless access to the pre-processed fundamentals for immediate use in downstream model building, such as calculating quantile-based indicators from earnings per share or debt metrics. This approach prioritizes efficiency, as reloading from cache is instantaneous compared to querying external sources repeatedly during experimentation.
Should the file not exist — typically on the first run or after cache invalidation — the code prints a status message to inform the user of the impending download from Nasdaq Data Link, maintaining transparency in the data pipeline. It then defines a curated list of selected_fields, focusing on key identifiers and metrics like ticker symbols, period end dates, sector codes, exchange information, basic and diluted net earnings per share, total and net long-term debt, and filing dates. These fields are chosen deliberately to capture core fundamental drivers relevant to our quantamental framework, enabling analysis of profitability, leverage, and temporal patterns without overwhelming the dataset with the full 200 available metrics.
A date_filter is applied next, restricting the data to periods between June 30, 2013, and January 31, 2021, to align with our strategy’s historical backtesting window and ensure computational feasibility while covering a robust sample of market cycles for quantile estimation. The retrieval_params dictionary bundles the selected columns for targeted querying, optimizing bandwidth and processing by fetching only what’s needed. The nasdaqdatalink.get_table function is invoked on the ‘ZACKS/FC’ dataset, passing these parameters along with the date filter and enabling pagination to handle the large volume of records — over 19,500 firms — without memory overload, progressively retrieving and assembling the full table into data_source.
Finally, the fetched data is persisted to ‘fc.pkl’ via to_pickle, creating the cache for future runs and ensuring reproducibility across team members or sessions. This flow establishes a reliable foundation of fundamental data, directly supporting the integration of qualitative insights into our quantitative models for deriving trading signals based on quantile thresholds of financial health indicators.
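Since this cache-or-download pattern repeats for every Zacks table in this section, it could be factored into a small helper. The sketch below is not from the original notebook; it simply assumes the same pandas and nasdaqdatalink imports used above:
import os

def load_or_fetch(pickle_path, table, columns, date_filter=None):
    # Reuse the local cache when present; otherwise query Nasdaq Data Link and cache the result.
    if os.path.exists(pickle_path):
        return pd.read_pickle(pickle_path)
    print('Downloading from NasdaqDataLink')
    kwargs = dict(qopts=dict(columns=columns), paginate=True)
    if date_filter is not None:
        kwargs['per_end_date'] = date_filter
    frame = nasdaqdatalink.get_table(table, **kwargs)
    frame.to_pickle(pickle_path)
    return frame

# e.g. fc = load_or_fetch('fc.pkl', 'ZACKS/FC', selected_fields,
#                         dict(gte='2013-06-30', lte='2021-01-31'))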
"""
Additional Zacks fundamentals data featuring refreshed common and diluted shares outstanding figures.
"""
data_file = 'shrs.pkl'
try:
    share_info = pd.read_pickle(data_file)
except Exception:
    print('Downloading from NasdaqDataLink')
    filter_dates = dict(gte='2013-06-30', lte='2021-01-31')
    select_fields = dict(columns=['ticker', 'per_end_date', 'per_type', 'shares_out'])
    share_info = nasdaqdatalink.get_table(
        'ZACKS/SHRS',
        per_end_date=filter_dates,
        qopts=select_fields,
        paginate=True
    )
    share_info.to_pickle(data_file)

The process begins by attempting to load pre-existing data from a local pickle file named 'shrs.pkl' using pandas' read_pickle function, ensuring efficient reuse of previously fetched information to avoid redundant API calls during iterative model development or backtesting.
If the file does not exist or cannot be read — due to it being the first run or a data corruption issue — the code pivots to downloading fresh data from Nasdaq Data Link. This is triggered by a broad exception handler, which prints a simple status message to indicate the download initiation, maintaining transparency in the data pipeline without interrupting the workflow. The download targets the ‘ZACKS/SHRS’ table, applying date filters to constrain the period from June 30, 2013, to January 31, 2021; this range aligns with our strategy’s historical backtesting window, capturing relevant quarterly or annual reporting periods while keeping the dataset manageable for computational efficiency.
To retrieve only the necessary fields, the code specifies a selection dictionary including ‘ticker’ for stock identification, ‘per_end_date’ and ‘per_type’ for temporal and periodicity context, and ‘shares_out’ for the core shares outstanding values — both common and diluted variants as noted in the docstring. This targeted query minimizes data volume and processing overhead, directly supporting our need for refreshed figures in valuation models. The nasdaqdatalink.get_table call incorporates pagination to handle large result sets reliably, preventing memory issues or API throttling during bulk fetches.
Once obtained, the data is immediately serialized back to the ‘shrs.pkl’ file via to_pickle, establishing a persistent cache for subsequent runs. This idempotent design ensures that the shares outstanding data flows seamlessly into downstream components of our quantamental framework, such as feature engineering for quantile regressions or risk-adjusted position sizing in the trading strategy, by providing a reliable, up-to-date source without repeated external dependencies.
'''
Zacks Fundamental Metrics (FR Dataset)
includes 26 key financial ratios
'''
import os
data = None
if os.path.exists('fr.pkl'):
    data = pd.read_pickle('fr.pkl')
else:
    print('Downloading from NasdaqDataLink')
    start_dt = '2013-06-30'
    end_dt = '2021-01-31'
    col_list = ['ticker', 'per_type', 'per_end_date', 'tot_debt_tot_equity', 'ret_invst']
    date_filter = {'gte': start_dt, 'lte': end_dt}
    query_options = {'columns': col_list}
    raw_data = nasdaqdatalink.get_table('ZACKS/FR', per_end_date=date_filter, qopts=query_options, paginate=True)
    raw_data.to_pickle('fr.pkl')
    data = raw_data

These metrics, such as debt-to-equity and return on investment, allow us to incorporate fundamental analysis into our trading signals, enabling quantile-based portfolio construction that balances risk and return across market segments.
The process begins by initializing a variable to hold the dataset, ensuring we can reuse previously fetched data efficiently. It first checks for the existence of a local pickle file named ‘fr.pkl’, which stores the processed data in a compact, binary format for quick subsequent loads. This approach optimizes performance in our iterative model development workflow, avoiding redundant downloads during repeated executions and reducing dependency on external APIs.
If the pickle file is absent, the code initiates a fresh download from Nasdaq Data Link, signaling the start of data retrieval for the Zacks FR table. It defines a specific date range from June 30, 2013, to January 31, 2021, to focus on a historical period relevant to our backtesting and strategy validation, capturing market cycles without excessive data volume that could slow processing. A curated list of columns is selected — ‘ticker’ for stock identification, ‘per_type’ and ‘per_end_date’ for periodicity and timing, and targeted ratios like ‘tot_debt_tot_equity’ and ‘ret_invst’ — to retrieve only the metrics critical for our quantamental features, minimizing bandwidth and storage needs while aligning with model inputs for leverage and profitability analysis.
The query is constructed with a date filter to bound the ‘per_end_date’ within the specified range and options to limit columns, ensuring precise data slicing. The nasdaqdatalink.get_table function is invoked with pagination enabled, which handles large datasets by fetching results in manageable chunks, preventing memory overload during retrieval. Once obtained, the raw data is persisted to the pickle file for future use, and the variable is updated to reference this dataset, seamlessly integrating it into downstream pipelines for feature engineering and quantile strategy simulations. This flow ensures reliable, targeted access to fundamental data, foundational for developing robust trading models that blend quantitative signals with qualitative insights.
'''
Master Table (ZACKS/MT)
descriptive information about all tickers that are included in Zacks products
'''
info_table = None
try:
    # Reuse the cached master table when it exists
    info_table = pd.read_pickle('mt.pkl')
except Exception:
    print('Downloading from NasdaqDataLink')
    col_names = ('ticker', 'ticker_type', 'asset_type')
    param_dict = dict(columns=list(col_names))
    info_table = nasdaqdatalink.get_table('ZACKS/MT', paginate=True, qopts=param_dict)
    info_table.to_pickle('mt.pkl')
mt = info_table
del info_table

The code begins by initializing an empty container for the info_table that will hold this master table data. It then attempts to load existing data from the local cache file 'mt.pkl' using pandas' read_pickle function; when the file is available, this step promotes efficiency by reusing previously fetched data and reducing dependency on external API calls during repeated executions.

If the cached load fails, typically on a first run when 'mt.pkl' does not yet exist, the except branch takes over. It prints a status message to inform the user that a fresh download is underway from NasdaqDataLink, ensuring transparency in the data acquisition process. To optimize the fetch, it defines a tuple of only the necessary columns (ticker, ticker_type, and asset_type) and constructs a parameter dictionary specifying these columns as query options, which limits the data volume and focuses on the attributes most relevant to our ticker identification and classification needs in the trading strategy.

The retrieval itself happens via nasdaqdatalink.get_table, querying the 'ZACKS/MT' dataset with pagination enabled to handle large result sets efficiently and with the parameter dictionary to fetch just the targeted columns. The downloaded table is then persisted to 'mt.pkl' so that future runs can read directly from the cache, and info_table is set to reference this fresh data.
Finally, the code assigns the populated info_table to a global variable mt for broader script access — facilitating its use in downstream model building and strategy logic — before deleting the temporary info_table variable to free up memory, maintaining a clean namespace throughout the quantamental workflow.
'''
Market Value Supplement (ZACKS/MKTV)
supplementary information to Zacks fundamentals, with updated values for market capitalization and enterprise value.
'''
PICKLE_FILE = 'mktv.pkl'
DOWNLOAD_MSG = 'Downloading from NasdaqDataLink'
DATA_TABLE = 'ZACKS/MKTV'
DATE_START = '2013-06-30'
DATE_END = '2021-01-31'
REQUIRED_COLS = ['ticker', 'per_type', 'per_end_date', 'mkt_val']
supplement_df = None
try:
    supplement_df = pd.read_pickle(PICKLE_FILE)
except Exception:
    print(DOWNLOAD_MSG)
    filter_dates = dict(gte=DATE_START, lte=DATE_END)
    select_columns = dict(columns=REQUIRED_COLS)
    supplement_df = nasdaqdatalink.get_table(
        DATA_TABLE,
        per_end_date=filter_dates,
        qopts=select_columns,
        paginate=True
    )
    supplement_df.to_pickle(PICKLE_FILE)

These values are essential for precise valuation metrics in the quantile trading strategy, ensuring that our models incorporate timely market-based adjustments over the historical period from June 30, 2013, to January 31, 2021. By focusing on this data, we avoid relying solely on potentially outdated fundamentals, allowing the strategy to better capture market dynamics for quantile-based portfolio construction and risk assessment.
The process begins by defining key constants to standardize the operation: the local pickle file for caching, a download message for user feedback, the specific NasdaqDataLink table identifier, the date range for relevance to our model’s backtesting horizon, and the required columns — ticker, period type, period end date, and market value — to keep the dataset lean and targeted. This setup ensures we only retrieve and store data pertinent to equity tickers and their periodic market valuations, aligning with the need for efficient data handling in quantitative workflows.
Next, the code attempts to load the pre-processed data from the pickle file into a DataFrame named supplement_df. This step prioritizes speed and resource efficiency by reusing previously fetched data, which is crucial in iterative model development where repeated data pulls could slow down experimentation with quantile thresholds and trading signals. If the file doesn’t exist or loading fails, it triggers a download sequence: it prints a status message to indicate the operation, constructs date filters for the greater-than-or-equal-to start date and less-than-or-equal-to end date, and specifies the column selection to minimize data transfer volume.
The download uses the nasdaqdatalink.get_table function with pagination enabled to handle large result sets without memory issues, passing the table identifier along with the date and column filters. This approach ensures we fetch only the necessary subset of the ZACKS/MKTV data, filtered by period end dates within our strategy’s timeframe, resulting in a structured DataFrame with rows representing ticker-period combinations and their associated market values. Once obtained, the DataFrame is immediately serialized to the pickle file, establishing a persistent cache for future runs and preventing redundant API calls that could incur costs or rate limits in our quantamental pipeline. Through this flow, the code reliably populates supplement_df with high-quality supplementary data, ready for integration into downstream model features like normalized market caps for quantile sorting in trading decisions.
sector_codes = dict(
    zip(
        (1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17),
        (
            'Consumer Staples',
            'Consumer Discretionary',
            'Retail/Wholesale',
            'Medical',
            'Basic Materials',
            'Industrial Products',
            'Construction',
            'Multi-Sector Conglomerates',
            'Business Services (Computer/Retail/Wholesale)',
            'Aerospace',
            'Oils/Energy',
            'Utilities',
            'Transportation',
            'Business Services, Construction',
            'Consumer Discretionary'
        )
    )
)

This allows us to categorize assets like stocks into meaningful groups during feature engineering and risk modeling, enabling more precise quantile-based portfolio construction and trading signals that account for sector-specific behaviors.
The logic begins by defining a dictionary named sector_codes that serves as a lookup table, pairing numeric sector identifiers — commonly used in financial datasets — with their corresponding descriptive names. We start with a tuple of integers representing the sector codes: (1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17). These codes are derived from standard industry classification systems, such as those in stock market databases, where numbers efficiently denote sectors without redundancy.
Next, we have a parallel tuple of strings that provide the human-readable sector names, aligned in the same order: (‘Consumer Staples’, ‘Consumer Discretionary’, ‘Retail/Wholesale’, ‘Medical’, ‘Basic Materials’, ‘Industrial Products’, ‘Construction’, ‘Multi-Sector Conglomerates’, ‘Business Services (Computer/Retail/Wholesale)’, ‘Aerospace’, ‘Oils/Energy’, ‘Utilities’, ‘Transportation’, ‘Business Services, Construction’, ‘Consumer Discretionary’). This ensures a one-to-one correspondence, where each code maps to a specific sector description, handling nuances like combined categories for accuracy in our models.
The zip function then iterates over these two tuples simultaneously, creating pairs of (code, name) elements sequentially — for instance, the first pair is (1, ‘Consumer Staples’), the second (2, ‘Consumer Discretionary’), and so on. This zipped iterable is passed directly to the dict constructor, which transforms it into a dictionary with the integers as keys and the strings as values. The result is an efficient, immutable mapping that we can use throughout the pipeline to translate raw numeric data from sources like stock APIs into sector labels, facilitating downstream processes such as sector-neutral quantile regressions or diversified trading strategies that mitigate exposure to correlated industry risks.
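In practice the lookup is applied by mapping the numeric codes onto readable labels; a hypothetical usage, once the merged dataset (named combined later in the notebook) is available, would be:
# Hypothetical usage: translate numeric sector codes into readable labels for grouping and plots.
combined['sector_name'] = combined['zacks_sector_code'].map(sector_codes)
print(combined['sector_name'].value_counts().head())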
join_columns = ['ticker', 'per_end_date', 'per_type']
single_key = ['ticker']
data_sources = [shrs, fr, mktv]
working_set = fc
for source in data_sources:
    working_set = working_set.merge(source, how='inner', left_on=join_columns, right_on=join_columns)
final_result = (
    working_set
    .merge(mt, how='inner', left_on=single_key, right_on=single_key)
)
ticker_uniques = len(set(final_result['ticker']))
print('Unique Tickers =', ticker_uniques)

This unification is crucial because our model relies on aligned, comprehensive data across securities and time periods to generate robust features for quantile-based signal generation, ensuring that only high-quality, overlapping records contribute to the strategy’s predictive power.
The process begins by defining the primary join columns — ‘ticker’ for security identification, ‘per_end_date’ for temporal alignment, and ‘per_type’ for consistency in reporting periods. These columns are chosen to precisely match records across datasets that share the same observation context, preventing mismatches that could dilute the model’s accuracy. A secondary key, consisting solely of ‘ticker’, is also established for a later merge where broader alignment suffices. The data sources to integrate include shares outstanding (shrs), financial ratios (fr), and market value (mktv), which provide essential fundamental inputs; the initial working set starts with fundamental characteristics (fc) as the base, serving as the core dataset around which others are layered.
Sequentially, the code iterates through the data sources, performing inner merges on the full set of join columns for each. This stepwise approach builds the working set incrementally: at each step, only records present in both the current working set and the new source are retained, enforcing data completeness and eliminating gaps that might arise from incomplete coverage in any single dataset. By using inner joins, we prioritize intersectional integrity, which is vital for the quantamental framework where missing values could skew quantile regressions or feature engineering. The result is a progressively refined dataset that captures synchronized fundamental metrics across tickers and periods.
Finally, the working set is merged with the master table (mt) using an inner join on just the ticker key. This simpler linkage reflects the master table’s structure, which carries one descriptive record per ticker without the granular period details, allowing us to append classification attributes such as ticker type and asset type to our fundamental base without over-constraining the join. The merged final_result now embodies a unified view of quantamental inputs, ready for downstream modeling. To confirm the effective universe for our strategy, the code computes and prints the number of unique tickers in this result, providing a quick validation of the data’s breadth and ensuring our quantile trading signals will operate across a representative cross-section of securities.
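A lightweight safeguard, not part of the original code, is to assert the expected key relationships during these joins; pandas' merge accepts a validate argument for exactly this purpose:
# Optional sanity checks on the joins, assuming the same frames as above.
assert not mt['ticker'].duplicated().any(), 'master table should hold one row per ticker'
checked = working_set.merge(mt, how='inner', on='ticker', validate='many_to_one')
print('Rows after validated merge:', len(checked))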
Our dataset begins with a comprehensive collection exceeding 7,200 stock tickers across the full spectrum of available securities. To refine this pool, we systematically exclude symbols associated with the automotive, financial, and insurance industries. This involves removing entries in the Zacks database where the sector code is either 5 or 13. Furthermore, since our focus is exclusively on ordinary shares, we narrow the selection by retaining only those tickers classified under the asset type “COM,” which denotes Common Stock Equities, and discarding all others.
temp_df = combined.query("zacks_sector_code not in [5, 13] and asset_type == 'COM'")
combined = temp_df
print(f"Unique Tickers = {len(temp_df['ticker'].unique())}")

We begin with the combined DataFrame, which aggregates various financial and fundamental data across assets, ensuring a comprehensive input for modeling equity selection based on quantiles of key metrics like earnings surprises or valuation ratios. To focus our strategy on relevant market segments, we create a temporary DataFrame temp_df by applying a query that excludes rows where the zacks_sector_code is either 5 or 13, the codes tied to the automotive and finance/insurance groups flagged for removal above, while retaining only those with asset_type equal to ‘COM’ for common stocks. This filtering is deliberate because our quantile trading approach prioritizes liquid ordinary shares and avoids sectors whose accounting conventions and valuation standards differ structurally from the rest of the universe, allowing the model to emphasize diversified, tradeable opportunities in core market areas.
Once the filtered subset is isolated in temp_df, we overwrite the original combined DataFrame with this refined version, effectively updating our working dataset to proceed with subsequent steps such as feature engineering or quantile binning without carrying extraneous data that could skew statistical computations or introduce noise into the strategy’s signal generation. This reassignment maintains data flow efficiency, keeping the pipeline streamlined for iterative model development where precision in asset selection directly impacts the accuracy of quantile thresholds used for position sizing and entry/exit rules.
Finally, the code outputs the count of unique tickers from the filtered temp_df using a formatted print statement, providing a quick validation metric to confirm the scope of our universe — ensuring we have a sufficiently broad yet focused set of securities for robust quantile strategy performance across historical periods. This logging step supports transparency in the data preparation phase, allowing the team to verify that the filtration aligns with our goal of building a scalable, sector-agnostic quantamental framework.
After narrowing our selection, we now have 3,450 stock symbols remaining for evaluation. To target businesses exhibiting meaningful leverage, the next step involves eliminating symbols where the mean debt-to-equity ratio dips below 0.2.
debt_equity_avgs = combined.groupby('ticker')['tot_debt_tot_equity'].mean()
exclude_symbols = debt_equity_avgs[debt_equity_avgs < 0.2].index
filter_mask = ~combined['ticker'].isin(exclude_symbols)
combined = combined[filter_mask]
print('Unique Tickers =', combined['ticker'].nunique())

The process begins by grouping the combined DataFrame by ticker and calculating the mean value of the ‘tot_debt_tot_equity’ column for each group, which yields a Series of average debt-to-equity ratios per ticker; this step is essential because it aggregates historical or cross-sectional data to capture a ticker’s typical leverage profile, allowing us to assess financial health consistently across the portfolio. Next, it identifies tickers with averages below 0.2 — a threshold chosen to exclude entities with negligible debt relative to equity, which could skew quantile-based signals by representing outliers in low-risk or atypical capital structures — and extracts their indices to form the exclusion list. A boolean mask is then created by negating the membership check of these excluded tickers within the ‘ticker’ column of the combined DataFrame, effectively flagging rows to retain those from qualifying symbols. Applying this mask subsets the combined DataFrame to include only the filtered data, preserving the integrity of subsequent modeling steps like factor construction and quantile binning. Finally, it outputs the count of unique tickers post-filtering, providing a quick validation of the dataset’s scope and confirming that the strategy’s universe remains sufficiently diverse for robust quantile trading simulations.
Data Gaps in Trading Datasets
Incomplete records are a critical issue in our trading database: when historical information is missing or stale, key financial metrics no longer line up with the most recent market pricing. Starting from the 2,359 stock symbols still in play, we therefore remove any symbol whose data contains gaps. This involves checking for “Not Available” (NA) values across the dataset and confirming that each symbol’s end-of-period timestamps cover 23 quarterly spans alongside 8 yearly ones, a combined total of 31 distinct intervals.
symbols_to_exclude = []
for sym in tickers:
    filtered_data = combined[combined['ticker'] == sym]
    total_entries = len(filtered_data)
    non_null_entries = len(filtered_data.dropna())
    if total_entries != 31 or total_entries != non_null_entries:
        symbols_to_exclude.append(sym)
combined = combined[~combined['ticker'].isin(symbols_to_exclude)]
print('Unique Tickers =', combined['ticker'].nunique())
The ‘combined’ DataFrame aggregates historical data for various stock tickers, and we need reliable, complete observations to accurately compute quantiles and derive trading signals that blend quantitative metrics with fundamental insights. To achieve this, the code first identifies and excludes tickers with incomplete or noisy data, maintaining a standardized dataset that supports robust model training and backtesting.
The process begins by initializing an empty list, ‘symbols_to_exclude’, to track tickers that fail our quality criteria. It then iterates over each ticker symbol in the ‘tickers’ list, which represents the full set of symbols we’re evaluating. For each symbol, the code filters the ‘combined’ DataFrame to isolate rows specific to that ticker, creating a temporary subset called ‘filtered_data’. This allows us to inspect the data granularity per ticker without altering the original dataset prematurely.
Next, it calculates two key metrics for the filtered data: ‘total_entries’, which counts all rows for that ticker, and ‘non_null_entries’, which counts rows after dropping any with null values using the dropna() method. These metrics reveal both the volume and completeness of the data for the ticker. The condition checks whether the total entries equal exactly 31, matching the 23 quarterly plus 8 annual period-end dates established earlier, and whether the total matches the non-null count, ensuring no missing values. The logical OR in the if statement flags the ticker for exclusion if either criterion fails: insufficient entries or any nulls present. If so, the symbol is appended to ‘symbols_to_exclude’, effectively marking it as unreliable for our model.
Once the loop completes, the code refines the ‘combined’ DataFrame by excluding all rows associated with the problematic tickers. This is done using boolean indexing with the negation of the ‘isin’ method, which creates a mask to retain only rows where the ticker is not in the exclusion list. This step streamlines the dataset to include only high-quality tickers, preserving the integrity needed for quantile-based calculations that underpin our trading strategy’s risk-adjusted signals.
Finally, the code prints the number of unique tickers remaining in the filtered ‘combined’ DataFrame using nunique(), providing a quick validation of the cleaning process. This output confirms the effective size of our working dataset, ensuring we proceed with a focused set of symbols that align with the precision demands of quantamental modeling.
Upon reviewing our dataset, we find that among more than 2,300 relevant stock symbols, just around 400 possess the complete set of necessary information across every specified timeframe. To address these gaps, alternative approaches might include:
1. Substituting yearly figures in place of quarterly ones when the latter are missing.
2. Relying on overall long-term debt totals when net long-term debt figures are absent.
3. Opting for basic earnings per share over diluted earnings per share.
It’s essential to exercise caution with these substitutes, since they may substantially distort the underlying metrics and, in turn, skew our overall evaluations.
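To make the second and third substitutions concrete, here is a minimal sketch; basic_net_eps and tot_lterm_debt are assumed column names for the fallback fields, while diluted_net_eps and net_lterm_debt are the columns actually used later in this study.
# Hypothetical fallbacks for missing values; 'basic_net_eps' and 'tot_lterm_debt'
# are assumed fallback columns, not fields verified in this dataset.
combined['diluted_net_eps'] = combined['diluted_net_eps'].fillna(combined['basic_net_eps'])
combined['net_lterm_debt'] = combined['net_lterm_debt'].fillna(combined['tot_lterm_debt'])
# Substituting annual figures for missing quarterly ones would additionally require
# aligning rows on per_end_date and period type, which is omitted from this sketch.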
Integrating Quotemedia’s Daily Stock Pricing Data
In the subsequent step, we fetch daily pricing details for over 400 stock symbols, spanning from June 2013 up to January 2021. Keep in mind that while the investment approach launches in January 2014, pulling in earlier records allows us to propagate forward any critical financial metrics as necessary.
stock_data = None
try:
    stock_data = pd.read_pickle('dailyPrice.pkl')
except:
    print('Downloading from NasdaqDataLink')
    symbol_list = list(combined['ticker'].unique())
    fetched_data = nasdaqdatalink.get_table('QUOTEMEDIA/PRICES', ticker=symbol_list,
                                            qopts={"columns": ["ticker", "date", "adj_close"]},
                                            date={'gte': '2013-06-01', 'lte': '2021-01-31'}, paginate=True)
    stock_data = fetched_data.iloc[::-1].reset_index(drop=True)
    ticker_counts = stock_data.groupby('ticker').size()
    valid_symbols = ticker_counts[ticker_counts == 1930].index.tolist()
    stock_data = stock_data[stock_data['ticker'].isin(valid_symbols)]
    stock_data.to_pickle('dailyPrice.pkl')
remaining_symbols = sorted(set(stock_data['ticker'].unique()))
print('Unique Tickers =', combined['ticker'].nunique())
The process begins by attempting to load pre-saved stock data from a pickle file named ‘dailyPrice.pkl’ into the variable stock_data. This step prioritizes efficiency by reusing previously fetched data if available, avoiding redundant API calls that could slow down our model development pipeline.
If the pickle file is not found or cannot be loaded — perhaps due to it being the first run or an issue with the file — the code initiates a data download from Nasdaq Data Link. It first extracts a unique list of ticker symbols from the ‘combined’ DataFrame, which holds our target universe of stocks for the strategy. Using the nasdaqdatalink library, it queries the ‘QUOTEMEDIA/PRICES’ table, specifying only the essential columns: ticker, date, and adj_close (adjusted closing price, which accounts for splits and dividends to provide a consistent time series for quantitative modeling). The query is bounded by a specific date range from June 1, 2013, to January 31, 2021, chosen to align with the historical period relevant to training and validating our quantile trading strategy, where we analyze price distributions and momentum signals. Pagination is enabled to manage large result sets efficiently, preventing memory overload during the fetch.
Once the raw data is retrieved, it is reordered chronologically by reversing the rows with iloc[::-1] — as the API often returns data in descending order — and resetting the index to create a clean, sequential DataFrame. This ensures the time series flows forward in time, which is crucial for subsequent steps in our model, such as calculating returns, quantiles, or risk metrics. To maintain data quality and completeness, the code then groups the data by ticker and counts the number of records per symbol. It identifies “valid” symbols as those with exactly 1930 entries, corresponding to the expected number of trading days in our date range (accounting for weekends and holidays), filtering out any incomplete series that could introduce bias or errors in quantile-based signal generation. The stock_data is then subsetted to include only these valid tickers, guaranteeing a balanced panel dataset for robust strategy backtesting.
Finally, the cleaned data is persisted back to ‘dailyPrice.pkl’ for future use, promoting reproducibility across model iterations. The code extracts and sorts the unique tickers from the final stock_data into remaining_symbols, reflecting any refinements from the validation step, and prints the total unique tickers from the original ‘combined’ DataFrame for verification. This output helps confirm the scope of our trading universe, ensuring alignment with the quantamental approach where we derive trading signals from quantile thresholds applied to a comprehensive set of equities.
After acquiring the essential fundamental metrics and daily pricing records, the process advances to assembling a unified table named ‘fullDataSet’. This table brings every critical detail together through successive merges of the sourced datasets, keyed on ticker.
dailyPrice = stock_data  # alias: the daily price table loaded or fetched above
dailyPrice.iloc[:5]
This line of code, dailyPrice.iloc[:5], initiates a focused examination of the dataset by extracting the initial subset of rows using pandas’ integer-location-based indexing. Specifically, the iloc accessor enables zero-based positional selection, where the slice [:5] defines the range from the start (index 0) to just before index 5, thereby retrieving the first five rows of the DataFrame without altering the original structure. This operation flows the data sequentially from the full historical dataset into a compact preview, allowing us to verify the temporal order and column integrity of the price information right at the outset. By doing so, it establishes a reliable foundation for subsequent steps in the strategy, such as computing rolling statistics, deriving quantile-based signals, or aligning prices with fundamental indicators to inform trading decisions based on market regimes.
temp_result = dailyPrice.merge(combined, how='left', left_on=['ticker', 'date'], right_on=['ticker', 'filing_date'])
forward_filled = temp_result.fillna(method="ffill")
base_dataset = forward_filled.dropna()
lookup_table = base_dataset[['date', 'ticker', 'adj_close']]
extended_data = base_dataset.merge(lookup_table, how='left', left_on=['per_end_date', 'ticker'], right_on=['date', 'ticker'])
column_mapping = {'date_x': 'date', 'adj_close_x': 'adj_close', 'adj_close_y': 'per_end_date_adj_close'}
adjusted_columns = extended_data.rename(columns=column_mapping)
fullDataSet = adjusted_columns.drop(columns=['date_y'])
The process begins with merging the dailyPrice DataFrame, which contains ticker-specific adjusted closing prices and dates, with the combined DataFrame — likely holding filing details such as period-end dates — using a left join on ticker and date (from dailyPrice) to filing_date (from combined). This approach preserves all daily price records while attaching relevant filing information where matches exist, allowing us to associate market data with regulatory or financial event timelines essential for quantamental signals.
To address potential gaps in the merged data, such as missing filing details on non-event dates, the code applies forward filling to propagate the most recent values downward through the dataset. This method is particularly useful in financial time series, where the absence of new filings doesn’t invalidate prior information, ensuring continuity for modeling trading strategies that rely on persistent features like earnings or disclosure impacts. Following this, any rows with remaining null values — possibly from unmatched dates or incomplete records — are dropped to create a clean base_dataset, providing a reliable foundation free of artifacts that could skew quantile-based risk assessments or strategy simulations.
Next, a lightweight lookup_table is extracted from the base_dataset, retaining only the date, ticker, and adjusted close columns. This serves as a reference for efficiently retrieving price snapshots without redundant computations. The base_dataset is then extended by merging it with this lookup_table using a left join, aligning on the per_end_date (from the base) with date (from the lookup) and ticker. This step enriches the dataset with adjusted closing prices corresponding to specific period ends, enabling the incorporation of lagged or forward-looking price metrics crucial for quantile trading, where position sizing and entry/exit signals depend on historical price levels relative to filing periods.
Finally, to streamline the structure for downstream model development, the code renames the merge-generated columns — such as date_x to date, adj_close_x to adj_close, and adj_close_y to per_end_date_adj_close — for clarity and consistency, reflecting the primary date and price alongside the period-specific variant. The superfluous date_y column, an artifact of the merge, is dropped, yielding the fullDataSet ready for further transformations. This refined dataset thus supports the quantamental framework by aligning price dynamics with fundamental events, facilitating the derivation of quantiles for strategy optimization and risk management.
Dataset Overview
The dataset used in this analysis includes information on more than four hundred publicly traded companies. It spans every trading day from January two thousand fourteen to January two thousand twenty one. For each trading session the data records the adjusted closing price of the stock along with the industry classification provided by Zacks. It also contains earnings per share values on a net basis for both basic and fully diluted calculations. In addition the dataset includes measures of total debt and net long term debt. Each company record is linked to the end date of the relevant quarterly reporting period together with the stock’s closing price on that date. Other financial characteristics, such as the debt to equity ratio, return on investment, and overall market capitalization, are also included.
Derived Daily Financial Indicators
Using the raw information described above, three core indicators are constructed. These indicators vary on a daily basis and are designed to move in line with changes in stock prices while still reflecting underlying fundamentals.
Debt Relative to Market Capitalization
This indicator represents the relationship between a firm’s debt level and its market value. It is computed by taking total debt relative to total equity at the end of the reporting period and then adjusting this value by the ratio between the stock price at the period end and the current day’s closing price. This adjustment allows the measure to respond dynamically to daily price movements.
Earnings Multiple
The earnings multiple is calculated as the current day’s closing price divided by the most recent available diluted net earnings per share figure. When this ratio results in a negative value it is replaced with a very small positive constant to ensure numerical stability and comparability across firms.
Investment Yield Adjustment
This indicator combines profitability leverage and market valuation into a single measure. It starts with the return on investment reported at the period end and scales it using the combined value of net long term debt and market capitalization. This combined figure is then adjusted by the ratio between the current adjusted closing price and the closing price at the period end. The result is a yield based metric that evolves with daily price changes while remaining anchored to fundamental balance sheet information.
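Written out, with P_t denoting the current adjusted close and P_pe the adjusted close at the period end, the three indicators described above correspond to the formulas below; this is simply a restatement of the code that follows, included for reference.
$$\text{debt\_to\_mcap}_t = \left(\frac{\text{total debt}}{\text{total equity}}\right)_{pe} \times \frac{P_{pe}}{P_t}$$
$$\text{price\_to\_earnings}_t = \frac{P_t}{\text{diluted EPS}_{pe}}, \quad \text{with negative values replaced by } 0.001$$
$$\text{ret\_on\_invst}_t = \text{ROI}_{pe} \times \frac{\text{net LT debt} + \text{mkt val}}{\text{net LT debt} + \text{mkt val} \cdot P_t / P_{pe}}$$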
change_factor = fullDataSet['per_end_date_adj_close'].div(fullDataSet['adj_close'])
pe_values = fullDataSet['adj_close'].div(fullDataSet['diluted_net_eps'])
fullDataSet['price_to_earnings'] = np.where(pe_values < 0, 0.001, pe_values)
fullDataSet['mcap'] = fullDataSet.eval('shares_out * adj_close')
fullDataSet['debt_to_mcap'] = fullDataSet['tot_debt_tot_equity'].mul(change_factor)
denom_component = fullDataSet['net_lterm_debt'] + fullDataSet['mkt_val'].div(change_factor)
num_component = fullDataSet['net_lterm_debt'] + fullDataSet['mkt_val']
adjustment_ratio = num_component.div(denom_component)
fullDataSet['ret_on_invst'] = fullDataSet['ret_invst'].mul(adjustment_ratio)
These features enable us to segment securities into quantiles for strategy backtesting and signal generation, ensuring the model captures fundamental health alongside quantitative signals.
The process begins by computing a change_factor, the ratio of the period-end adjusted close price to the current day’s adjusted close price. Scaling period-end fundamentals by this factor lets the derived indicators respond to daily price movements while staying anchored to the most recent reporting period, so that subsequent metrics remain comparable over time and support accurate quantile bucketing in our trading model.
Next, the code calculates the raw price-to-earnings (P/E) values by dividing the adjusted close price by the diluted net earnings per share. To handle cases where earnings are negative — which could distort valuation signals in our quantamental framework — the P/E is then adjusted using a conditional assignment: if the value is negative, it is set to a small positive threshold of 0.001; otherwise, it retains the original value. This prevents extreme or undefined behaviors in downstream quantile analyses while preserving the economic intuition of earnings-based valuation.
Market capitalization (mcap) is then derived by multiplying the shares outstanding by the adjusted close price, providing a direct measure of company size that’s essential for scaling other metrics and forming size-based quantiles in our strategy.
The debt-to-market-cap ratio is computed by multiplying the total debt-to-total equity ratio by the change_factor. This adjustment aligns the leverage metric with the period-end price dynamics, ensuring it reflects current market conditions rather than historical book values, which is crucial for risk-adjusted quantile trading signals.
To refine the return on investment metric, the code first constructs two components for an adjustment ratio. The numerator combines net long-term debt with the market value, capturing the unadjusted investment base. The denominator mirrors this but divides the market value by the change_factor, effectively normalizing it to a consistent period-end basis. The resulting adjustment_ratio — numerator divided by denominator — scales the raw return on investment by this factor and assigns it to ret_on_invst. This step harmonizes the return metric across time periods, enhancing its reliability for quantile-based strategy decisions that prioritize consistent fundamental performance.
After calculating the financial ratios, the process moves to grouping and summarizing the data, allowing us to analyze the spread across these key metrics. Crucially, this aggregation must be performed separately for each stock ticker.
source_data = fullDataSet  # the dataset carrying the derived ratios computed above
metrics_list = ['debt_to_mcap', 'ret_on_invst', 'price_to_earnings']
stat_ops = ['mean', 'std', 'min', 'max']
agg_mapping = {m: stat_ops for m in metrics_list}
by_ticker = source_data.groupby('ticker')
result_stats = by_ticker.agg(agg_mapping)
result_stats.describe()
We begin by defining a list of core metrics — ‘debt_to_mcap’ for debt-to-market-cap ratio, ‘ret_on_invst’ for return on investment, and ‘price_to_earnings’ for the P/E ratio — these are selected because they capture fundamental aspects of company health and market perception, which are essential for sorting assets into performance quantiles in our trading approach. Next, we specify a set of statistical operations: ‘mean’, ‘std’, ‘min’, and ‘max’, which provide a comprehensive view of each metric’s central tendency, variability, and extremes over time or observations per security, allowing us to quantify stability and range for risk-adjusted quantile assignments.
To efficiently apply these operations, we create an aggregation mapping dictionary that assigns the full set of statistical operations to each metric in the list; this structure ensures that for every metric, we compute all four stats in a single pass, promoting computational efficiency when dealing with large financial datasets. The data flow then proceeds by grouping the source_data DataFrame by ‘ticker’, which organizes the rows by individual securities, as our strategy requires per-security insights to build factor models without mixing cross-sectional noise. We apply the aggregation using this mapping on the grouped data, producing a result_stats DataFrame where each row represents a ticker and columns hold the computed statistics for each metric — such as the mean debt-to-mcap or the standard deviation of returns — enabling us to derive robust inputs for quantile-based portfolio construction.
Finally, invoking describe() on result_stats generates descriptive statistics across all tickers for these aggregated values, revealing the overall distribution, quartiles, and skewness of the metrics’ summaries; this step is crucial for model validation, as it helps us understand how these fundamental-derived factors vary population-wide, informing thresholds for quantile binning and ensuring our trading signals are grounded in empirical patterns rather than outliers.
# Bridging assignments (assumed): the per-ticker summary table and the working
# dataset from the previous steps, under the names used in this block.
financial_stats = result_stats
entire_data = fullDataSet

min_debt_equity_avg = 0.1
max_debt_equity = 50
min_return_invest = -50
max_return_invest = 200
min_pe = 0
max_pe = 1000
invalid_symbols = [
    idx for idx in financial_stats.index
    if (
        (financial_stats.loc[idx, 'price_to_earnings']['min'] < min_pe or financial_stats.loc[idx, 'price_to_earnings']['max'] > max_pe) or
        (financial_stats.loc[idx, 'ret_on_invst']['min'] < min_return_invest or financial_stats.loc[idx, 'ret_on_invst']['max'] > max_return_invest) or
        (financial_stats.loc[idx, 'debt_to_mcap']['mean'] < min_debt_equity_avg or financial_stats.loc[idx, 'debt_to_mcap']['max'] > max_debt_equity)
    )
]
entire_data = entire_data[~entire_data['ticker'].isin(invalid_symbols)]
fullDataSet = entire_data  # carry the filtered universe forward (bridging line, assumed)
print('Unique Tickers =', entire_data['ticker'].nunique())
The process begins by establishing predefined thresholds for key financial indicators: a minimum average debt-to-market-cap ratio of 0.1 and a maximum of 50 to capture reasonable leverage levels; a return on investment range from -50% to 200% to exclude outliers that might skew profitability assessments; and a price-to-earnings (P/E) ratio bounded between 0 and 1000 to focus on viable valuation metrics without extremes from distressed or speculative assets. These bounds are chosen to align with typical market behaviors observed in our trading universe, preventing the inclusion of data that could distort quantile-based portfolio constructions or risk-adjusted returns.
Next, the code generates a list of invalid symbols by iterating over the index of the financial_stats DataFrame, which holds the per-ticker aggregated statistics computed above. For each symbol, it evaluates three conditional checks to identify anomalies: first, whether the minimum or maximum P/E values fall outside the defined range, which would indicate inconsistent earnings stability; second, if the return on investment’s minimum or maximum exceeds the thresholds, signaling potential data errors or non-representative performance; and third, if the mean debt-to-market-cap is below the minimum or the maximum surpasses the upper limit, highlighting leverage profiles that deviate from sustainable norms. A symbol is flagged as invalid if any of these conditions hold true, as such outliers could introduce noise into our quantamental features and compromise the strategy’s ability to identify tradable quantiles effectively.
The data flow then proceeds to cleanse the primary dataset by excluding these invalid symbols: it creates a filtered version of entire_data, retaining only rows where the ticker is not in the invalid_symbols list, thereby preserving the integrity of our historical price and volume data for modeling. This step ensures that our quantile trading strategy, which relies on segmented financial characteristics to form portfolios, operates on a high-quality subset of symbols. Finally, the code outputs the number of unique tickers remaining in the filtered dataset, providing a quick validation metric to confirm the scale of our working universe post-filtration and supporting iterative refinements in the model development pipeline.
debt_mcap_series = fullDataSet['debt_to_mcap']
pe_ratio_series = fullDataSet['price_to_earnings']
fullDataSet['debt_to_earnings'] = debt_mcap_series * pe_ratio_series
This code snippet processes key financial ratios from the dataset to derive a new feature that enhances our ability to evaluate company leverage in relation to profitability, ultimately supporting the ranking and selection of assets into trading quantiles based on risk-adjusted fundamentals.
The process begins by extracting two specific series from the fullDataSet, which contains comprehensive financial data across assets or time periods. First, we retrieve the debt_to_mcap series, representing the ratio of a company’s total debt to its market capitalization, a measure of financial leverage scaled by market value. This is assigned to debt_mcap_series for direct manipulation. Next, we pull the price_to_earnings series, or PE ratio, which captures the market price per share relative to earnings per share, indicating how the market values a company’s earnings power. This is stored in pe_ratio_series, ensuring both are readily available as aligned pandas Series objects, typically indexed by assets or dates to maintain data integrity.
The core computation then follows: by multiplying debt_mcap_series with pe_ratio_series element-wise, we generate the debt_to_earnings ratio. This multiplication works because the PE ratio effectively converts market cap back to earnings (since market cap = PE * earnings), so debt_to_mcap * PE yields debt divided by earnings — a direct leverage metric against operational profitability. The result is assigned back to fullDataSet as a new column, ‘debt_to_earnings’, enriching the dataset with this derived fundamental indicator. This step enables downstream quantile bucketing in our strategy, where higher debt-to-earnings values might signal riskier profiles for contrarian or value-based trades, aligning with the model’s goal of systematic, data-driven portfolio construction.
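The algebra behind that multiplication, using the firm-level identity that the P/E ratio equals market capitalization divided by earnings:
$$\frac{\text{Debt}}{\text{MktCap}} \times \underbrace{\frac{\text{MktCap}}{\text{Earnings}}}_{\text{P/E}} = \frac{\text{Debt}}{\text{Earnings}}$$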
df = fullDataSet
metrics = [
    ('change_debt_to_mcap', 'debt_to_mcap'),
    ('change_ret_on_invst', 'ret_on_invst'),
    ('change_price_to_earnings', 'price_to_earnings'),
    ('change_debt_to_earnings', 'debt_to_earnings')
]
for new_name, base_col in metrics:
    df[new_name] = df[base_col].pct_change()
We begin by assigning the full dataset to a DataFrame variable df, ensuring we’re working with the complete historical data that includes various financial ratios for stocks or other securities. This step sets the stage for feature engineering, where we transform static ratios into relative changes to better reflect evolving market conditions, a crucial aspect for identifying quantile-based trading signals.
Next, we define a list called metrics that pairs new column names with their corresponding base columns from the dataset. Each tuple in this list specifies a descriptive name for the derived metric — such as ‘change_debt_to_mcap’ — alongside the original column like ‘debt_to_mcap’, which represents the debt-to-market-cap ratio. This structure organizes the transformations efficiently, focusing on four core ratios: debt-to-market-cap, return on investment, price-to-earnings, and debt-to-earnings. By selecting these specific metrics, we’re emphasizing changes in leverage, profitability, valuation, and coverage ratios, which are foundational for quantamental analysis as they help quantify how fundamentals evolve over time, enabling the model to detect patterns for sorting assets into performance quantiles.
The loop then iterates over each pair in the metrics list, unpacking the new name and base column for processing. For every iteration, it calculates the percentage change of the base column using the pct_change() method on the DataFrame series, which computes the relative difference between consecutive periods (e.g., (current value — previous value) / previous value), and assigns this result to a new column in df under the specified name. This sequential application ensures that the dataset is augmented in place with these change-based features without altering the originals, allowing the model to incorporate both absolute levels and their variations. Ultimately, this process enriches the dataset for downstream quantile strategy development, where these percentage changes can signal entry or exit points by highlighting accelerating or decelerating trends in fundamental health.
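One design point worth flagging: as written, pct_change() runs over the stacked DataFrame, so wherever rows from different tickers sit next to each other, the first observation of one ticker is differenced against the last observation of the previous one. If that matters for your data layout, a grouped variant (a sketch, not the code used above) keeps the calculation inside each ticker's own history, consistent with the per-ticker treatment used elsewhere in the pipeline.
# Sketch: compute the change columns within each ticker's own history so that
# ticker boundaries never leak into the percentage changes (assumes rows are
# sorted by ticker and date).
for new_name, base_col in metrics:
    df[new_name] = df.groupby('ticker')[base_col].pct_change()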
def standardize_column(data, col, ticker_groupby):
    grouped_col = ticker_groupby[col]
    centered = data[col] - grouped_col.transform('mean')
    scaled = centered / grouped_col.transform('std')
    return scaled

groupby_ticker = fullDataSet.groupby('ticker')
fullDataSet['z_debt_to_mcap'] = standardize_column(fullDataSet, 'debt_to_mcap', groupby_ticker)
fullDataSet['z_ret_on_invst'] = -standardize_column(fullDataSet, 'ret_on_invst', groupby_ticker)
fullDataSet['z_price_to_earnings'] = standardize_column(fullDataSet, 'price_to_earnings', groupby_ticker)
fullDataSet['z_debt_to_earnings'] = standardize_column(fullDataSet, 'debt_to_earnings', groupby_ticker)
fullDataSet['z_change_debt_to_mcap'] = standardize_column(fullDataSet, 'change_debt_to_mcap', groupby_ticker)
fullDataSet['z_change_ret_on_invst'] = standardize_column(fullDataSet, 'change_ret_on_invst', groupby_ticker)
fullDataSet['z_change_price_to_earnings'] = standardize_column(fullDataSet, 'change_price_to_earnings', groupby_ticker)
fullDataSet['z_change_debt_to_earnings'] = standardize_column(fullDataSet, 'change_debt_to_earnings', groupby_ticker)
Standardization here transforms raw financial ratios into z-scores, which center the data around zero with unit variance within each stock’s (ticker’s) historical observations. This approach is essential because financial metrics like debt-to-market-cap or return on investment can vary widely due to company-specific scales, economic cycles, or reporting differences; by normalizing per ticker, we isolate relative deviations from a stock’s own historical norms, facilitating cross-sectional comparisons and quantile assignments that drive trading decisions, such as identifying undervalued or high-momentum opportunities.
The process begins with defining a reusable function, standardize_column, which takes the dataset, a target column name, and a groupby object as inputs. Within the function, it first extracts the grouped series for the specified column using the provided groupby object. Then, it computes the centered values by subtracting the group-wise mean — calculated via transform(‘mean’), which broadcasts the mean back to each row within its group — from the original column values in the dataset. This centering removes the baseline average for each ticker, highlighting deviations. Next, it scales these centered values by dividing by the group-wise standard deviation, again using transform(‘std’) to ensure the scaling factor aligns with each group’s variability. The result is a z-score series where values above zero indicate above-average performance relative to the ticker’s history, and below zero indicate underperformance, all while preserving the original data structure.
Following the function definition, the code sets up a groupby operation on the full dataset by the ‘ticker’ column, creating groupby_ticker to define the grouping scope for all subsequent standardizations. It then applies standardize_column sequentially to a series of financial metrics, assigning the results to new columns prefixed with ‘z_’ in the dataset. For instance, ‘debt_to_mcap’ is standardized directly to ‘z_debt_to_mcap’, capturing how a company’s leverage deviates from its historical norm. Similarly, metrics like ‘price_to_earnings’, ‘debt_to_earnings’, and their change variants (e.g., ‘change_debt_to_mcap’) are processed to normalize both levels and momentum signals, which are crucial for quantile strategies that sort stocks into buckets based on these traits to predict outperformance. A notable adjustment occurs for ‘ret_on_invst’: the standardization is negated when creating ‘z_ret_on_invst’, inverting the z-scores so that higher historical returns yield more negative (or less positive) values if the intent is to penalize or reinterpret high returns in the model — perhaps to emphasize stability or contrarian signals in the trading logic. Through this pipeline, the dataset evolves with these z-score features, ready for aggregation into composite scores or direct use in quantile binning to inform buy/sell decisions in the strategy.
date_mask = fullDataSet['date'] > '2014'
fullDataSet = fullDataSet.loc[date_mask]
fullDataSet = fullDataSet.set_index('date')
We begin by creating a boolean mask, date_mask, which identifies rows in the fullDataSet where the ‘date’ column exceeds the year 2014. This step is crucial because it isolates post-2014 observations, allowing us to concentrate on a timeframe that captures contemporary economic and market behaviors essential for building robust quantile-based trading signals, avoiding dilution from outdated data that might skew model performance.
Next, we apply this mask using loc[] to subset the fullDataSet, retaining only the rows where the condition is true. By doing so, we streamline the dataset to the pertinent temporal scope, which improves computational efficiency during subsequent model training and backtesting phases of our strategy. This filtering ensures that our quantamental features, derived from the leverage, profitability, and valuation factors above, are grounded in a consistent, recent historical window, directly supporting the quantile-based ranking we use to stratify assets for trading decisions.
Finally, we set the ‘date’ column as the index of the filtered fullDataSet via set_index(‘date’). This transformation pivots the dataset into a time-indexed structure, which is foundational for our workflow as it enables seamless chronological slicing, resampling, and alignment of features in pandas DataFrames. In quantile trading, where timing is paramount for signal generation and risk management, having dates as the index naturally facilitates operations like rolling window calculations or merging with external market data, ultimately fortifying the integrity of our model’s predictive outputs.
combo_factors = {
    'z_debt_to_mcap': -1,
    'z_ret_on_invst': 1,
    'z_price_to_earnings': -1
}
fullDataSet['z_score_combo'] = sum(fullDataSet[key] * combo_factors[key] for key in combo_factors)
The process begins by defining a dictionary called combo_factors, which serves as a weighting scheme for three key z-scored factors derived from fundamental data: ‘z_debt_to_mcap’ (z-score of debt-to-market-cap ratio), ‘z_ret_on_invst’ (z-score of return on investment), and ‘z_price_to_earnings’ (z-score of price-to-earnings ratio). Recall that ‘z_ret_on_invst’ was already sign-inverted during standardization, so the +1 weight here is applied to that inverted series. These weights — -1 for ‘z_debt_to_mcap’ and ‘z_price_to_earnings’, and +1 for ‘z_ret_on_invst’ — are chosen to reflect the directional impact of each factor on overall stock attractiveness: negative weights invert the undesirable high-debt or high-valuation signals, while the positive weight amplifies the beneficial high-return signal, ensuring the composite score favors fundamentally stronger companies within our trading framework.
The data flow then proceeds by computing the combined z-score directly within the fullDataSet DataFrame, which holds our pre-processed universe of stock data with columns already containing these individual z-scores. For each stock row in the dataset, the code iterates over the keys in combo_factors, multiplies the corresponding z-score value from fullDataSet by its assigned weight, and sums these products across all three factors. This weighted summation creates a single, normalized metric — z_score_combo — that captures the interplay of debt burden, profitability efficiency, and valuation reasonableness, allowing us to rank stocks into quantiles for strategy implementation, such as overweighting top-decile performers in long positions. By embedding this calculation inline, the model efficiently transforms disparate signals into a unified score that drives quantile-based trading decisions without intermediate variables.
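To illustrate how such a composite score can feed the quantile logic, here is a minimal sketch; the five-bucket split and the combo_quantile column name are illustrative choices, not the strategy's actual parameters.
# Illustrative cross-sectional bucketing: on each date, rank stocks by the
# combined z-score and assign them to five quantile buckets (0 = lowest, 4 = highest).
fullDataSet['combo_quantile'] = (
    fullDataSet.groupby(level='date')['z_score_combo']
    .transform(lambda s: pd.qcut(s, 5, labels=False, duplicates='drop'))
)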
treasury_bill_data = nasdaqdatalink.get("FRED/DTB3")
ted_spread_series = nasdaqdatalink.get("FRED/TEDRATE")
interim_sum = ted_spread_series.add(treasury_bill_data)
forward_filled_values = interim_sum.fillna(method="ffill")
clean_series = forward_filled_values.dropna()
final_adjusted = clean_series.sub(1.0)
The process begins by fetching two essential datasets from the FRED economic database via the Nasdaq Data Link API: the 3-month Treasury Bill secondary market rate (DTB3), which serves as a benchmark for risk-free rates, and the TED Spread (TEDRATE), a widely used measure of interbank credit risk derived from the difference between LIBOR and the Treasury Bill rate. These series are obtained as pandas DataFrames or Series, aligned by date, to capture historical fluctuations in short-term funding costs and credit spreads, which are critical inputs for modeling tail risks in quantile strategies.
Next, the code computes an interim sum by adding the TED Spread series to the Treasury Bill data. This operation effectively reconstructs an approximation of the LIBOR rate, since the TED Spread is inherently the LIBOR minus the Treasury Bill rate; adding them back yields a composite series approximating the unsecured interbank lending rate. This step is performed element-wise on the aligned time indices to ensure temporal consistency, allowing the model to track how credit risk premiums evolve alongside baseline yields over the backtest horizon.
To handle any missing values that may arise from data gaps or misalignment in the source series — common in economic datasets — the interim sum is forward-filled, propagating the last valid observation forward to maintain continuity without introducing artificial trends. Any remaining NaN values, such as those at the series outset where no prior data exists, are then dropped to yield a clean, contiguous time series. This preprocessing ensures the data is suitable for statistical modeling, preserving the integrity of quantile estimates by avoiding interpolation biases that could distort extreme value predictions.
Finally, the clean series is adjusted by subtracting 1.0 from each value, which normalizes the reconstructed rates to a scale centered around zero or aligned with specific model assumptions, such as percentage point adjustments for comparability across instruments. This final_adjusted series thus provides a polished feature ready for integration into quantamental frameworks, where it can help derive quantile thresholds for trading decisions, such as entering positions when credit spreads signal heightened tail risks.



