Mastering Portfolio Optimization: Balancing Risk and Return

Techniques and Strategies for Maximizing Investment Returns

Onepagecode
Jun 28, 2025


Portfolio optimization is a critical component of modern investment management, focusing on the strategic allocation of assets to achieve the best possible balance between risk and return. This process involves selecting a mix of investment assets that maximizes an investor’s expected returns for a given level of risk or, conversely, minimizes risk for a given level of expected return. The foundation of portfolio optimization lies in Markowitz’s Modern Portfolio Theory (MPT), which introduces the concept of diversification to reduce risk. By carefully analyzing the correlations between asset returns, investors can construct portfolios that mitigate unsystematic risk and improve overall performance.

Recent advancements in computational finance and data analytics have significantly enhanced the techniques and tools available for portfolio optimization. Sophisticated models now incorporate various constraints and objectives, such as minimizing drawdowns, achieving specific liquidity targets, or adhering to regulatory requirements. Techniques like mean-variance optimization, factor models, and machine learning algorithms provide robust frameworks for optimizing portfolios under complex market conditions. This article delves into these methods, illustrating their practical applications through simulations and real-world examples, and highlighting how investors can leverage these strategies to achieve superior investment outcomes.
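The diversification effect at the heart of MPT can be sketched with a tiny two-asset example (all numbers here are hypothetical, chosen only to illustrate the formulas): the portfolio's expected return is the weighted average of the asset returns, while its risk depends on the covariance matrix, and therefore shrinks as the correlation between the assets falls.

```python
import numpy as np

# Hypothetical two-asset example: equal expected returns and volatilities,
# so any difference in portfolio risk comes purely from correlation.
mu = np.array([0.08, 0.08])   # expected annual returns
vol = np.array([0.20, 0.20])  # annual volatilities (std-dev)
w = np.array([0.5, 0.5])      # equal-weight portfolio

def portfolio_risk(corr):
    """Portfolio standard deviation for a given asset correlation."""
    corr_matrix = np.array([[1.0, corr], [corr, 1.0]])
    cov = np.outer(vol, vol) * corr_matrix  # covariance matrix
    return np.sqrt(w @ cov @ w)

exp_return = w @ mu                   # same regardless of correlation
risk_high = portfolio_risk(corr=0.9)  # nearly redundant assets
risk_low = portfolio_risk(corr=0.1)   # weakly related assets
# risk_low < risk_high: diversification reduces risk
# without sacrificing expected return.
```

The expected return is identical in both cases, but the portfolio's standard deviation is markedly lower when the assets are weakly correlated, which is precisely the "free lunch" MPT exploits.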

Link to download source code at the end of this article.



Imports necessary Python packages for analysis

# Imports from Python packages.
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
from matplotlib.lines import Line2D
import multiprocessing as mp
import seaborn as sns
import pandas as pd
import numpy as np
import pygmo as pg
import os
import time
from numba import jit
from scipy.stats import norm
from datetime import timedelta
from functools import partial
from itertools import chain, combinations

This snippet imports the Python libraries used throughout the analysis. It includes matplotlib, a plotting library for creating visualizations such as charts, graphs, and histograms. The multiprocessing module supports concurrent execution using processes instead of threads, while seaborn, built on top of matplotlib, is used for creating more attractive and informative statistical graphics.

Pandas is a library for data manipulation and analysis, especially with labeled data structures like DataFrames. Numpy, the fundamental package for scientific computing, provides support for large, multi-dimensional arrays and matrices. Pygmo, a Python library, aids in general-purpose global optimization using evolutionary algorithms. The os module offers a portable way to use operating system-dependent functionalities, and the time module includes functions related to time.

Additionally, numba acts as a Just-In-Time (JIT) compiler for Python functions to enhance performance. The scipy.stats submodule from SciPy is used for statistical functions and probability distributions. For working with date and time, the datetime module is employed. Functools provides higher-order functions and operations on callable objects, whereas itertools offers efficient looping constructs.


By importing these libraries, the code gains access to a wide range of functionalities necessary for data processing, visualization, optimization, parallel processing, statistical analysis, and computational efficiency. Each library serves a specific purpose, collectively offering a versatile toolkit for various data science and scientific computing tasks.

Imports various functions for financial operations

# Imports from FinanceOps.
import diversify
from portfolio_utils import (normalize_weights, weighted_returns,
    fix_correlation_matrix, check_correlation_matrix)
from returns import max_drawdown, max_pullup
from stats import normal_prob_loss, normal_prob_less_than
from utils import linear_map

This code snippet imports various functions and modules from different files, including diversify, portfolio_utils, returns, stats, and utils, each providing specific functionalities related to financial operations and calculations. The diversify module primarily contains functions related to diversification. Portfolio_utils offers functions for managing portfolios, such as normalizing weights, calculating weighted returns, fixing correlation matrices, and checking correlation matrices. The returns module focuses on calculating financial indicators like maximum drawdown and maximum pullup. The stats module is used to compute statistics such as the normal probability of loss and normal probability less than specific values. Lastly, the utils module consists of general utility functions like linear mapping. These imports are crucial for modularizing finance-related operations, effectively organizing the codebase, reducing redundancy, and facilitating a structured approach to developing financial operations.

Imports SimFin data and functions

# Imports from SimFin.
import simfin as sf
from simfin.names import (TOTAL_RETURN, CLOSE, VOLUME, TICKER,
                          PSALES, DATE)
from simfin.utils import BDAYS_PER_YEAR

This code snippet imports essential functions and variables from the SimFin library, facilitating access to financial data, such as stock prices, fundamental data, and other financial metrics. The imported variables — including TOTAL_RETURN, CLOSE, VOLUME, TICKER, PSALES, and DATE — represent the names of columns in a financial dataset, enabling easy reference to specific columns. Additionally, the variable BDAYS_PER_YEAR likely stores the number of business days in a year, a common parameter in financial calculations.

Utilizing these imported functions, variables, and dataset names simplifies financial analyses, calculations, and data manipulations, contributing to more efficient and cleaner code. Furthermore, employing meaningful variable names enhances code readability and mitigates potential errors that could arise from manual entry of column names or other data-related details.

Generates random numbers with a seed

# Random number generator.
# The seed makes the experiments repeatable. In the year 1965
# the scientist Richard Feynman was awarded the Nobel prize in
# physics for discovering that this particular number was the cause
# of "The Big Bang" and all that matters in the entire universe.
rng = np.random.default_rng(seed=81680085)

The provided code snippet initializes a random number generator using the NumPy library with a specific seed value. Using a seed value makes the generation of random numbers reproducible, ensuring that if you run the same code with the same seed multiple times, you will get the same sequence of random numbers. This reproducibility is particularly useful for debugging or testing code that involves randomness, as it allows the recreation of the same random conditions.

In this instance, the seed value chosen is 81680085. A comment within the code humorously attributes the cause of The Big Bang and all universal matter to this specific number. While not scientifically accurate, this playful remark underscores the significance of the seed in random number generation.
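The repeatability can be demonstrated directly: two generators constructed with the same seed produce identical streams of numbers.

```python
import numpy as np

# Two generators created with the same seed produce identical streams,
# which is what makes seeded experiments repeatable.
rng_a = np.random.default_rng(seed=81680085)
rng_b = np.random.default_rng(seed=81680085)

draws_a = rng_a.random(5)
draws_b = rng_b.random(5)
# draws_a and draws_b are element-for-element identical.
```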

Creates directory for storing plots

# Create directory for plots if it does not exist already.
path_plots = 'plots/portfolio_optimization/'
if not os.path.exists(path_plots):
    os.makedirs(path_plots)

This code snippet ensures that a directory for storing the portfolio-optimization plots exists. It checks whether the path ('plots/portfolio_optimization/') is already present using os.path.exists(), and creates it with os.makedirs() if not. Preparing the directory up front avoids errors later when the generated plots are saved. The same effect can be achieved in a single call with os.makedirs(path_plots, exist_ok=True).

Sets SimFin data directory and loads API key

# SimFin data-directory.
sf.set_data_dir('~/simfin_data/')

# SimFin load API key or use free data.
sf.load_api_key(path='~/simfin_api_key.txt', default_key='free')

This code snippet is from a financial data library called SimFin, which provides access to company financial data. The first line, sf.set_data_dir(‘~/simfin_data/’), sets the directory where SimFin data will be stored or accessed, in this instance directing it to ‘~/simfin_data/’. The second line, sf.load_api_key(path=’~/simfin_api_key.txt’, default_key=’free’), loads the API key required to access SimFin data. The specified file ‘~/simfin_api_key.txt’ contains this key, defaulting to a free access key if no key is available. Using this code is crucial for accessing SimFin data; setting the data directory ensures that the downloaded data is organized, while loading the API key is essential for authentication and access to the financial data provided by SimFin.

Disable offset on y-axis, format percentages, scientific notations

# Matplotlib settings.
# Don't write e.g. +1 on top of the y-axis in some plots.
mpl.rcParams['axes.formatter.useoffset'] = False

# Used to format numbers on plot-axis as percentages.
pct_formatter0 = FuncFormatter(lambda x, _: '{:.0%}'.format(x))
pct_formatter1 = FuncFormatter(lambda x, _: '{:.1%}'.format(x))

# Used to format numbers on plot-axis in scientific notation.
sci_formatter0 = FuncFormatter(lambda x, _: '{:.0e}'.format(x))
sci_formatter1 = FuncFormatter(lambda x, _: '{:.1e}'.format(x))

This code snippet configures Matplotlib settings for plotting. Setting axes.formatter.useoffset to False disables offset notation, so Matplotlib will not subtract a common offset from the tick values and print it (e.g. +1) at the top of the axis. It also creates formatter functions for plot axes: pct_formatter0 formats numbers as percentages with zero decimal places, whereas pct_formatter1 uses one decimal place. Similarly, sci_formatter0 and sci_formatter1 format numbers in scientific notation with zero and one decimal places respectively. These formatters can be attached to Matplotlib axes to control how tick values are displayed, improving the clarity and readability of the plots.
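Calling a formatter directly shows exactly what string a given tick value would be rendered as; in a plot the formatter would instead be attached with ax.yaxis.set_major_formatter(...). A minimal sketch:

```python
from matplotlib.ticker import FuncFormatter

# Formatters translate a raw tick value into the string drawn on the axis.
pct_formatter0 = FuncFormatter(lambda x, _: '{:.0%}'.format(x))
sci_formatter1 = FuncFormatter(lambda x, _: '{:.1e}'.format(x))

# A Formatter instance is callable with (value, tick-position).
label_pct = pct_formatter0(0.25, None)    # '25%'
label_sci = sci_formatter1(1234.5, None)  # '1.2e+03'
```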

Sets the plotting style to whitegrid

# Seaborn set plotting style.
sns.set_style("whitegrid")

This code configures the plotting style for Seaborn, a Python data visualization library, by setting it to whitegrid. By doing so, it ensures that plots have a white background accompanied by grid lines. Selecting a specific style for plots can greatly enhance their aesthetics and readability, making the visualizations more appealing and easier to interpret. Different styles can be chosen based on personal preference or the specific requirements of a project. The whitegrid style, in particular, offers a clean and structured background, aiding in the clear distinction of data points and other visual elements.

Helper Functions

Calculates one-period returns from stock prices

def one_period_returns(prices, future):
    """
    Calculate the one-period return for the given share-prices.
    
    Note that these have 1.0 added so e.g. 1.05 means a one-period
    gain of 5% and 0.98 means a -2% loss.

    :param prices:
        Pandas DataFrame with e.g. daily stock-prices.
        
    :param future:
        Boolean whether to calculate the future returns (True)
        or the past returns (False).
        
    :return:
        Pandas DataFrame with one-period returns.
    """
    # One-period returns plus 1.
    rets = prices.pct_change(periods=1) + 1.0
    
    # Shift 1 time-step if we want future instead of past returns.
    if future:
        rets = rets.shift(-1)

    return rets

The code defines a function designed to calculate one-period returns from the provided share prices. The input is a pandas DataFrame named prices, which houses the historical prices, and a boolean future that decides if the calculation targets future returns (True) or past returns (False).

The function begins by calculating the percentage change in prices using the DataFrame's pct_change method, then adds 1 to these values. This yields one-period returns where, for instance, a value of 1.05 indicates a 5% gain and 0.98 indicates a 2% loss. If the future parameter is set to True, the returns are shifted one row up with shift(-1), so the return earned from time-step t to t+1 is aligned with time-step t, the point at which an investor would actually have to buy.

The final output of this function is a pandas DataFrame containing the computed one-period returns. This functionality is particularly valuable in finance and investment analysis, facilitating quick assessments of investment returns from historical data. Additionally, the capability to calculate future returns aids in making informed investment decisions and evaluating the potential performance of various assets.
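The difference between past and future returns is easiest to see on a toy price series (the ticker and prices below are made up purely for illustration):

```python
import pandas as pd

def one_period_returns(prices, future):
    """One-period returns plus 1, past or future (as in the article)."""
    rets = prices.pct_change(periods=1) + 1.0
    if future:
        rets = rets.shift(-1)
    return rets

# Toy price series: 100 -> 105 -> 102.9 (a +5% day, then a -2% day).
prices = pd.DataFrame({'XYZ': [100.0, 105.0, 102.9]})

past = one_period_returns(prices, future=False)
# past.iloc[1] is 1.05: the return *ending* at time-step 1.
future = one_period_returns(prices, future=True)
# future.iloc[0] is 1.05: the same return, but aligned with
# time-step 0, where the position would have to be opened.
```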

Code to load daily US stock share prices using SimFin

%%time
# Use our custom version of the SimFin stock-hub to load data.
hub = sf.StockHub(market='us', refresh_days_shareprices=100)

# Download and load the daily share-prices for US stocks.
df_daily_prices = hub.load_shareprices(variant='daily')

This code snippet demonstrates how to use the SimFin library to load daily share price data for US stocks. The %%time cell magic, a Jupyter Notebook feature, displays the time taken to execute the cell. A StockHub object is created from the SimFin library, configured for the US market and set to refresh cached share prices every 100 days. The load_shareprices method with variant='daily' then downloads and loads the daily share price data for US stocks. SimFin gives users access to a comprehensive range of financial data, including balance sheets, income statements, and share prices, which can be used for financial analysis, modeling, and decision-making, while the %%time magic makes it easy to monitor how long the data loading takes.


Calculates valuation signals using financial data

%%time
# Calculate valuation signals such as P/Sales.
# Automatically downloads the required financial statements.
df_val_signals = hub.val_signals()

This code segment employs the val_signals() function from the hub module to calculate valuation signals like P/Sales using financial data automatically downloaded by the function. The %%time magic command is included to measure and display the execution time of this operation. The val_signals() function likely retrieves financial statement data, such as revenue and market capitalization, to calculate valuation metrics like the price-to-sales ratio. This process is automated and efficient, eliminating the need for manually gathering or processing the data. The result is a DataFrame (df_val_signals) that contains the calculated valuation signals for further analysis and decision-making. By using this code, time and effort are saved through the automation of fetching financial data and deriving valuation signals, thereby enabling quicker and more informed analysis and decision-making based on these metrics.

Prepare and clean daily stock data

# Use the daily "Total Return" series which is the stock-price
# adjusted for stock-splits and reinvestment of dividends.
# This is a Pandas DataFrame in matrix-form where the rows are
# time-steps and the columns are for the individual stock-tickers.
daily_prices = df_daily_prices[TOTAL_RETURN].unstack().T

# Remove rows that have very little data. Sometimes this dataset
# has "phantom" data-points for a few stocks e.g. on weekends.
num_stocks = len(daily_prices.columns)
daily_prices = daily_prices.dropna(thresh=int(0.1 * num_stocks))

# Remove the last row because sometimes it is incomplete.
daily_prices = daily_prices.iloc[0:-1]

# Show it.
daily_prices.head()

This code processes a pandas DataFrame of daily stock prices in total-return form. It first extracts the Total Return series and reshapes it into a matrix where each row is a time-step and each column is an individual stock ticker. It then removes "phantom" rows: dropna(thresh=...) keeps only rows that have valid data for at least 10% of the tickers, so a row survives only if it represents a genuine trading day rather than, say, a weekend with a handful of spurious data-points. Finally it drops the last row, which may be incomplete depending on when the data was retrieved or recorded. The result is a cleaned DataFrame of daily total-return prices, ready for further analysis or visualization, with incomplete or unreliable rows excluded.
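The thresh semantics are worth pinning down with a small example: thresh counts the minimum number of valid (non-NaN) entries a row must have to survive. With 20 hypothetical tickers, the cutoff int(0.1 * 20) = 2 drops a "phantom" weekend row where only one ticker has data:

```python
import numpy as np
import pandas as pd

# Toy matrix: 3 days (rows) by 20 tickers (columns), all valid data.
prices = pd.DataFrame(np.ones((3, 20)))
# A "phantom" weekend row where only 1 of the 20 tickers has data.
prices.iloc[1, 1:] = np.nan

# Keep only rows with at least 10% valid values. thresh is the minimum
# number of non-NaN entries a row needs in order to be kept.
num_stocks = len(prices.columns)
cleaned = prices.dropna(thresh=int(0.1 * num_stocks))
# The phantom row (index 1) is gone; the two real days remain.
```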


Calculate daily stock returns from total returns

# Daily stock-returns calculated from the "Total Return".
# We could have used SimFin's function hub.returns() but
# this code makes it easier for you to use another data-source.
# This is a Pandas DataFrame in matrix-form where the rows are
# time-steps and the columns are for the individual tickers.
daily_returns_all = one_period_returns(prices=daily_prices, future=True)

# Remove empty rows (this should only be the first row).
daily_returns_all = daily_returns_all.dropna(how='all')

# Show it.
daily_returns_all.head()

This code snippet calculates the daily stock returns from the Total Return data provided using a function named one_period_returns. This function computes the daily returns based on the given daily prices, with the parameter future=True indicating the calculation of returns from today to tomorrow. After the returns are calculated, the code removes any empty rows, primarily focusing on the first row, to clean up any invalid or incomplete data resulting from the calculations. The code then displays the first few rows of the DataFrame that contain the calculated daily returns for each ticker, with columns representing individual stocks and rows representing time steps. This process is useful because it converts the total return data into daily returns, facilitating easier analysis and comparison of different stocks’ performance on a daily basis. By providing daily return data, investors can gain a better understanding of the volatility and performance of individual stocks or their portfolio.

Retrieve list of all stock-tickers

# All available stock-tickers.
all_tickers = daily_prices.columns.to_list()

This code snippet extracts all the stock tickers from a DataFrame named daily_prices and stores them in a list named all_tickers. Stock tickers are unique symbols assigned to publicly traded companies for identification. The .columns attribute in pandas refers to the column labels in a DataFrame, and the .to_list() method converts these column labels into a Python list. This extraction is useful for obtaining a comprehensive list of all stock tickers present in a dataset or DataFrame. With all the tickers in a list, it becomes easier to iterate over them for further analysis, data manipulation, or visualization purposes.

Identifies tickers with low median traded value

# Find tickers whose median daily traded value < 1e7
daily_trade_mcap = df_daily_prices[CLOSE] * df_daily_prices[VOLUME]
mask = (daily_trade_mcap.groupby(level=0).median() < 1e7)
bad_tickers1 = mask[mask].reset_index()[TICKER].unique()

This code snippet filters out ticker symbols whose median daily traded value is below $10 million (1e7). First, it computes a proxy for the daily traded dollar-volume of each ticker by multiplying the closing price by the trading volume in the DataFrame df_daily_prices. It then takes the median of these daily values per ticker by grouping on the ticker level of the index.

A boolean mask marks the tickers whose median falls below the $10 million threshold. The matching ticker symbols are then extracted and stored in the bad_tickers1 variable, with duplicates removed via unique().

This filter is useful in financial analysis or stock market studies for identifying thinly traded stocks, whose prices can be noisy and hard to trade at scale. Excluding these tickers refines the investment universe and improves the reliability of downstream analysis.
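The same per-ticker median filter can be sketched on toy data with a (ticker, date) MultiIndex mimicking the SimFin layout (the tickers 'BIG' and 'TINY' and all the prices and volumes below are invented):

```python
import pandas as pd

# Toy data with a (ticker, date) MultiIndex, mimicking the SimFin layout.
idx = pd.MultiIndex.from_product(
    [['BIG', 'TINY'], pd.bdate_range('2025-01-06', periods=3)],
    names=['Ticker', 'Date'])
df = pd.DataFrame({
    'Close':  [100.0, 101.0, 102.0, 2.0, 2.1, 1.9],
    'Volume': [1e6,   1.2e6, 0.9e6, 1e4, 2e4, 1.5e4],
}, index=idx)

# Median daily dollar-volume per ticker; flag those below $10M.
daily_traded = df['Close'] * df['Volume']
median_traded = daily_traded.groupby(level=0).median()
mask = (median_traded < 1e7)
bad_tickers = mask[mask].index.to_list()  # only 'TINY' is flagged
```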

Identifies volatile stocks with high returns

# Find tickers whose max daily returns > 100%
mask2 = (daily_returns_all > 2.0)
mask2 = (np.sum(mask2) >= 1)
bad_tickers2 = mask2[mask2].index.to_list()

This code snippet filters out tickers that have ever had a daily return above +100%. It first creates a boolean mask (mask2) marking every entry of daily_returns_all greater than 2.0, which in this return convention corresponds to a one-day gain of more than 100%. Summing the mask column-wise then counts, for each ticker, how many such extreme days occurred, and the comparison >= 1 flags tickers with at least one. Finally, the flagged tickers are collected in the list bad_tickers2.

This process proves useful in financial analysis or when dealing with stock market data, as it helps identify extreme price movements. By pinpointing tickers with maximum daily returns exceeding 100%, investors or analysts can investigate the reasons behind such movements and take appropriate actions, such as conducting detailed analyses, implementing risk management strategies, or making informed investment decisions.
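On toy data the same screen can be written with the equivalent, slightly more idiomatic .any(axis=0) (the tickers and returns below are hypothetical):

```python
import pandas as pd

# Toy daily returns (1.0 = flat). 'PUMP' has one day with a +150% move.
rets = pd.DataFrame({
    'SAFE': [1.01, 0.99, 1.02],
    'PUMP': [1.00, 2.50, 0.95],
})

# Flag any ticker with at least one daily return above +100% (> 2.0).
# .any(axis=0) is equivalent to the article's (np.sum(mask) >= 1).
mask = (rets > 2.0).any(axis=0)
bad_tickers = mask[mask].index.to_list()  # ['PUMP']
```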

Identify tickers with few data points

# Find tickers which have too little data, so that more than 20%
# of the rows are NaN (Not-a-Number).
mask3 = (daily_returns_all.isna().sum(axis=0) > 0.2 * len(daily_returns_all))
bad_tickers3 = mask3[mask3].index.to_list()

The provided code aims to identify and extract stock symbols (tickers) with a significant amount of missing data in a dataset. Specifically, it focuses on tickers with more than 20% missing values (NaN). The process begins by calculating the number of NaN values in each column of the daily_returns_all DataFrame using the expression daily_returns_all.isna().sum(axis=0). This computation results in a Series where the index corresponds to column names (tickers) and the values represent the count of NaN values in each column.

Next, the code checks if the count of NaN values exceeds 20% of the total number of rows in the DataFrame. This is achieved with the condition (daily_returns_all.isna().sum(axis=0) > 0.2 * len(daily_returns_all)). The outcome of this condition is a boolean Series (mask) indicating whether each ticker has more than 20% missing values.

Finally, the tickers satisfying this condition are extracted. The index values (tickers) where the condition is True are converted to a list, which is then assigned to the variable bad_tickers3. This code is crucial for data cleaning and analysis as it identifies tickers with insufficient data, thereby ensuring the quality and reliability of subsequent analyses.
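A small sketch makes the 20% cutoff concrete (the tickers 'OLDCO' and 'NEWCO' and their returns are invented; 'NEWCO' stands in for a recently listed stock with mostly missing history):

```python
import numpy as np
import pandas as pd

# 'NEWCO' listed recently, so most of its history is NaN.
rets = pd.DataFrame({
    'OLDCO': [1.00, 1.01, 0.99, 1.02, 1.00],
    'NEWCO': [np.nan, np.nan, np.nan, np.nan, 1.01],
})

# Flag tickers where more than 20% of the rows are NaN.
nan_frac = rets.isna().sum(axis=0) / len(rets)
bad_tickers = nan_frac[nan_frac > 0.2].index.to_list()  # ['NEWCO']
```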

Identifies delisted stocks ending with ‘_old’

This post is for paid subscribers
