Optimizing Financial Strategies: Harnessing Machine Learning for Enhanced Trading Performance
Leveraging Alpaca API and Advanced Analytics to Navigate Market Volatility and Maximize Returns
In this article we build a trading bot that draws on a decade of financial data from the Alpaca API. The bot retrieves its data with the alpaca.get_bars() function and trades a moving average crossover strategy, which is then paired with machine learning classifiers. The strategy hinges on the interaction between the 2-day and 200-day Simple Moving Averages (SMAs), a technique aimed at capturing market trends.
The setup relies on Pandas for data processing, Matplotlib for visualization, and scikit-learn for the machine learning models. The article covers the configuration of Alpaca API keys, data retrieval, preprocessing, and the application of machine learning models, including Support Vector Machines (SVM) and Logistic Regression. It then covers model training, testing on historical data, and evaluation using classification reports and return analyses, with attention to feature scaling and selection. It closes with an analysis of the trade signals and the financial performance of the strategy, measured by profit/loss and ROI.
The bot requests 10 years of financial data from the Alpaca API. It uses the alpaca.get_bars() function, and the notebook keeps up to 1000 trading days of the returned data.
For training, the bot uses a one-year period. This period is divided such that 75% covers the time leading up to the pandemic-induced market crash, and the remaining 25% includes the crash period and the initial phase of the market recovery.
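The 75/25 division described above is chronological, not random: the earlier part of the window is used for training and the later part for testing. The following is a minimal sketch of such a split. The `chronological_split` helper, the synthetic prices, and the illustrative one-year date range are assumptions for demonstration and are not taken from the original notebook.

```python
import numpy as np
import pandas as pd

def chronological_split(df, train_fraction=0.75):
    """Split a time-indexed DataFrame into earlier (train) and later (test) parts."""
    df = df.sort_index()
    cutoff = int(len(df) * train_fraction)
    return df.iloc[:cutoff], df.iloc[cutoff:]

# Synthetic daily closes standing in for real price data (illustrative only)
dates = pd.bdate_range("2019-05-01", "2020-05-01")
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    {"close": 100 + rng.normal(0, 1, len(dates)).cumsum()},
    index=dates,
)

train_df, test_df = chronological_split(prices)
print("train ends:", train_df.index.max().date())
print("test starts:", test_df.index.min().date())
```

Keeping the rows in time order means the model is never trained on data that comes after the period it is tested on.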
The trading strategy of the bot is based on moving average crossovers. It executes trades when the 2-day Simple Moving Average (SMA) intersects with the 200-day SMA.
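To make the crossover rule concrete, here is a minimal sketch using pandas rolling means. It assumes a Series of daily closing prices; the `sma_crossover_signals` helper, its column names, and the synthetic random-walk data are hypothetical and only illustrate the idea, not the notebook's exact implementation.

```python
import numpy as np
import pandas as pd

def sma_crossover_signals(close, short_window=2, long_window=200):
    """Compute both SMAs, a long/flat position flag, and crossover events."""
    out = pd.DataFrame({"close": close})
    out["sma_short"] = close.rolling(window=short_window).mean()
    out["sma_long"] = close.rolling(window=long_window).mean()
    # Drop the warm-up rows where the long SMA is not yet defined
    out = out.dropna(subset=["sma_long"]).copy()
    # Position is 1 while the short SMA is above the long SMA, otherwise 0
    out["position"] = (out["sma_short"] > out["sma_long"]).astype(int)
    # +1 marks an upward cross, -1 a downward cross, 0 no change
    out["crossover"] = out["position"].diff().fillna(0).astype(int)
    return out

# Synthetic random-walk prices (illustrative only)
rng = np.random.default_rng(42)
dates = pd.bdate_range("2012-04-12", periods=1000)
close = pd.Series(100 + rng.normal(0, 1, 1000).cumsum(), index=dates)

signals = sma_crossover_signals(close)
print(signals["crossover"].value_counts())
```

Because a 2-day average tracks the price very closely, this rule behaves almost like "price crosses its 200-day average" and can change position often when prices hover near the long average.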
# Import the required libraries and dependencies
import os
import requests
import pandas as pd
from dotenv import load_dotenv
import alpaca_trade_api as tradeapi
%matplotlib inline
from alpaca_trade_api.rest import TimeFrame
import numpy as np
from pathlib import Path
import hvplot.pandas
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.preprocessing import StandardScaler
from pandas.tseries.offsets import DateOffset
from sklearn.metrics import classification_report
This cell imports the libraries used in the rest of the notebook: os, requests, pandas, numpy, dotenv, alpaca_trade_api (along with TimeFrame from alpaca_trade_api.rest), Path from pathlib for file and directory paths, hvplot.pandas, and matplotlib. The %matplotlib inline magic makes plots render inside the notebook. From scikit-learn it imports the svm module for support vector machines, StandardScaler for standardizing features, and classification_report for evaluating a classifier. DateOffset from pandas.tseries.offsets is imported for date arithmetic on time-series data.
Step 1: In the root directory of the `Unsolved` folder, generate a `.env` file. This file is designated for storing your Alpaca API keys and secret keys.
In Step 2, you will need to integrate the Alpaca API and secret keys into the decisive_probability_distributions.ipynb file. Start by assigning the values of these keys to variables with corresponding names. To achieve this, begin by invoking the load_dotenv() function to load the environment variables. Then, assign the values of the environment variables to alpaca_api_key and alpaca_secret_key. Finally, ensure that these variables are correctly set up and accessible by verifying the type of each variable.
# Load the environment variables from the .env file
load_dotenv()
# Set Alpaca API key and secret by calling the os.getenv function and referencing the environment variable names
# Set each environment variable to a notebook variable of the same name
alpaca_api_key = os.getenv("ALPACA_API_KEY")
alpaca_secret_key = os.getenv("ALPACA_SECRET_KEY")
# Check the values were imported correctly by printing the type of each
print(type(alpaca_api_key))
print(type(alpaca_secret_key))
This Python code reads the Alpaca API key and secret with os.getenv, referencing the environment variable names, and assigns each one to a notebook variable of the same name. It then checks that the values were imported by evaluating the type of each variable: str indicates the key was found, while NoneType means the environment variable is missing. Reading keys from environment variables keeps sensitive values such as API keys and secrets out of the notebook itself.
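Since os.getenv returns None when a variable is not set, a missing key can go unnoticed until the first API call fails. One option is to fail early with a clear message. The sketch below uses only the standard library; the `require_env` helper and the placeholder value are hypothetical and are not part of the original notebook.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; check your .env file"
        )
    return value

# Placeholder so this sketch runs on its own; real keys come from the .env file
os.environ.setdefault("ALPACA_API_KEY", "demo-key")

api_key = require_env("ALPACA_API_KEY")
print(type(api_key))
```

In the notebook, a helper like this would be called after load_dotenv() in place of the bare os.getenv calls.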
For Step 3, you will establish the Alpaca API REST object. This is done by utilizing the Alpaca tradeapi.REST function. During this process, you will need to configure the function by setting the parameters alpaca_api_key, alpaca_secret_key, and api_version. This step is essential for initializing the REST object with the correct credentials and settings.
# Create your Alpaca API REST object by calling Alpaca's tradeapi.REST function
# Set the parameters to your alpaca_api_key, alpaca_secret_key and api_version="v2"
alpaca = tradeapi.REST(
    alpaca_api_key,
    alpaca_secret_key,
    api_version="v2")
This code creates an API object by calling Alpaca's tradeapi.REST function with alpaca_api_key, alpaca_secret_key, and api_version="v2". The object is used to make requests to Alpaca's API, which lets the user retrieve market data, manage their account, and place trades on the Alpaca platform.
In Step 4, you will use the Alpaca SDK to make an API call that retrieves daily stock data for selected tickers, requested over the ten years from April 12, 2012, to April 12, 2022. Begin by defining the required tickers. Next, set start_date and end_date with the pd.Timestamp function. Then specify the timeframe as one day (TimeFrame.Day). Finally, create the prices_df DataFrame by assigning it the result of the alpaca.get_bars function, called with these parameters.
# Create the list for the required tickers
tickers = ["SPY"]
The code creates a list named tickers containing a single element, SPY, the ticker of the SPDR S&P 500 ETF. The list is passed to the API call later on to select which symbols to retrieve.
# Set the values for start_date and end_date using the pd.Timestamp function
# The start and end dates are 2012-04-12 and 2022-04-12
# Set the parameter tz to "America/New_York"
# Convert each timestamp to ISO format by calling the isoformat function
start_date = pd.Timestamp("2012-04-12", tz="America/New_York").isoformat()
end_date = pd.Timestamp("2022-04-12", tz="America/New_York").isoformat()
This code sets the start and end dates for the data request. It calls the pd.Timestamp function with the dates 2012-04-12 and 2022-04-12 and sets the tz parameter to America/New_York, which is the time zone of the U.S. equity markets. Finally, the isoformat function converts each timestamp into an ISO 8601 string, the format passed to the API call as the start and end parameters.
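As a quick illustration of what these strings look like, the sketch below prints the ISO form of the start date, plus a winter date for comparison. The `winter` variable is only there to show how the UTC offset changes with daylight saving time and is not part of the original notebook.

```python
import pandas as pd

# April falls in daylight saving time, so the offset is UTC-4
start = pd.Timestamp("2012-04-12", tz="America/New_York")
print(start.isoformat())   # 2012-04-12T00:00:00-04:00

# January falls in standard time, so the offset is UTC-5
winter = pd.Timestamp("2012-01-12", tz="America/New_York")
print(winter.isoformat())  # 2012-01-12T00:00:00-05:00
```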
# Use the Alpaca get_bars function to gather the price information for each ticker
# Include the function parameters: tickers, timeframe, start, and end
# Be sure to call the df property to ensure that the returned information is set as a DataFrame
# Keep only the first 1000 rows of the result
prices_df = alpaca.get_bars(
    tickers,
    TimeFrame.Day,
    start=start_date,
    end=end_date
).df.iloc[:1000]
#api.get_bars("AAPL", TimeFrame.Hour, "2021-06-01", "2021-06-01").df.iloc[:10]
# Review the first five rows of the resulting DataFrame
prices_df.head()
This code uses the Alpaca get_bars function to retrieve price information for a list of tickers within a specified time frame. The function is called with the list of tickers, the timeframe, and the start and end dates; no limit argument is passed here. The result is converted to a DataFrame through the df property, and .iloc[:1000] then selects its first 1000 rows. The commented-out line shows an example of calling get_bars for a single ticker with an hourly timeframe. Finally, the .head method is used to review the first five rows of the resulting DataFrame.
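It is worth checking what a slice of the first 1000 rows actually covers. The sketch below builds a synthetic daily frame over the same date range, assuming the API returns one row per trading day in ascending date order. Business days are used as a stand-in for trading days (exchange holidays are ignored), and the `bars` frame is hypothetical, not real Alpaca output.

```python
import pandas as pd

# Business days approximate trading days here; exchange holidays are ignored
dates = pd.bdate_range("2012-04-12", "2022-04-12")
bars = pd.DataFrame({"close": range(len(dates))}, index=dates)

# Same slice as in the notebook: the first 1000 rows
first_1000 = bars.iloc[:1000]

print("rows requested:", len(bars), "rows kept:", len(first_1000))
print("kept range:", first_1000.index.min().date(), "to", first_1000.index.max().date())
```

With roughly 260 business days per year, 1000 rows span a bit under four years, so under these assumptions the slice keeps the earliest part of the requested decade. If the most recent rows are wanted instead, `.iloc[-1000:]` would select them.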