SPY threshold price calculator based on U.S. Money Stock
TL;DR
Built a regression model to determine how the equity market (SPY as proxy) is trading in relation to the U.S. Money Stock (M2 as proxy). This can be useful for determining a threshold price for the market, above which it may be overvalued and below which it may be undervalued. Input data came from the St. Louis Fed FRED database (monthly M2 data) and Yahoo Finance (historical SPY data), the model was built and analyzed using the sklearn library in Python while the resulting regression line was displayed using a Matplotlib chart.
Personal Objective:
I undertook this project to learn about various ways to preprocess data sets, reduce the increased risk of model overfitting from smaller datasets (in this case just 767 data points for the dependent (SPY) and independent (M2) variables involved) and to effectively work with cross-validation to evaluate model performance.
Problem:
The U.S. The Federal Reserve announced the first round of QE in November 2008 and officially began purchasing a mix of mortgage-backed securities, agency debt and long-term Treasury securities in March 2009. This led to a ramp up in the M2 money supply, which can be observed from monthly (formerly weekly) data provided by the St. Louis Fed FRED database here. A more recent spike in M2 in 2020 came in part from substantial Fed stimulus to help counter the adverse economic effects of the COVID-19 pandemic (in addition to Regulation D implementation that year).
Over a decade of easy monetary policy helped juice U.S. equity markets to all time highs, well out of historical valuation bands. As US markets grew increasingly dependent on quantitative easing since the Great Recession, Fed monetary policy became more scrutinized than ever by market participants as subsequent low rates and Fed balance sheet expansion primed the economy through an expansion of U.S. money supply.
Equity markets are notoriously difficult to predict, with market participants focusing on a wide variety of concrete and abstract variables in an attempt to predict market direction. Algorithmic trading systems based on resulting models have become ever more complex in the pursuit of alpha. There is something to be said for a simple market model that focuses on money supply to help determine if the market is currently trading above or below a valuation threshold in order to inform long term investment decisions.
Solution:
To build and analyze the model I decided to utilize Python and the popular open source sklearn machine learning library due to its robust feature set, relative ease of use and because I wouldn't be working with an exceptionally large dataset. Input data came from the St. Louis Fed FRED database for M2 and Yahoo Finance for SPY. The resulting regression line would be displayed using a Matplotlib chart and a Python script would be used to pull updated M2 and SPY data to determine if current market pricing was running above or below the threshold price determined by the latest available M2 value.
Process:
I started off by putting together a scoping doc in which I outlined project objectives (determine a SPY market threshold for given M2 money supply figure), risks (limited data), milestones (1. Determine variables, pull and preprocess data, 2. Run regression analysis 3. Test model + assess results), assumptions (access to reliable data) and constraints (delayed and increasingly limited Fed data releases).
1 - Prepare Data
I chose SPY to represent the market (dependent variable) as it is the most widely used proxy for the U.S. stock market. I chose M2 to represent money supply (independent variable) as it includes assets that are highly liquid but not intended to be routinely used as cash (as opposed to M1). I pulled monthly M2 data from the St. Louis Fed FRED database going back to November 1980. I pulled SPY equity data going back to January 1993 from Yahoo Finance.
For data preprocessing I aligned the two datasets by taking monthly stock price averages for SPY to account for single trading session outliers and cleaned up data points where necessary. The resulting dataset encompassed 29 years from the start of 1993 to the end of 2022.
2 - Regression Analysis
I used the sklearn library in Python to train a linear regression model using the preprocessed data. I used 75% of the dataset to train and 25% to test the model. I used standard k-fold cross-validation to evaluate model performance (using common 10 folds).
3 - Model Test
I tested different variations of my model and observed the resulting mean squared error and r2 score values. I also benchmarked against various data subsets within the 29 year main dataset and tested using QQQ (Nasdaq) data as an alternate market proxy independent variable with similar results. The r2 score across the various model iterations was between .85 and .93 with the final model at .92.
4 - Model Deployment
I incorporated the completed model into a Python script that pulls the latest SPY price through the TD Ameritrade API and compares it to the predicted SPY threshold price based on the latest monthly M2 value scraped from the FRED database using Selenium.
Outcome:
Although my hypothesis going into this project was that there would be a significant relationship between money supply and equity markets (expected an r2 score of around .7), I was surprised at the resulting strength based on the data I used (actual r2 score of .92). It is important to note that the M2 dataset was relatively small, likely included some errors related to data collection and calculation and that the Fed releases this data on a delayed schedule. However, this outcome does show that money supply could potentially be one useful component of a more comprehensive market model.
Post-Mortem:
If I did this project again, I would:
A) Run the analysis again using weekly M2 data to increase the size of the dataset from 767 to over 3300 data points. This would help decrease the risk of overfitting.
B) Add a time lag component to the analysis to observe if there is a delayed relationship between M2 and SPY at various intervals (i.e. 3 months, 6 months, etc.).
C) Test money supply proxy variables other than M2. For example, the Center for Financial Stability (CFS) reports a more broad-based M4 measure which adds five components to M2 (including institutional money-market funds, long-term deposits, repurchase agreements, commercial paper, and T-bills).
D) Build a multiple linear regression model with additional independent variables other than money supply. Examples could include business outlook surveys (i.e. the Manufacturing Business Outlook Survey released by the Philadelphia Fed) and the Unemployment Insurance Weekly Claims Report released by the Department of Labor. I would then utilize Lasso and Ridge regression models to determine how useful these additional independent variables are and reduce model complexity if needed.