Gamma exposure calculator
TL;DR
Built a calculator that estimates the gamma exposure of market makers using ingested option chain data. This is useful in determining if option market makers are being forced to buy or sell a certain asset in order to maintain neutral footing. Input data came from the TD Ameritrade API (instrument pricing) and the Chicago Board Options Exchange (CBOE) website (option chains) while processed data was outputted in the form of a Matplotlib chart and a CSV file containing underlying data.
Personal Objective:
I undertook this project to learn about what it takes to build a useful app from scratch as it solved an actual problem I had in determining how option market makers are positioned for a given security (i.e. S&P 500). Although my initial objective was to simply output a gamma exposure (GEX) figure, over time I added additional capabilities and features.
Problem:
Gamma exposure calculations exist in somewhat of an esoteric realm given how few traders and investors would know about or, much less, utilize them. As a result they are not (to my knowledge) included as a part of any major brokerage platforms (i.e. Interactive Brokers, Thinkorswim, etc.) Part of this may also be due to how data intensive these calculations can be; an option chain for popular high volume securities can involve thousands of data points.
Solution:
I decided to utilize Python due to its great out-of-the-box data science capabilities and numerous useful built-in libraries. Input data came from the TD API (instrument pricing in JSON format) and the CBOE website (option chains in CSV format) while processed data was outputted in the form of a Matplotlib chart and a CSV file containing the underlying data.
Process:
I started off by putting together a scoping doc in which I outlined project objectives (output GEX figure for given instrument), risks (poor/erroneous data, scope creep), milestones (1 - Pull data, 2 - Calculation 3 - Output to chart/CSV file), assumptions (access to reliable data) and constraints (free data).
1 - Pull Data
I needed to get a hold of live market data along with corresponding option chains. As I didn't want to use paid data (which can be absurdly expensive) I tested Yahoo Finance, Interactive Brokers and TD data APIs. After numerous API reliability issues with the former two, I found the TD API to be vastly superior and linked to it. However, since it had throttling issues when pulling entire option chains I decided to utilize CBOE option chain CSV files instead and retain the TD API for equity pricing data. The CBOE website is notoriously strict about bulk data pulls so I ran Selenium with a headless Chrome WebDriver in order to automate the manual downloading of CSV files containing option chain data for each instrument at a suitable speed that didn't trigger a download ban.
2 - Calculation
As the CBOE option chain data was formatted in a particular way it needed a considerable amount of preprocessing, which I achieved with Pandas DataFrames to locate and strip certain dates, times and option pricing figures. I extracted bid, ask, last sale, implied volatility, delta, gamma, open interest and strike data and placed it into a new DataFrame. From there I was able to calculate updated option Greeks and ultimately GEX itself with the help of the py_vollib library.
3 - Output
I initially outputted corresponding GEX values using a simple print statement to the IDE terminal but later added a Matplotlib graphic along with output CSV files in a designated output directory. Given how time-consuming the calculation process could be, it was important to make it easy to store and reference outputs that didn't need to be constantly recalculated in order to save time.
Outcome:
The final result exceeded my expectations and delivered on my original objective of outputting a gamma exposure figure for any given publicly traded equity instrument with corresponding option instruments available for trading. I added loops for pulling the latest price and option chain for securities contained in a customizable feeder file, organized outputted CSV files in their own directory and even a Streamlit dashboard to display daily outputs from processed data in a browser.
Speed remained a factor, which I improved upon by breaking the program into separate components, some code refactoring and careful data pull throttling on my side to maximize pull speed from the TD API and Selenium for option chain CSV pulls from the CBOE website. More substantial improvements would likely require the employment of professional data feeds due to the limitations of publicly available data sources.
Post-Mortem:
If I did this project again, I would:
A) Focus on data providers with the most reliable data APIs at the start
B) Scope the program to consist of multiple components from the start to cut down on complexity and calculation times and make future updates easier to implement