


{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment 4: A Machine Learning Trading Algo\n", "\n", "### Data preparation tasks (15p)\n", "\n", "1. (**Data prep**) Load *iShares TSX 60* (XIU.TO) order book data from `https://github.com/mariuszoican/mgt443/blob/main/Data/XIU_TAQ.csv?raw=true`. \n", "The file contains 378,376 quote updates for XIU.TO from March 11, 2020.\n", "\n", "2. (**Data prep**) Convert the `Date-Time` column into native `pandas` dates, and add the GMT Offset to obtain local Toronto time. \n", "*Hint*: to add the `GMT Offset` you may apply the inline function `lambda x: dt.timedelta(hours=x)` to the appropriate column.\n", "\n", "3. (**Data prep**) Select only quote updates between 9:30 AM and 3:30 PM (to avoid overnight and opening/closing auction effects).\n", "\n", "4. (**Resample**) Resample the data such that you only retain the last observation for each second. \n", "Forward-fill any missing observations (i.e., seconds with no updates). \n", "*Hint*: you may use `.resample('1S').last()`.\n", "\n", "### Variable building (20p)\n", "\n", "5. (**Midpoint**) Compute the midpoint for each row as \n", "\n", "$$\\text{Midpoint}=\\frac{\\text{Bid Price}+\\text{Ask Price}}{2}.$$ \n", "\n", "6. (**Tick change**) Generate a dummy `Tick` that takes the value:\n", " 1. 1 if the next-second midpoint is higher than the current midpoint;\n", " 2. 0 if the next-second midpoint is lower than or equal to the current midpoint.\n", " \n", "7. (**Depth**) Compute the market depth for each row as\n", "\n", "$$\\text{Depth}=\\text{Ask size}+\\text{Bid size}.$$\n", "\n", "8. (**Order imbalance**) Compute the order imbalance for each row as \n", "\n", "$$\\text{Order Imbalance}=\\frac{\\text{Ask size}-\\text{Bid size}}{\\text{Depth}}.$$\n", "\n", "9. (**Bid-ask spread**) Compute the quoted bid-ask spread as:\n", "\n", "$$\\text{Bid-ask spread}=\\frac{\\text{Ask price}-\\text{Bid price}}{\\text{Midpoint}}.$$\n", "\n", 
"10. (**First differences**) Generate columns for the change in depth and order imbalance from the previous second to the current one.\n", "\n", "11. (**Rolling mean**) Generate two columns for the rolling mean of depth and order imbalance, using a rolling window of 10 rows. You can use the `Series.rolling` method. \n", "Further, generate two dummies taking value 1 if the current depth/**absolute** order imbalance is above their rolling mean, and zero otherwise.\n", "\n", "### Logistic regression prediction (40p)\n", "\n", "12. Split the data into equally-sized `train` and `test` samples.\n", "13. Use a logistic regression to predict future midpoint movements using:\n", "\n", " 1. Market depth and its first difference;\n", " 2. Order imbalance and its first difference;\n", " 3. Bid-ask spread;\n", " 4. Whether the depth is above rolling mean or not;\n", " 5. Whether the absolute order imbalance is above rolling mean or not.\n", " \n", " In estimating the logistic regression, use `solver='lbfgs'` rather than `solver='newton-cg'` (better convergence). \n", " You will not be able to estimate the logistic regression with `NaN` values, so you need to delete them. \n", " \n", "14. Plot the ROC curve and assess the predictive power of your model.\n", "\n", "### Testing the algorithm (25p)\n", "\n", "Set a threshold probability $q$ (e.g., the median predicted probability of an uptick) such that you always buy when the predicted probability is larger than $q$, and sell in the next second. That way, you don't accumulate a position.\n", "\n", "1. What is your profit if you buy and sell always at the prevailing mid-point?\n", "2. What is your profit if you buy at the ask price and sell at the bid price?\n", "3. How many trades are profitable in either case?\n", "\n", "Reflect on the importance of transaction costs and the difference between predictive power and strategy implementation." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }
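
The resampling and variable-building tasks, plus the two profit calculations from the testing section, could be sketched roughly as follows. This is only an illustration under stated assumptions, not a reference solution: the column names `Bid Price`, `Ask Price`, `Bid Size` and `Ask Size` are guesses (the assignment does not list the CSV's columns), the helper names `build_features` and `round_trip_pnl` are hypothetical, the quotes are synthetic, and the buy signal is a placeholder standing in for "predicted probability above $q$" from the logistic regression.

```python
import numpy as np
import pandas as pd


def build_features(quotes: pd.DataFrame, window: int = 10) -> pd.DataFrame:
    """Resample to one row per second and build the assignment's variables.

    Assumes a DatetimeIndex and the (hypothetical) column names
    'Bid Price', 'Ask Price', 'Bid Size', 'Ask Size'.
    """
    # Last quote per second; forward-fill seconds without updates.
    # "1s" is used here because newer pandas versions deprecate "1S".
    df = quotes.resample("1s").last().ffill()

    df["Midpoint"] = (df["Bid Price"] + df["Ask Price"]) / 2
    # Tick = 1 only if the next-second midpoint is strictly higher.
    df["Tick"] = (df["Midpoint"].shift(-1) > df["Midpoint"]).astype(int)
    df["Depth"] = df["Ask Size"] + df["Bid Size"]
    df["OI"] = (df["Ask Size"] - df["Bid Size"]) / df["Depth"]
    df["Spread"] = (df["Ask Price"] - df["Bid Price"]) / df["Midpoint"]

    # First differences from the previous second.
    df["dDepth"] = df["Depth"].diff()
    df["dOI"] = df["OI"].diff()

    # Rolling means over `window` rows and the "above rolling mean" dummies.
    df["Depth_MA"] = df["Depth"].rolling(window).mean()
    df["OI_MA"] = df["OI"].rolling(window).mean()
    df["Depth_High"] = (df["Depth"] > df["Depth_MA"]).astype(int)
    # Interpretation (assumption): compare |OI| with the rolling mean of |OI|.
    abs_oi = df["OI"].abs()
    df["AbsOI_High"] = (abs_oi > abs_oi.rolling(window).mean()).astype(int)

    # The last row has no next second, so its Tick is undefined; drop it
    # together with the warm-up rows where diffs or rolling means are NaN.
    return df.iloc[:-1].dropna()


def round_trip_pnl(df: pd.DataFrame, buy: pd.Series):
    """Per-trade profit of buying at second t and selling at t+1.

    Returns (profit trading at the midpoint, profit buying at the ask
    and selling at the bid) for the rows where `buy` is True.
    """
    at_mid = (df["Midpoint"].shift(-1) - df["Midpoint"])[buy]
    at_quotes = (df["Bid Price"].shift(-1) - df["Ask Price"])[buy]
    return at_mid.dropna(), at_quotes.dropna()


# Synthetic quotes (not the XIU.TO data): four updates per second.
rng = np.random.default_rng(0)
n = 300
index = pd.date_range("2020-03-11 09:30:00", periods=n, freq="250ms")
bid = 20.0 + 0.01 * rng.integers(-1, 2, size=n).cumsum()
quotes = pd.DataFrame(
    {
        "Bid Price": bid,
        "Ask Price": bid + 0.01,
        "Bid Size": rng.integers(1, 50, size=n) * 100,
        "Ask Size": rng.integers(1, 50, size=n) * 100,
    },
    index=index,
)

features = build_features(quotes)
# Placeholder signal; in the assignment this would be P(uptick) > q.
buy = features["OI"] < 0
pnl_mid, pnl_quotes = round_trip_pnl(features, buy)
print(f"trades: {len(pnl_mid)}")
print(f"total P&L at mid: {pnl_mid.sum():.4f}")
print(f"total P&L at bid/ask: {pnl_quotes.sum():.4f}")
print(f"profitable at mid: {(pnl_mid > 0).sum()}, at bid/ask: {(pnl_quotes > 0).sum()}")
```

Because the ask is never below the midpoint and the bid is never above it, every round trip at the quotes earns no more than the same trade at the midpoint. That gap is the transaction cost the final reflection question asks about.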
Answered 3 days after Nov 09, 2021


Vicky answered on Nov 13 2021