TL;DR: In stock markets, past performance is not always a good predictor of future returns, and this makes predicting stock prices using machine learning difficult. Nonetheless, you can find my attempt here.
Prerequisite: familiarity with Python
Machine learning (ML) can be difficult to comprehend due to the jargon and math involved. In this article, I am going to try to stay as simple and DRY as possible, meaning I will make references to existing resources wherever applicable, rather than rewriting them. I incorporated the questions that I had when I was studying this topic. By the end of the article, I hope you can (1) gain a general understanding of how ML works and (2) build a simple stock trading ML algorithm. Note that it is not our goal to make profit at this time — our first step is to apply ML to predict stock prices. Hope this helps!
Let's dive right in and take a look at the typical structure of a machine learning algorithm. We are assuming that you are using Keras.
Phase 1. What are the inputs?
Data Preparation. For analyzing stock data, incorporating Long Short-Term Memory (LSTM) layers would make the most sense, because they retain memory of prior time-series data, whereas a model with only Dense layers has no memory at all and can learn only from its immediate input.
All you need to know for this step is that depending on which model (e.g. Dense, LSTM, etc.) you choose, you might have to reshape the input to 3 dimensions. If we have 2D data, we can reshape it to 3D with:
train_data = np.reshape(train_data, (train_data.shape[0], train_data.shape[1], 1))
For example, we can represent minute-by-minute stock data in 3D where the 1st axis shows the stock price of different companies, the 2nd axis displays the per-minute stock price, and the 3rd axis represents different days' data.
There is no magic to prepping data. Keras documentation is great and explains the input/output specifications clearly. Scikit-Learn is a useful Python module for standardizing input data, and if necessary, you can use Yahoo Finance or a data provider like Quandl to gather stock price data to train the model.
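To make the reshape concrete, here is a minimal sketch in plain NumPy using made-up prices (a real pipeline would pull data from Yahoo Finance or Quandl and scale it with Scikit-Learn's MinMaxScaler; the manual scaling below mimics what MinMaxScaler does):

```python
import numpy as np

# Hypothetical minute-by-minute closing prices for one stock
prices = np.arange(100, 130, dtype=float)  # 30 fake data points

# Scale to [0, 1] (what sklearn's MinMaxScaler does under the hood)
scaled = (prices - prices.min()) / (prices.max() - prices.min())

# Build sliding windows: each sample is 10 past prices, target is the next one
window = 10
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]

# Reshape 2D (samples, timesteps) to 3D (samples, timesteps, features) for an LSTM
X = X.reshape(X.shape[0], X.shape[1], 1)
print(X.shape)  # (20, 10, 1)
```

The window size of 10 is arbitrary here; in practice it is a hyperparameter you tune.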
Once data preparation is complete, we have to answer three questions: (1) what layers are we going to use, (2) how many should we use, and (3) what loss function and optimizer should we use?
Phase 2. The journey of the inputs
This part is where the magic happens!
Layers. This is the core building block of neural networks. You can think of layers as filters that data passes through to become more refined. There are many types: core layers, convolution layers, recurrent layers, and so on. LSTM and GRU fall under recurrent layers; Dense falls under core layers. At each layer you have "weights" which the optimizer updates to minimize the loss.
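The "filter" intuition can be shown without any framework at all. A Dense layer is just a matrix multiply plus a bias, passed through an activation; the weights below are random stand-ins for what the optimizer would learn:

```python
import numpy as np

rng = np.random.default_rng(0)

# A Dense layer computes activation(input @ weights + bias);
# the optimizer's job during training is to adjust `weights` and `bias`.
def dense(x, weights, bias):
    return np.maximum(0.0, x @ weights + bias)  # ReLU activation

x = rng.standard_normal((4, 8))                 # batch of 4 samples, 8 features each
w1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
w2, b2 = rng.standard_normal((16, 1)), np.zeros(1)

hidden = dense(x, w1, b1)       # first "filter"
output = dense(hidden, w2, b2)  # second "filter": data further refined
print(output.shape)  # (4, 1)
```

In Keras you would never write this by hand; stacking `Dense` (or `LSTM`) layers in a `Sequential` model does the equivalent for you.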
Question> Do more layers lead to better results? Not necessarily. Implementing more layers can lead to overfitting, which means the model memorizes the training data and does not generalize well to test data — in other words, it would not predict stock prices accurately.
Loss function. Its basic job is to compare the model's output to the expected results and measure the error. Common loss functions include mean squared error (MSE), mean absolute error (MAE), categorical crossentropy, and binary crossentropy.
When trading stocks, being off by $5 is exactly 5 times worse than being off by $1, so MAE makes sense here. When training begins, the weights are random, so the loss starts high. As the model trains over many "epochs" it learns to minimize the loss — this is where the optimizer comes in.
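A toy calculation makes the choice between MAE and MSE concrete. With two predictions that miss by $1 and $5, MAE scores the $5 miss as exactly 5 times worse, while MSE would score it 25 times worse:

```python
import numpy as np

y_true = np.array([100.0, 100.0])
y_pred = np.array([101.0, 105.0])  # off by $1 and $5

errors = np.abs(y_true - y_pred)
mae = errors.mean()          # MAE: a $5 miss counts exactly 5x a $1 miss
mse = (errors ** 2).mean()   # MSE: the $5 miss would count 25x as much

print(mae, mse)  # 3.0 13.0
```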
Optimizer. It uses gradient descent to update the weights and biases after each batch of data, lowering the loss over time. If you remember calculus, a minimum is where the derivative equals zero. Your initial loss might sit high on the loss curve; the optimizer computes the gradient and steps the weights in the direction that reduces the loss. The size of each step is the "learning rate". This iteratively lowers training error — but can lead to overfitting if taken too far.
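Here is gradient descent stripped to its essentials: fitting a single weight for the toy model y = w·x, using the hand-derived gradient of the mean squared error (Keras optimizers like SGD or Adam do this across millions of weights automatically):

```python
import numpy as np

# Fit y = w * x with gradient descent on mean squared error
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x           # the true weight is 2
w = 0.0               # start at a high-loss point
lr = 0.01             # learning rate: the size of each step

for epoch in range(200):
    grad = (2 * (w * x - y) * x).mean()  # dLoss/dw
    w -= lr * grad                       # step against the gradient

print(round(w, 3))  # 2.0
```

A learning rate that is too large would overshoot the minimum and diverge; too small, and training crawls. That trade-off is exactly what optimizers like Adam try to manage for you.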
Phase 3. What now?
Evaluation. We can evaluate loss across different epoch counts using the History object returned by model.fit(), which contains a dictionary of per-epoch metrics. Graphing training loss against validation loss shows the epoch count at which overfitting begins. Overfitting causes poor performance on test data, so finding that sweet spot matters.
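The `history.history` attribute from Keras is a plain dict, so the "sweet spot" search is just a matter of finding the epoch with the lowest validation loss. The numbers below are simulated for illustration, showing the classic overfitting pattern where training loss keeps falling while validation loss turns back up:

```python
# Simulated history.history dict (in Keras: history = model.fit(...))
history = {
    "loss":     [0.90, 0.50, 0.30, 0.20, 0.15, 0.12, 0.10, 0.09],
    "val_loss": [0.95, 0.60, 0.40, 0.35, 0.33, 0.36, 0.42, 0.50],
}

# The sweet spot is the epoch with the lowest validation loss (epochs are 1-based)
best_epoch = min(range(len(history["val_loss"])),
                 key=history["val_loss"].__getitem__) + 1
print(best_epoch)  # 5
```

In practice you would plot both curves with matplotlib, or use Keras's EarlyStopping callback to halt training at that point automatically.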
At this point, feel free to experiment — tweak the code, try new models, different epochs and batch sizes. It may sound ironic, but this not-so-scientific process is why some experts say machine learning is currently more art than science.
Implementation using Robinhood. For a code example using the Robinhood module, refer to Step by Step: Building an Automated Trading System in Robinhood. Note that you will need a Robinhood account to complete this step.
Conclusion
I hope this article helped solidify your understanding of a typical machine learning algorithm structure using Keras. If you'd like to go deeper, I strongly recommend Deep Learning with Python by François Chollet and Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach by Omer Berat Sezer and Ahmet Murat Ozbayoglu.