Just another AI trying to predict the stock market: Part 1

Simeon Kostadinov
Towards Data Science
4 min read · Feb 5, 2018


Learning how to build machine learning models is neither straightforward nor easy. I spent the last month or so learning how neural networks work in detail and what makes certain models perform better than others.

Now I have decided to put my knowledge into practice and implement a fairly easy example: predicting the price of the S&P500 index using a GRU network.

You can use this series as a starting point for your machine learning journey, so don’t hesitate to dive in without preparation. If you have any questions, just share them with me in the comment section.

#1. Prepare data

The very first thing we should do is get the data for training our model. Yahoo Finance is a great place to find up-to-date data about any company. We are going to use the S&P500 dataset for the period between 1/1/1950 and today. Just click on Download data and you are good to go.

For the purpose of this example we are just going to use one Python file without an object-oriented pattern or any fancy structures. Let’s name our file sp_rnn_prediction.py and load the data.

The purpose of the libraries is as follows:

  • numpy: used for matrix calculations and mathematical manipulations, which are essential for any ML model
  • pandas: used to define a convenient data structure for your training data
  • sklearn: a library of tools for data analysis (for example normalizing or clustering data)
  • matplotlib: used to display our data
  • tensorflow: Google’s open source library used for building ML graphs in an easy and elegant way

The dataset should be divided into train, validation and test sets, each independent of the others. I chose to split the data into 80% train, 10% validation and 10% test.

Finally, I use pandas to load the .csv file from Yahoo Finance and store it in a DataFrame. We will see how to use it later on.
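As a minimal sketch of this loading step, the snippet below reads a CSV into a DataFrame with pandas and records the split percentages. The variable names and the three inline rows are illustrative assumptions (the numbers are made up, not real S&P500 quotes), and the column headers follow the layout Yahoo Finance typically exports. In practice you would pass the path of your downloaded file to pd.read_csv instead of the in-memory string.

```python
import io

import pandas as pd

# Split proportions from the text: 10% validation, 10% test, the remaining 80% train.
valid_set_size_percentage = 10
test_set_size_percentage = 10

# Made-up rows standing in for the downloaded file (illustration only).
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2018-01-02,100.0,102.0,99.0,101.0,101.0,1000\n"
    "2018-01-03,101.0,103.5,100.5,103.0,103.0,1200\n"
    "2018-01-04,103.0,104.0,101.5,102.0,102.0,900\n"
)

# With a real download, pass the file path instead of sample_csv.
# index_col=0 turns the first column (Date) into the row index.
df = pd.read_csv(sample_csv, index_col=0)

print(df.shape)
print(list(df.columns))
```

Using the date column as the index is a design choice made here so that only numeric columns remain as features.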

#2. Manipulate data

Our next step is to reshape the data so that it can be used for training.

First, we need to normalize it, which means scaling each feature to a given range, in our case between 0 and 1. Normalization is achieved as follows:

We use sklearn.preprocessing.MinMaxScaler() and its fit_transform method to scale every value in the open, high, low and close prices into the 0–1 range. Because the scaler expects a two-dimensional input, each column of prices is first reshaped with (-1, 1). Reshaping with (-1, 1) means that we want, for example, df['open'].values to become a matrix of shape (k, 1), where k is left unspecified and is inferred by numpy. In this case k equals the number of prices in the column.
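The normalization described above could be sketched roughly as follows. This is an illustration under assumptions rather than the original script: the function name normalize_data, the capitalized column names and the sample prices are all hypothetical, and each column is scaled independently with its own MinMaxScaler.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

PRICE_COLUMNS = ["Open", "High", "Low", "Close"]  # assumed column names


def normalize_data(df):
    """Return a copy of df with each price column scaled to the 0-1 range."""
    scaled = df.copy()
    for column in PRICE_COLUMNS:
        scaler = MinMaxScaler()  # default feature_range is (0, 1)
        # reshape(-1, 1) turns the 1-D array of k prices into a (k, 1) matrix,
        # which is the 2-D layout the scaler expects.
        column_matrix = scaled[column].values.reshape(-1, 1)
        # ravel() flattens the (k, 1) result back to 1-D for the DataFrame column.
        scaled[column] = scaler.fit_transform(column_matrix).ravel()
    return scaled


# Made-up prices for illustration only.
prices = pd.DataFrame({
    "Open": [100.0, 101.0, 103.0, 102.0],
    "High": [102.0, 103.5, 104.0, 105.0],
    "Low": [99.0, 100.5, 101.5, 101.0],
    "Close": [101.0, 103.0, 102.0, 104.5],
})
normalized = normalize_data(prices)
print(normalized)
```

After scaling, the smallest value in each column maps to 0 and the largest to 1, so the four series become directly comparable.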

Then, we need to separate the data into training, validation and test sets. As mentioned above, 80% goes to training, 10% to validation and 10% to test.

There are 3 main steps in the above snippet:

  1. Split the data into different arrays with the same length seq_len.
  2. Determine the length of the train, validation and test data based on the number of items.
  3. Divide the data in the right proportions.
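The three steps listed above can be sketched with numpy alone. The function name split_into_sets and the synthetic input are hypothetical, and the use of overlapping sliding windows (each window starting one row after the previous one) is an assumption about how the sequences are built; the article itself only states that every array has length seq_len.

```python
import numpy as np


def split_into_sets(values, seq_len, valid_pct=10, test_pct=10):
    """Cut `values` into windows of length seq_len, then split them 80/10/10."""
    values = np.asarray(values)

    # Step 1: sliding windows, each holding seq_len consecutive rows.
    windows = np.array(
        [values[start:start + seq_len] for start in range(len(values) - seq_len + 1)]
    )

    # Step 2: set sizes derived from the number of windows.
    n_windows = windows.shape[0]
    valid_size = int(np.round(valid_pct / 100 * n_windows))
    test_size = int(np.round(test_pct / 100 * n_windows))
    train_size = n_windows - valid_size - test_size

    # Step 3: slice in order, so the windows keep their chronological order.
    train = windows[:train_size]
    valid = windows[train_size:train_size + valid_size]
    test = windows[train_size + valid_size:]
    return train, valid, test


# 119 made-up rows with 4 features each (think open, high, low, close).
data = np.arange(119 * 4, dtype=float).reshape(119, 4)
train, valid, test = split_into_sets(data, seq_len=20)
print(train.shape, valid.shape, test.shape)
```

With 119 rows and seq_len=20 there are 100 windows, so the split yields 80, 10 and 10 windows, each of shape (20, 4).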

#3. Display data

Finally we need to display the normalized data. This isn’t essential for the model performance but is extremely useful when it comes to debugging your code. Thus, one should make it a habit to always visualize the dataset.

Before plotting the data we need to use the methods defined above.

First, we make a copy of the DataFrame, remove the unused columns (‘Volume’ in this case) and then normalize the remaining values. After that, we just need to plot it using the matplotlib library:

  • Line 14: initialize the figure.
  • Lines 15–18: plot the different values.
  • Lines 19–23: add labels and show the figure.

Figure: S&P500 price visualization (normalized)
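A plot along these lines could be produced with a sketch like the one below. The line numbers in the list above refer to the original script, not to this snippet. The function name plot_normalized_prices, the title and axis labels, and the synthetic input are all assumptions made for illustration; the input is expected to be already normalized.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_normalized_prices(df):
    """Draw one line per price column and return the figure."""
    fig, ax = plt.subplots(figsize=(15, 5))  # initialize the figure

    # Plot the different values, one line per column.
    for column in ["Open", "High", "Low", "Close"]:
        ax.plot(df[column].values, label=column.lower())

    # Add a title, axis labels and a legend.
    ax.set_title("S&P500 prices (normalized)")
    ax.set_xlabel("time [days]")
    ax.set_ylabel("normalized price")
    ax.legend(loc="best")
    return fig


# Made-up values already in the 0-1 range, for illustration only.
base = np.linspace(0.0, 1.0, 50)
demo = pd.DataFrame({
    "Open": base,
    "High": np.clip(base + 0.02, 0.0, 1.0),
    "Low": np.clip(base - 0.02, 0.0, 1.0),
    "Close": base ** 2,
})
fig = plot_normalized_prices(demo)
```

Calling plt.show() afterwards displays the figure in an interactive session, while fig.savefig writes it to a file instead.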

In the next part

We end the first part with the above visualization. In Part 2, we will focus on training the model using the prices. We will make heavy use of TensorFlow so you can see how this excellent library works in practice.

Thank you for reading. If you enjoyed the article, give it some claps 👏 . Hope you have a great day!

Obsessed with creating a positive impact. Love blogging about AI and reading books. For more content, follow me 👉 https://www.linkedin.com/in/simeonkostadinov/