Gradient Descent and Cryptocurrency


I have been thinking of an interesting way to talk about some of the early things I learned through the fast.ai course. I was on the verge of just writing a quick article running through the concept of stochastic gradient descent (SGD) with restarts, but I do enjoy articles that use analogies. Analogies serve as another way to look at a concept and further cement it inside that brain of yours.

So today, I will provide you all with a nice connection between a twist on SGD and an aspect of the wonderful world of cryptocurrency, for those fanatics out there.

Stochastic Gradient Descent with Restarts

There are several great articles on SGD in Towards Data Science, and for those who aren't familiar with it or who need a quick recap, I would suggest searching for and reading those. I found a couple here and here.

The overall idea is to change the weights in your neural network in order to reach a minimum of the loss function you are trying to optimize. The learning rate controls how big a step you take toward that minimum. If the learning rate is too high, you might overshoot the minimum; if it is too low, training may take far too long. Data science is all about trade-offs.
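To make that concrete, here is a tiny sketch (not from the course, just an illustration I put together) of plain gradient descent on a one-variable function, where you can see how the learning rate scales each step:

```python
# A minimal sketch of gradient descent on f(x) = (x - 3)^2,
# whose minimum sits at x = 3. The learning rate scales every step.

def gradient(x):
    # Derivative of (x - 3)^2
    return 2 * (x - 3)

def gradient_descent(lr, steps=50, x=0.0):
    for _ in range(steps):
        x = x - lr * gradient(x)  # step downhill, scaled by the learning rate
    return x

print(gradient_descent(lr=0.1))    # converges close to 3
print(gradient_descent(lr=1.1))    # too large: overshoots and blows up
print(gradient_descent(lr=0.001))  # too small: barely moves in 50 steps
```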

In the fast.ai class, we learned early on about SGD with restarts. This technique eases the burden on the model of finding the best path in a single pass. You can think of the 'restart' as a way to look around the search space for a better path than the one found on the first go-around. Here's a picture to help visualize this.

Fig. 1. From fast.ai Lesson 1

In the left picture, the model goes through the SGD process once. On that single journey, the model finds the best solution it can, given its constraints. In the right picture, the restart schedule runs SGD multiple times, resetting the learning rate at each restart. As you can see, this gives the model more opportunities to reach a better local minimum.

This improvement can be the difference between 99% and 99.5% accuracy in a model! Plus, the course makes this function very accessible, so I recommend checking it out.
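As a rough sketch of the idea (not the course's actual code), the 'restart' is usually implemented as a learning rate schedule: the rate decays along a cosine curve, then jumps back up to its starting value, which kicks the weights out of the current basin so they can explore a different one. The names and numbers below are just for illustration:

```python
import math

def cosine_restart_lr(step, lr_max=0.1, lr_min=0.001, cycle_len=100):
    """Cosine-annealed learning rate that restarts every `cycle_len` steps,
    the schedule behind 'SGD with restarts' (SGDR)."""
    t = step % cycle_len  # position within the current cycle
    cos_decay = 0.5 * (1 + math.cos(math.pi * t / cycle_len))
    return lr_min + (lr_max - lr_min) * cos_decay

# The rate falls from lr_max toward lr_min, then snaps back at each restart:
for step in [0, 50, 99, 100, 150, 199, 200]:
    print(step, round(cosine_restart_lr(step), 4))
```

The course's library wraps this up behind a couple of fit parameters (including one that stretches later cycles so the model settles into wider, more stable minima), though the exact API may look different from this sketch.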

Now on to the crypto world.

The Analogy

Some of you may think this analogy is a stretch, but I really enjoy merging ideas together. During one of my drives home, I was listening to a podcast about a company called Multicoin Capital. They invest in crypto teams, and they had a lot of interesting things to say about the current state of the space.

One of the topics that came up was the framework they use when looking at companies with differing smart contract protocols. The term smart contract protocol doesn't matter much for this discussion, so don't worry about it. They were laying out their thesis that there will be multiple winners in the smart contract space because there are several dimensions along which there could be local maxima. They argue that, so far, the world has seen only one local maximum, which is Ethereum, and that there are definitely more to be found.

To break that down a bit, think of the picture above (or scroll up to it). The terrain of that function is hilly and bumpy: there are deep troughs and there are high peaks. In investing, the goal is to find the high peaks (whereas in ML we are usually after the deep troughs).

So in their world, they are searching and searching through this strange landscape, trying to find the companies that exploit different features and technologies such that a new local optimum can be reached. These investors are the 'SGD with restarts,' jumping around the function to find the local maxima (i.e. the companies with the best chance of winning).

SGD with restarts is a different way of exploring the function space for the lowest error on your ML problem. In the world of investing, people find themselves doing something similar, though they are looking for the highest return. It is through this trial and error that we set ourselves up (and our fun algorithms) for the highest probability of success.

Thanks for reading.
