I've always wondered whether it's possible to build a model that takes several Bitcoin-related features as input and predicts the Bitcoin price. This article discusses the intuition behind the LSTM network model and the features/data that I fed into it, so none of my model code is shown here. We will get our hands dirty in part 2 instead.
Understanding LSTM
This section is adapted from the blog, which I think explains LSTM very well. I will just summarize what an LSTM is, in case you're too lazy to go through the entire blog:
To start off, we should know that humans don't start thinking from scratch. For example, when you're reading this, you don't think from scratch every time you read a new sentence. Your thoughts have persistence: you bring in the information from previous sentences and connect it with the information in the sentence you are reading now.
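But traditional neural networks (ANNs/CNNs) can't do this, so recurrent neural networks (RNNs) were created to overcome this problem. An RNN is essentially a neural network with a loop in it: the output of each step (the hidden state) is fed back in at the next step, so information can persist.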
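The loop can be sketched in a few lines. This is a minimal illustration of a plain (vanilla) RNN step, h_t = tanh(W_h h_{t-1} + W_x x_t + b), written with plain Python lists; the weights are hand-picked toy values (an assumption), not learned ones.

```python
import math

def rnn_step(h_prev, x_t, w_h, w_x, b):
    """One step of a minimal RNN: h_t = tanh(W_h @ h_{t-1} + W_x @ x_t + b).

    h_prev: previous hidden state (length H); x_t: current input (length D)
    w_h: H x H weights; w_x: H x D weights; b: bias (length H)
    """
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(w * h for w, h in zip(w_h[i], h_prev))
        z += sum(w * x for w, x in zip(w_x[i], x_t))
        h_t.append(math.tanh(z))
    return h_t

def run_rnn(xs, w_h, w_x, b):
    """Feed a whole sequence through the loop, carrying the hidden state forward."""
    h = [0.0] * len(b)  # start with an all-zero hidden state
    for x_t in xs:
        h = rnn_step(h, x_t, w_h, w_x, b)  # the loop: this step's output feeds the next step
    return h

# Toy 1-unit RNN with hand-picked (not learned) weights.
print(run_rnn([[1.0], [0.0]], w_h=[[0.5]], w_x=[[1.0]], b=[0.0]))
```

The second input is 0, yet the final hidden state is still non-zero because it carries information from the first input through the loop.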
Limitations of Recurrent Neural Networks:
Remember, the main point of an RNN is to connect previous information to the present task. To give an example, say you have the sentence “the clouds are in the sky”. Predicting the last word wouldn't be hard for the model: just by looking at this sentence alone we can probably guess that the last word is “sky”, and no further context is needed.
So the main point here is that when the gap between the relevant information and the place where it's needed is small, an RNN can learn to use the past information easily.
Now take a look at the sentence “I am from France… I speak fluent French”. To predict the last word, “French”, we need the previous sentence to provide the context “France” from further back. As the gap between the relevant information and the point where it is needed grows, RNNs become unable to connect the information in practice. This is known as the problem of long-term dependencies, and it is where LSTM comes in.
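One commonly cited reason RNNs struggle with long gaps is vanishing gradients: during training, the gradient that carries an error signal back through time is a product of one factor per time step, and when those factors are smaller than 1 the product shrinks exponentially with the gap. The toy sketch below computes that product directly for a single scalar hidden unit; the weight 0.9 and the constant input 0.5 are arbitrary assumptions.

```python
import math

def gradient_through_time(w, xs, h0=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    By the chain rule this is a product of per-step factors
    d h_t / d h_{t-1} = w * (1 - h_t**2).
    """
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # one more factor for each step back in time
    return grad

# Arbitrary toy values: recurrent weight 0.9, constant input 0.5.
for steps in (1, 5, 20, 50):
    print(steps, gradient_through_time(0.9, [0.5] * steps))
```

With these values each factor is below 1, so the signal from 50 steps back is vanishingly small compared with the signal from 1 step back.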
LSTM Architecture
Let’s look at the architecture of a general LSTM model from a high level:
The key to an LSTM model is the cell state, the horizontal line running along the top of the diagram. Information can be removed from or added to the cell state, regulated by structures called gates (the components below the horizontal line). Gates are a way to optionally let information through, and each one is composed of a sigmoid neural net layer and a pointwise multiplication operation (you can just skip these terminologies lol).
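A single gate boils down to those two operations. In this minimal sketch the pre-activation values are passed in directly as an assumption; in an LSTM they would come from a learned layer over the previous output and the current input.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, output in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def apply_gate(pre_activations, values):
    """Sigmoid layer + pointwise multiplication.

    A gate value near 0 blocks an entry; a value near 1 lets it through.
    """
    gate = [sigmoid(z) for z in pre_activations]
    return [g * v for g, v in zip(gate, values)]

print(apply_gate([-10.0, 0.0, 10.0], [2.0, 2.0, 2.0]))  # roughly [0.0001, 1.0, 2.0]
```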
Walking through it step-by-step:
The first gate: Forget gate layer
- Looks at ht-1 (the previous hidden state) and xt (the current input), and outputs a number between 0 and 1 for each number in the cell state Ct-1. 1 means ‘completely keep this’, while 0 means ‘completely forget this’.
- For example, in a time series model (stock/crypto market), the cell state might remember long-term trends in the time series, such as a seasonal pattern or an overall upward/downward trend. However, if we encounter a sudden change in the time series, such as a one-time event or an abrupt shift in the trend, the forget gate may output values close to 0 for those entries, so the model forgets the previous long-term trend and focuses on the new pattern.
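The forget gate is usually written as ft = σ(Wf · [ht-1, xt] + bf). A pure-Python sketch of that formula; the sizes, weights, and biases are hypothetical toy values chosen so that the two cell-state entries behave differently.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, output in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(weights, bias, vec):
    """weights @ vec + bias, using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def forget_gate(h_prev, x_t, w_f, b_f):
    """f_t = sigmoid(W_f @ [h_{t-1}, x_t] + b_f), one value per cell-state entry."""
    concat = h_prev + x_t  # list concatenation gives [h_{t-1}, x_t]
    return [sigmoid(z) for z in linear(w_f, b_f, concat)]

# Hypothetical 2-unit cell state and 1 input feature; weights are zero and only
# the biases differ, so the first entry is kept and the second is mostly forgotten.
c_prev = [1.0, 1.0]
f_t = forget_gate([0.0, 0.0], [1.0], w_f=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], b_f=[5.0, -5.0])
print([f * c for f, c in zip(f_t, c_prev)])  # roughly [0.993, 0.007]
```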
The second gate: input gate layer (two parts)
1. First: a sigmoid layer decides which values we’ll update
2. Second: a tanh layer creates a vector of new candidate values, C̃t, that could be added to the state
Continuing our previous example, the sigmoid layer might decide to update the seasonal component of the cell state if the time series has a seasonal pattern, while keeping the overall trend intact. Meanwhile, the tanh layer might generate a vector of positive candidate values if the time series suddenly experiences an upward shift, to capture this trend.
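The two parts of the input gate are the same kind of layer with different activations: it = σ(Wi · [ht-1, xt] + bi) and C̃t = tanh(WC · [ht-1, xt] + bC). A sketch with hypothetical toy weights, where a positive weight on the input makes the candidate value positive:

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, output in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(weights, bias, vec):
    """weights @ vec + bias, using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def input_gate(h_prev, x_t, w_i, b_i, w_c, b_c):
    """Return (i_t, C~_t) for one time step.

    i_t  = sigmoid(W_i @ [h_{t-1}, x_t] + b_i)  -> how much to update each entry
    C~_t = tanh(W_C @ [h_{t-1}, x_t] + b_C)     -> candidate values in (-1, 1)
    """
    concat = h_prev + x_t
    i_t = [sigmoid(z) for z in linear(w_i, b_i, concat)]
    c_tilde = [math.tanh(z) for z in linear(w_c, b_c, concat)]
    return i_t, c_tilde

# Hypothetical 1-unit cell, 1 input feature.
i_t, c_tilde = input_gate([0.0], [1.0], w_i=[[0.0, 0.0]], b_i=[0.0], w_c=[[0.0, 2.0]], b_c=[0.0])
print(i_t, c_tilde)  # i_t is [0.5]; c_tilde is about [0.964]
```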
Still part of the second gate: update the old cell state Ct-1 into the new cell state Ct. The previous steps already decided what to do; in this step we just need to actually do it.
- Multiply the old state by ft (the forget gate output), then add it * C̃t (the new candidate values, scaled by how much we decided to update each state value). In short: Ct = ft * Ct-1 + it * C̃t.
- Continuing our example, this would mean the LSTM combines the candidate values generated by the tanh layer with the seasonal component of the previous cell state to create the updated seasonal component of the current cell state.
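The update rule Ct = ft * Ct-1 + it * C̃t is plain element-wise arithmetic. A minimal sketch, with hand-picked gate values (an assumption for illustration) rather than ones produced by a trained network:

```python
def update_cell_state(c_prev, f_t, i_t, c_tilde):
    """C_t = f_t * C_{t-1} + i_t * C~_t, all element-wise."""
    return [f * c + i * ct for c, f, i, ct in zip(c_prev, f_t, i_t, c_tilde)]

# Entry 0: forget gate fully open, nothing new added -> old value kept.
# Entry 1: old value forgotten, candidate fully written in.
print(update_cell_state([2.0, 2.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]))  # [2.0, 0.5]
```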
Final gate (output gate): decides what to output, based on the cell state but as a filtered version of it.
- First, run a sigmoid layer that decides which parts of the cell state we're going to output; call its output ot.
- Then put the cell state through tanh (to push the values to between -1 and 1) and multiply it by ot, so we only output the parts selected by the gate: ht = ot * tanh(Ct).
- Continuing our example, this would mean the LSTM multiplies the scaled cell state by the output of the sigmoid gate, so if the output gate selected the seasonal component of the cell state, it would output only the scaled seasonal component.
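Putting the forget gate, input gate, cell update, and output gate together gives one full time step of an LSTM cell. This is a minimal pure-Python sketch of the standard formulation (no peephole connections), for intuition only; it is not necessarily the code used for the model in part 2, and the all-zero weights are a hypothetical choice that makes the arithmetic easy to follow.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, output in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(weights, bias, vec):
    """weights @ vec + bias, using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps "forget", "input", "candidate", "output" to (weights, bias),
    where each weight matrix has H rows and H + D columns to match [h_{t-1}, x_t].
    Returns the new hidden state h_t and cell state C_t.
    """
    concat = h_prev + x_t

    def layer(name, activation):
        weights, bias = params[name]
        return [activation(z) for z in linear(weights, bias, concat)]

    f_t = layer("forget", sigmoid)           # what to keep from C_{t-1}
    i_t = layer("input", sigmoid)            # how much of each candidate to add
    c_tilde = layer("candidate", math.tanh)  # new candidate values
    o_t = layer("output", sigmoid)           # which parts of the state to output
    c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # filtered version of the cell state
    return h_t, c_t

# Hypothetical 1-unit cell with 1 input feature and all-zero weights:
# every sigmoid gate sits at 0.5 and every candidate value at 0.
zero = ([[0.0, 0.0]], [0.0])
params = {name: zero for name in ("forget", "input", "candidate", "output")}
h, c = [0.0], [1.0]
for x in ([0.3], [0.1]):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

With these weights the forget gate halves the cell state at every step (1.0, then 0.5, then 0.25), and the hidden state is half of tanh of that value.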
Bitcoin Data
Ok! That's pretty much everything for the LSTM introduction. I know it's still quite a technical read, but I tried my best to compress it as much as I could. Moving on to the Bitcoin data collection part: oops! Did I forget to introduce what Bitcoin is? I think the best way is to give a simple example covering only the basics; otherwise, refer to this great video by 3Blue1Brown for an in-depth understanding of how it works.
Imagine you and a friend are living in a world without cash or banks. You want to give your friend some money, but how do you do it?
One way is to use Bitcoin. You create a digital wallet and your friend does the same. You can then send your friend some Bitcoin from your wallet to their wallet using your computer or phone. This transaction is recorded on a public ledger called the blockchain.
The value of Bitcoin fluctuates, so the amount you send may be worth more or less in the future. But once the transaction is confirmed and recorded on the blockchain, your friend can spend the Bitcoin just like cash, or hold onto it as an investment.
That’s the basics of how Bitcoin works!
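To make the “public ledger” idea a bit more concrete: each block records the hash of the block before it, so quietly editing an old transaction breaks the link to every later block. The toy sketch below shows only that hash-linking idea under heavy simplifying assumptions; it leaves out proof-of-work mining, digital signatures, and the peer-to-peer network, and the field names are made up for illustration.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 of the block's contents (JSON with sorted keys for a stable encoding)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    """Append a block that points at the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def is_valid(chain):
    """Every block must point at the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

ledger = []
add_block(ledger, [{"from": "you", "to": "friend", "amount": 0.01}])
add_block(ledger, [{"from": "friend", "to": "shop", "amount": 0.005}])
print(is_valid(ledger))  # True
ledger[0]["transactions"][0]["amount"] = 5.0  # tamper with history
print(is_valid(ledger))  # False
```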
I pulled live data using the CryptoCompare API, and here are the two types of data I collected, using this page as a reference:
- Historical Price Data
- On-Chain Data (Blockchain)
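As a rough illustration of the first type, here is a minimal sketch of pulling daily BTC/USD price history with Python's standard library. It assumes CryptoCompare's min-api `data/v2/histoday` endpoint (query parameters `fsym`, `tsym`, `limit`, and an optional `api_key`) and that the rows sit under `payload["Data"]["Data"]` in the JSON response; please check both against the current API documentation, since this is not necessarily how the data for this article was pulled. The on-chain metrics come from separate endpoints that are not shown here.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the CryptoCompare API docs before relying on it.
HISTODAY_URL = "https://min-api.cryptocompare.com/data/v2/histoday"

def histoday_url(fsym="BTC", tsym="USD", limit=30, api_key=None):
    """Build the request URL for `limit` days of daily OHLCV data."""
    params = {"fsym": fsym, "tsym": tsym, "limit": limit}
    if api_key:
        params["api_key"] = api_key
    return HISTODAY_URL + "?" + urllib.parse.urlencode(params)

def fetch_daily_closes(fsym="BTC", tsym="USD", limit=30, api_key=None):
    """Return (unix_timestamp, close_price) tuples in the order the API returns them.

    Assumes the response nests the rows under payload["Data"]["Data"].
    """
    url = histoday_url(fsym, tsym, limit, api_key)
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("Response") != "Success":
        raise RuntimeError(payload.get("Message", "CryptoCompare request failed"))
    return [(row["time"], row["close"]) for row in payload["Data"]["Data"]]
```

Calling `fetch_daily_closes(limit=365)` would request roughly a year of daily closes; it needs network access and may need an API key.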
In short, on-chain data refers to the data stored on the blockchain. The data is publicly accessible to anyone with an internet connection (the beauty of Bitcoin!) and can generally be classified into 3 categories:
- Transaction Data
- Block Data
- Smart Contract Code
Here are the data I pulled from CryptoCompare API:
· Holder distribution (the distribution of coin value among the players in the network. An increase in addresses holding a disproportionately large number of coins is often interpreted as a bullish sign, as it suggests these large holders are becoming more optimistic)
· Difficulty (high price -> attracts more miners to the network (as they are attracted by the high reward/return) -> increase in hash rate -> increase in difficulty, because the network periodically adjusts difficulty to keep the average time between blocks at around 10 minutes)
· Hashrate (the total computing power miners contribute to the network; closely related to difficulty)
· Active Addresses (unique addresses that have actively conducted transactions on the network over a given period of time. An increase in the number of active addresses on a blockchain can indicate increased usage and adoption of the network, hence signalling the strength of that coin/network)
· Transaction Count (another metric to analyze usage and adoption of a blockchain; refers to the number of transactions processed on that network over a given period of time)
· Large Transaction Count (transactions involving a significant amount of cryptocurrency, also known as ‘whale transactions’. An increase in large transactions can impact the market, as it may involve large amounts of cryptocurrency being bought or sold, hence affecting the supply and demand of that cryptocurrency)
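Before daily features like these can go into an LSTM, the rows are commonly cut into overlapping windows: each sample is the last N days of features, and the target is the next day's value. The sketch below shows one common way to do that, not necessarily what part 2 will use; the column order and the window length of 3 are hypothetical.

```python
def make_windows(rows, target_col, window=3):
    """Turn a chronologically sorted list of daily feature rows into (X, y) pairs.

    rows: list of equal-length lists, one per day, e.g. (hypothetical order)
          [close, difficulty, hashrate, active_addresses, tx_count, large_tx_count]
    target_col: index of the column to predict for the following day
    window: how many past days each sample sees
    X has shape (samples, window, features); y has shape (samples,).
    Rows must be sorted oldest first, otherwise samples would peek into the future.
    """
    X, y = [], []
    for end in range(window, len(rows)):
        X.append(rows[end - window:end])  # the `window` days before day `end`
        y.append(rows[end][target_col])   # the value to predict on day `end`
    return X, y

# Toy rows with two made-up features per day.
rows = [[day, day * 10] for day in range(5)]
X, y = make_windows(rows, target_col=0, window=3)
print(len(X), y)  # 2 [3, 4]
```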
That's everything for this article! This is a build-up/introduction for the LSTM model I'm about to build using this data, which I will cover in a separate article, otherwise this one would be super long. Thanks for reading!