![](https://crypto4nerd.com/wp-content/uploads/2024/02/0WVmFNmjYnxUmF5Ch.png)
Decoding strategies in NLP refer to the methods used to select the next output token during text generation. Some common decoding strategies are listed below; short code sketches follow the list.
- Greedy decoding: This simply selects the most probable token at each step, i.e., w_t = argmax P(w_t | w_1, w_2, …, w_{t−1}). It is fast, but it often produces bland or repetitive, low-quality output.
- Beam Search: This keeps track of the top ‘beam width’ most probable sequences at each step and selects the overall most probable sequence at the end. It typically gives better quality than greedy decoding, at the cost of more computation.
- Random Sampling: This randomly samples from the probability distribution over tokens at each step. It is like greedy decoding, except we sample instead of taking the argmax. This can produce more diverse output but may make the text less coherent.
- Top-k Sampling: This randomly samples from the top-k most probable tokens at each step, which lets us control the diversity of the output.
- Nucleus Sampling: Also called top-p sampling. Instead of sampling from a fixed set of the top k tokens, this samples from the smallest set of top tokens whose cumulative probability is greater than or equal to a specified threshold p. Because the candidate set is determined dynamically at each step, the diversity adjusts itself: the set is small when the model is confident and large when it is uncertain. It is slightly more complex to implement than top-k sampling.
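Four of these strategies (greedy, random, top-k, and top-p) differ only in how a single token is chosen from the model's next-token distribution. Below is a minimal sketch of each, using NumPy and a toy softmax distribution in place of a real language model; the function names and the 6-token vocabulary are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def greedy(probs):
    """Greedy decoding: pick the single most probable token."""
    return int(np.argmax(probs))

def random_sample(probs, rng):
    """Random sampling: draw from the full distribution."""
    return int(rng.choice(len(probs), p=probs))

def top_k_sample(probs, k, rng):
    """Top-k sampling: renormalize over the k most probable tokens, then draw."""
    top = np.argsort(probs)[::-1][:k]
    q = probs[top] / probs[top].sum()
    return int(rng.choice(top, p=q))

def top_p_sample(probs, p, rng):
    """Nucleus (top-p) sampling: keep the smallest prefix of tokens,
    sorted by probability, whose cumulative mass is >= p, then draw."""
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1  # include the token that crosses p
    nucleus = order[:cutoff]
    q = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=q))

# Toy next-token distribution over a 6-token vocabulary.
rng = np.random.default_rng(0)
probs = softmax(np.array([2.0, 1.5, 0.3, 0.1, -0.5, -1.0]))
print("greedy:", greedy(probs))
print("random:", random_sample(probs, rng))
print("top-k (k=3):", top_k_sample(probs, 3, rng))
print("top-p (p=0.9):", top_p_sample(probs, 0.9, rng))
```

In a real decoding loop, the chosen token is appended to the context and the model is run again to get the next distribution, repeating until an end-of-sequence token or a length limit.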
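Beam search, by contrast, operates at the sequence level. The sketch below assumes a hypothetical `step_log_probs(prefix)` callback that returns next-token log-probabilities for a given prefix; a toy random "model" stands in for a real one, just to exercise the search.

```python
import numpy as np

def beam_search(step_log_probs, beam_width, max_len, bos=0, eos=1):
    """Minimal beam search: at each step, expand every unfinished hypothesis,
    then keep only the `beam_width` highest-scoring sequences."""
    beams = [([bos], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:  # finished hypotheses carry over unchanged
                candidates.append((seq, score))
                continue
            log_p = step_log_probs(seq)
            # Expanding only the top `beam_width` tokens per beam is enough,
            # since no other expansion can survive the prune below.
            for tok in np.argsort(log_p)[::-1][:beam_width]:
                candidates.append((seq + [int(tok)], score + float(log_p[tok])))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return beams[0]  # the overall most probable sequence found

# Toy "model": random log-probabilities over a 5-token vocabulary
# (ignores the prefix; a real model would condition on it deterministically).
rng = np.random.default_rng(0)
def toy_step(prefix, vocab=5):
    logits = rng.normal(size=vocab)
    return logits - np.log(np.exp(logits).sum())  # log-softmax

print(beam_search(toy_step, beam_width=3, max_len=4))
```

Note that scores are summed log-probabilities, so beam search tends to favor shorter sequences; production implementations usually add a length normalization term for this reason.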