The recent breakthroughs achieved by ChatGPT and other large language models (LLMs) have ignited a surge of interest in artificial intelligence among experts and enthusiasts alike. These models have become an integral part of our digital interactions, yet we often overlook the intricate theory that underpins their capabilities.
My own journey into the world of generative AI began with Variational Autoencoders, usually known as VAEs. In this post, and hopefully in subsequent ones, I want to unravel the foundational concepts and mathematical constructs that have given rise to these remarkable models. While I will delve into the mathematics that makes these ideas work, I must admit that my approach will not always uphold the rigor found in academic papers and theses dedicated to these topics; at times I will resort to simplifications in my explanations. I ask for your understanding in advance for any such lapses.
For those seeking a comprehensive and in-depth treatment of this subject, a wealth of resources is available. My guidance primarily stems from Kingma’s thesis and an extended paper co-authored by Kingma and Welling. Please refer to the reference section for further reading options and a deeper dive into the world of VAEs.
What is a VAE?
Shakir Mohamed, DeepMind’s Director of Research, delves into the fascinating world of generative models in his insightful blog post titled “Generative Models for Climate Informatics.” He explains that, at its essence, generative modeling aims to capture probability distributions. At their simplest, these models serve as probabilistic representations of observed data. However, these probability distributions are often intricate and challenging to compute directly, leading to a fundamental statistical challenge known as the “inference problem.”
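To make the inference problem concrete, here is the standard formulation (my own illustrative notation, not drawn from the post): given observed data $x$ and latent variables $z$, Bayes’ rule gives the posterior

$$
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}, \qquad p(x) = \int p(x \mid z)\, p(z)\, dz .
$$

The difficulty lies in the denominator: the marginal $p(x)$ requires integrating over every possible configuration of $z$, which is intractable for all but the simplest models. This intractable integral is exactly what the algorithms discussed next try to sidestep.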
Throughout the history of statistics, experts have devised various algorithms to address this challenge, such as Monte Carlo sampling techniques. Yet, as the era of big data emerged, these traditional approaches became inadequate, primarily because running a separate, expensive approximation for every data point does not scale. To overcome this hurdle, researchers turned to a different idea: approximating complex, intractable distributions with simpler, tractable ones, the notion at the heart of variational inference.
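To give a feel for the classical approach mentioned above, here is a minimal sketch of Monte Carlo estimation, the simplest member of that family. The function name and the choice of target quantity are mine, purely for illustration: we estimate an expectation under a standard normal by averaging over random samples.

```python
import random

def monte_carlo_expectation(f, n_samples=100_000, seed=0):
    """Estimate E[f(x)] for x ~ N(0, 1) by simple Monte Carlo averaging."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # draw a sample from the standard normal
        total += f(x)
    return total / n_samples

# E[x^2] under N(0, 1) is the variance, i.e. exactly 1.
estimate = monte_carlo_expectation(lambda x: x * x)
print(estimate)  # close to 1.0
```

The catch, as noted above, is cost: the estimate only becomes accurate with many samples, and in a Bayesian setting such a sampling loop would have to run per data point, which is exactly what becomes prohibitive at scale.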