![](https://crypto4nerd.com/wp-content/uploads/2024/04/0U-ML_JYIyPjbTVj4-1024x463.jpeg)
Let’s explain Transformers, the architecture behind Google Translate and ChatGPT, in everyday language in 100 seconds.
Imagine you’re having a conversation, and each word you hear helps you figure out what’s coming next. That’s a bit like how self-attention works in AI (check out my previous post for a primer). It’s a smart way of paying more attention to the important words in order to better predict the next ones. But the real magic starts with Transformer models, which turn this concept into a full-fledged word-generation engine.
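To make that concrete, here is a minimal sketch of self-attention in Python with NumPy. The word vectors are made up purely for illustration, and the learned query, key, and value projections of a real model are skipped to keep the sketch short.

```python
import numpy as np

def self_attention(x):
    """x: (seq_len, d) matrix of toy word vectors; returns attention-blended vectors."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # how relevant each word is to every other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: scores become attention weights
    return weights @ x                              # each word becomes a weighted blend of all words

# Three made-up 2-dimensional "word vectors" for a 3-word sentence.
x = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
print(self_attention(x))
```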
Here’s the catch: Computers don’t get words. They see everything as zeros and ones. So, when you feed words to an AI model like ChatGPT, each word first gets transformed into a word embedding. Think of it as turning words into a secret numerical code, where similar words have similar codes. “Sad” and “unhappy” are close in meaning, so they’re close in this numerical space too. Check out @akshay_pachaar for his awesome illustration of embeddings.
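Here is a toy illustration with hand-picked 3-dimensional vectors (real models learn hundreds of dimensions from data): cosine similarity shows that “sad” and “unhappy” sit much closer together than “sad” and “banana”.

```python
import numpy as np

# Hypothetical, hand-picked embeddings; real ones are learned, not chosen by hand.
embeddings = {
    "sad":     np.array([0.90, 0.10, 0.30]),
    "unhappy": np.array([0.85, 0.15, 0.35]),
    "banana":  np.array([0.10, 0.80, 0.60]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, values near 0 mean unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["sad"], embeddings["unhappy"]))  # close to 1 -> similar meaning
print(cosine(embeddings["sad"], embeddings["banana"]))   # noticeably smaller
```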
But here’s the kicker: the order of words matters too. “Dog bites man” is a whole different story from “Man bites dog”. So, each word also gets a positional embedding, telling the model the word’s place in line. Instead of using simple counts like 1, 2, and 3 for positions, which can get unwieldy in long texts, the Transformer uses sine and cosine functions to keep positional values in a neat -1 to 1 range.
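For the curious, here is a short NumPy sketch of the sinusoidal positional encoding described in the original paper; the dimensions are tiny just so the output fits on screen.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sine/cosine positional encoding: every value stays between -1 and 1."""
    pos = np.arange(seq_len)[:, None]        # positions 0, 1, 2, ...
    i = np.arange(d_model)[None, :]          # embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])     # even dimensions use sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])     # odd dimensions use cosine
    return pe

print(positional_encoding(seq_len=5, d_model=8).round(2))
```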
Next, the Transformer dives into the meaning and order of words using multi-head attention. Basically, it is several self-attention mechanisms running in parallel, each capturing a different kind of word relationship within the context, be it referential or emotional. See my earlier post for a primer on self-attention.
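Below is a rough sketch of multi-head attention with random, untrained weights, just to show the mechanics: each head has its own projections, and the heads’ outputs are concatenated and mixed by an output projection.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, num_heads=2):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Each head gets its own query/key/value projections (random here, learned in practice).
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        weights = softmax(q @ k.T / np.sqrt(d_head))   # scaled dot-product attention
        heads.append(weights @ v)
    Wo = rng.normal(size=(d_model, d_model))           # output projection mixes the heads
    return np.concatenate(heads, axis=-1) @ Wo

x = rng.normal(size=(4, 8))            # 4 words, 8-dimensional embeddings
print(multi_head_attention(x).shape)   # (4, 8): same shape, richer representation
```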
With the meaning, position, and relationships of words in hand, the Transformer understands the essence of the context. It then ponders over this data with a feed-forward network, a simple type of neural network. This part of the AI brain mulls over the insights gathered and picks the best next word, like choosing “Musk” to follow “Tesla’s CEO Elon…”.
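Here is a sketch of that step, assuming a tiny made-up vocabulary and random weights (a trained model would have learned all of these from data): a feed-forward network processes the context vector, and a final projection turns it into probabilities over candidate next words.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_ff = 8, 32
vocab = ["Musk", "cars", "rocket", "pizza"]          # hypothetical tiny vocabulary

W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
W_vocab = rng.normal(size=(d_model, len(vocab)))     # projects onto the vocabulary

def feed_forward(h):
    """Position-wise feed-forward network: linear -> ReLU -> linear."""
    return np.maximum(0, h @ W1 + b1) @ W2 + b2

h = rng.normal(size=(d_model,))                      # stand-in context vector for "...Elon"
logits = feed_forward(h) @ W_vocab
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                 # probability of each candidate next word
print(vocab[int(np.argmax(probs))])                  # the model's best guess
```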
Combining multi-head attention with this feed-forward network forms the core of a Transformer block. Stack up enough of these blocks, and you’ve got yourself a language model like ChatGPT that can chat, translate, or write stories. The diagram below shows a simplified pipeline of a Transformer model.
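To tie it together, here is a bird’s-eye sketch of stacking blocks (attention plus feed-forward, with residual connections); layer normalization and many other details are left out, and all weights are random, so this only shows the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(x):
    d = x.shape[-1]
    return softmax(x @ x.T / np.sqrt(d)) @ x        # simplified single-head attention

def feed_forward(x, W1, W2):
    return np.maximum(0, x @ W1) @ W2               # linear -> ReLU -> linear

def transformer_block(x, W1, W2):
    x = x + attention(x)                            # attention step plus residual connection
    x = x + feed_forward(x, W1, W2)                 # feed-forward step plus residual connection
    return x

x = rng.normal(size=(4, 8))                         # 4 words, 8-dimensional embeddings
for _ in range(6):                                  # stack 6 blocks, like a miniature model
    W1 = rng.normal(size=(8, 16)) * 0.1
    W2 = rng.normal(size=(16, 8)) * 0.1
    x = transformer_block(x, W1, W2)
print(x.shape)                                      # still (4, 8), but deeply processed
```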
This is a glimpse into Transformer technology, a cornerstone of today’s AI wonders, introduced in the groundbreaking paper *Attention Is All You Need*. Kudos to the authors for this tech marvel.
If you are interested in simplified and intuitive AI explanations, follow me as I bring this tech marvel closer to everyone. #AI4ALL #TechForGood