How Artificial Intelligence Really Learns to be Intelligent | by Timothy Lee

Image generated by OpenAI’s DALL-E

With terms like Artificial Intelligence (AI) and Machine Learning (ML) dominating news articles and pop culture alike, more people than ever are interested in (and opinionated about) these emerging technologies. However, there is a significant gap between how AI scientists or machine learning engineers perceive what it means for a model to be intelligent and how the general public perceives it. The problem is made worse by the fact that resources (articles, videos, etc.) explaining AI and ML are often either too math-heavy or so watered-down that they don’t actually teach anything. This leaves many people who want to learn about ML and AI in an undesirable place: they are either turned away by the mathematics or come away from watered-down content with a misleading or false understanding of AI. Furthermore, math-free explanations of ML and AI often lean heavily on analogies to the human brain, which can give people the impression that AI and ML models really learn and grow the way people do, fueling optimism or pessimism not necessarily rooted in reality. In this article, I attempt to find a middle ground between these two approaches and explain AI and ML in a way that is grounded in reality (high school level math) while abstracting away the ideas that are too math-heavy.

In order to answer the question of how AI can learn to be intelligent, it is important to first distinguish between intelligence and learning in the context of AI. While AI is often used these days as an umbrella term for many cutting-edge technologies, artificial intelligence and machine learning are actually different things. A simple way to think about them is that AI makes decisions, while ML enables those decisions to become more and more intelligent over time. Another common misconception is that a model or robot becomes artificially intelligent only when ML is introduced. Take, for example, a simple vacuum-cleaning robot. This robot may traverse the floor of a room randomly, vacuuming whenever dirt is present. This robot, called a simple reflex agent in computer science, does not learn to become more efficient or better over time, yet it still makes logical decisions (moving or vacuuming based on whether dirt is present). Therefore, this robot falls in the category of AI but not ML. On the other hand, a model that can perform object detection (toys, shoes, etc.) is an example of ML but not necessarily AI, because this model simply learns to detect and classify objects without doing anything else with those classifications. If we combine the object detection model with the vacuum-cleaning robot so that the robot can avoid certain objects, we have an example of AI and ML being used in conjunction.
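To make the distinction concrete, here is a minimal Python sketch of a simple reflex agent. The function name `reflex_vacuum_agent` and the action strings are hypothetical, chosen only for illustration. The point is that the agent maps its current percept (is there dirt here?) straight to an action through fixed rules and never updates those rules, so it makes decisions without learning anything.

```python
import random

def reflex_vacuum_agent(dirt_present: bool) -> str:
    """Choose an action from the current percept alone (no memory, no learning)."""
    if dirt_present:
        return "vacuum"
    # No dirt here: wander in a random direction, like the robot described above.
    return random.choice(["move_up", "move_down", "move_left", "move_right"])

# The same percept always triggers the same rule; the rules themselves never change.
print(reflex_vacuum_agent(True))   # vacuum
print(reflex_vacuum_agent(False))  # one of the four move actions
```

Adding ML to this robot would mean replacing or extending a fixed rule with something learned from data, such as an object detector deciding which obstacles to avoid.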

A machine or model learning to be intelligent is typically where people get excited or fearful in AI discussions, and demystifying this learning process will be the bulk of this article.

So how does a machine learning model actually learn? The answer is through training.

Consider the rudimentary graph above. We want to find the line (in pink) that best separates the blue circles from the red triangles. The goal is to mathematically plot this pink line over the graph so that we can simply say that anything above the line is a blue circle and anything below it is a red triangle (and be correct).
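As a rough sketch of that decision rule, assume the pink line is written as y = slope * x + intercept (the article does not give its equation, so this form and the names `classify`, `slope`, and `intercept` are illustrative assumptions). Classifying a point then just means comparing its height with the line's height at the same x:

```python
def classify(point, slope, intercept):
    """Label a 2-D point by which side of the line y = slope * x + intercept it lies on."""
    x, y = point
    line_y = slope * x + intercept  # height of the pink line at this x
    # Above the line -> blue circle; on or below it -> red triangle.
    return "blue circle" if y > line_y else "red triangle"

print(classify((1.0, 3.0), slope=1.0, intercept=0.5))  # blue circle  (3.0 > 1.5)
print(classify((2.0, 1.0), slope=1.0, intercept=0.5))  # red triangle (1.0 < 2.5)
```

Training, in this picture, is nothing more than searching for good values of `slope` and `intercept`.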

This pink line may not be great at separating the blue circles and red triangles in the beginning. The training aspect of machine learning refers to the process of looking at many examples of blue circles and red triangles in the graph and slowly improving the pink line to capture the separation between the circles and triangles. So how can we actually improve the pink line to make it more accurate? This is where we use a bit of calculus.

In the beginning, the pink line is likely initialized randomly and is not very accurate: blue circles will be clumped in with red triangles and vice versa. What we want to pay attention to is the numerical degree to which this pink line is “incorrect”. Ideally, we want to measure this for every single data point in the graph: for every circle and triangle, we look at the current pink line and determine whether the point is correctly or incorrectly classified, and by how much.
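One simple way to put a number on "correct or incorrect, and by how much" is a signed vertical gap between a point and the line. This is an assumption made for illustration (real models often use perpendicular distance or other scores), as are the label encoding (+1 for blue circles, -1 for red triangles) and the function name `signed_margin`:

```python
def signed_margin(point, label, slope, intercept):
    """Signed vertical gap between a point and the line y = slope * x + intercept.

    label is +1 for a blue circle (should lie above the line) and -1 for a red
    triangle (should lie below). The result is positive when the point is on its
    correct side and negative when it is misclassified; its size says by how much.
    """
    x, y = point
    return label * (y - (slope * x + intercept))

data = [((1.0, 3.0), +1), ((2.0, 1.0), -1), ((3.0, 3.0), +1)]
for point, label in data:
    m = signed_margin(point, label, slope=1.0, intercept=0.5)
    print(point, "correct" if m > 0 else "incorrect", abs(m))
```

Here the first two points are on their correct sides by 1.5, while the blue circle at (3.0, 3.0) sits 0.5 below the line and is misclassified.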

We can combine the degrees of incorrectness for all circles and triangles into a single function of the pink line’s position (its slope and intercept). This function is often called the loss or cost function: the worse the line fits the data, the larger the loss.
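The article does not say which loss function it has in mind, so the sketch below assumes the logistic loss, one common choice for this kind of two-class problem. Each point contributes log(1 + exp(-margin)), where margin is the signed gap to the line (+1 labels for blue circles, -1 for red triangles). The result is a single number that depends only on the line's slope and intercept; the name `logistic_loss` is illustrative.

```python
import math

def logistic_loss(slope, intercept, data):
    """Average logistic loss of the line y = slope * x + intercept over the data.

    Each point adds log(1 + exp(-margin)): close to 0 when the point is well inside
    its correct side, and growing when it lies on the wrong side of the line.
    """
    total = 0.0
    for (x, y), label in data:
        margin = label * (y - (slope * x + intercept))
        total += math.log1p(math.exp(-margin))
    return total / len(data)

data = [((1.0, 3.0), +1), ((2.0, 1.0), -1), ((3.0, 3.0), +1), ((4.0, 2.0), -1)]
print(logistic_loss(1.0, 0.5, data))   # a reasonable line: smaller loss
print(logistic_loss(-1.0, 5.0, data))  # a worse line: larger loss
```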

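From this perspective, improving the pink line is akin to minimizing the loss function: if we can minimize the loss, we improve the pink line. If you have a basic high school level understanding of calculus, you know that the minimum of a function can be found by taking its derivative and setting it to 0 (strictly speaking, a zero derivative can also mark a maximum, so the result still has to be checked). For the loss functions used in ML, solving that equation directly is often impractical, so a method called gradient descent is commonly used instead: compute the derivative (the gradient) of the loss at the current pink line, nudge the line a small step in the direction that lowers the loss, and repeat.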
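Here is a minimal, self-contained sketch of gradient descent on the pink line. It assumes the logistic loss (the article does not name a specific loss), labels of +1 for blue circles and -1 for red triangles, and made-up data, learning rate, and step count; all function names are illustrative. Instead of solving "derivative = 0" directly, it repeatedly moves the slope and intercept a little way downhill.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradients(slope, intercept, data):
    """Partial derivatives of the average logistic loss w.r.t. slope and intercept."""
    d_slope = d_intercept = 0.0
    for (x, y), label in data:
        margin = label * (y - (slope * x + intercept))
        weight = sigmoid(-margin)  # near 1 for badly misclassified points, near 0 for safe ones
        d_slope += weight * label * x
        d_intercept += weight * label
    n = len(data)
    return d_slope / n, d_intercept / n

def gradient_descent(data, learning_rate=0.5, steps=5000):
    slope, intercept = 0.0, 0.0  # start from an arbitrary line (here y = 0)
    for _ in range(steps):
        g_slope, g_intercept = gradients(slope, intercept, data)
        # Step against the gradient, i.e. in the direction that lowers the loss.
        slope -= learning_rate * g_slope
        intercept -= learning_rate * g_intercept
    return slope, intercept

# +1 = blue circle (should lie above the line), -1 = red triangle (should lie below).
data = [((1.0, 3.0), +1), ((2.0, 4.0), +1), ((3.0, 4.5), +1),
        ((1.0, 0.5), -1), ((2.0, 1.0), -1), ((3.0, 2.0), -1)]
slope, intercept = gradient_descent(data)
print(f"y = {slope:.2f} * x + {intercept:.2f}")  # y = 0.75 * x + 1.00
```

For this symmetric toy data the loss is minimized by the line halfway between each circle and the triangle below it, and the updates settle there. The learning rate is a design choice: too large and the steps overshoot, too small and training crawls.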

At the cost of omitting many mathematical and logical details, this is essentially how a machine learns: it uses calculus to minimize its errors, gradually improving the pink line over time as it sees many examples of circles and triangles.

After establishing a pretty good pink line, we can use this trained model to classify brand-new, unseen data. A new data point, marked as a brown question mark, is classified according to which side of the pink line it falls on (in this case, it would be classified as a red triangle).

As simple as this sounds, this really is the basis for how ML and AI learn to be “intelligent”. This simple process is the foundation that ultimately allows for image recognition, text generation, breast cancer identification, and much more. The examples in this article are very simple, and they are designed to be. In reality, the “pink line” is often not a line at all. Sometimes it is an entire plane (like a sheet of paper) slicing through 3-dimensional space the way our pink line slices through a 2-dimensional graph. The training data may also lie in a space with many more than 3 dimensions, where the separating boundary is called a hyperplane. I have also omitted many mathematical details necessary to actually construct machine learning models such as neural networks. For interested readers, a basic understanding of linear algebra, calculus, and statistics can go a long way on this journey.
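The jump from a line to a plane (or to a boundary in even more dimensions) changes less than it might seem. In the sketch below, the boundary is written as weights · point + bias = 0. The `weights`/`bias` notation and the function name `classify_hyperplane` are illustrative assumptions rather than anything from the article. The same few lines work for 2, 3, or 5 dimensions; only the length of the vectors changes.

```python
def classify_hyperplane(point, weights, bias):
    """Return +1 or -1 depending on which side of weights . point + bias = 0 the point lies."""
    if len(point) != len(weights):
        raise ValueError("point and weights must have the same dimension")
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return +1 if score > 0 else -1

# 2-D: the line y = 0.75x + 1 rewritten as -0.75*x + 1*y - 1 = 0 (above the line -> +1).
print(classify_hyperplane((2.0, 4.0), weights=(-0.75, 1.0), bias=-1.0))         # 1
# 3-D: the plane z = x + y (above the plane -> +1, below -> -1).
print(classify_hyperplane((1.0, 1.0, 5.0), weights=(-1.0, -1.0, 1.0), bias=0.0))  # 1
print(classify_hyperplane((1.0, 1.0, 0.5), weights=(-1.0, -1.0, 1.0), bias=0.0))  # -1
# 5-D: nothing to picture any more, but the rule is unchanged.
print(classify_hyperplane((1.0, 1.0, 1.0, 0.0, 0.0), weights=(1.0,) * 5, bias=-2.0))  # 1
```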

In this article, I attempted to explain the process of learning within AI and ML in the simplest possible terms without shying away from the mathematical concepts that are absolutely necessary. Hopefully, this article gave you some insight into how machine learning models work and the tools to see through the unreasonably optimistic or fearful opinions about AI and ML that seem to dominate public discourse (e.g. perhaps the Terminator shouldn’t be your primary source of evidence when discussing AI). As with many things, the only thing to fear is math itself.
