![](https://crypto4nerd.com/wp-content/uploads/2023/07/0KgPxN-nn5II7s4YB-1024x683.jpeg)
It is an extremely simplified model of the calculations performed by biological neurons. In the brain, a neuron passes information to another neuron only if it activates beyond a certain threshold. To do so, it must first have received a set of excitations whose sum exceeds that threshold. Each neuron thus has a capacity for synthesis (such as computing the sum of the synaptic weights in the Perceptron) to produce either a positive response or a neutral one (i.e. no excitation). But the Perceptron has only one layer of neurons, which limits it to solving very simple problems (it is unable, for example, to recognize handwriting).
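The computation described above can be sketched in a few lines: sum the weighted excitations and fire only above the threshold. The weights and threshold below are illustrative values, not taken from the article.

```python
# A minimal sketch of the Perceptron's computation.
def perceptron(inputs, weights, threshold):
    # Sum the weighted excitations, as a biological neuron would
    total = sum(x * w for x, w in zip(inputs, weights))
    # Fire (1) only if the sum exceeds the threshold, else stay neutral (0)
    return 1 if total > threshold else 0

# Example: two inputs acting like an AND gate (illustrative weights)
weights = [0.6, 0.6]
threshold = 1.0
print(perceptron([1, 1], weights, threshold))  # fires: 1
print(perceptron([1, 0], weights, threshold))  # stays neutral: 0
```

With other weights, the same neuron computes other simple functions, but a single layer can never solve problems that are not linearly separable.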
Since the brain has approximately 100 billion neurons, it is easy to see that the Perceptron is to the nervous system what the discovery of fire is to innovation: this big analog computer can now be summed up in 5 lines of code! This is why artificial neural networks appeared around 1985. They take up the architecture of neural connections while increasing the number of synapses and the number of layers of the original Perceptron. Hidden layers can now exist between the input layer and the output layer. This higher level of complexity comes closer to the functioning of the brain. Artificial neural networks are therefore a Perceptron with several layers of neurons.
These neural networks gave rise to the connectionist current, which is now very famous. "Deep learning" also refers to the connectionist current. These marketing terms highlight the increased complexity of artificial neural networks compared to the Perceptron ("deep": roughly 3 layers of neurons at minimum). For simplicity, imagine a machine with several knobs capable of lighting an LED. These knobs can be set to different intensities, and each change of intensity alters the response. If you want the machine to turn on the LED, you will have to adjust the knob settings several times until you find the perfect combination that produces the exact responses (i.e. LED illuminations). Artificial neural networks work the same way: you adapt the weight of each input to train the machine (see my magnificent diagram, image 5). This is possible thanks to "gradient backpropagation", a statistical method that computes the error gradient for each neuron, layer by layer. Today, artificial neural networks are particularly used for facial recognition.
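The knob-tuning analogy above can be sketched in code: a single sigmoid neuron whose weights (the "knobs") are nudged by the error gradient until the "LED" lights correctly. The OR function, the learning rate and the number of epochs are arbitrary choices for this sketch, not values from the article.

```python
import math

# A toy illustration of "tuning the knobs" by gradient descent:
# one sigmoid neuron learns the OR function.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = [0.0, 0.0]   # the two "knobs"
b = 0.0          # a third knob: the bias
lr = 1.0         # learning rate (arbitrary)

for epoch in range(2000):
    for x, target in data:
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        err = out - target            # how wrong the "LED" is
        grad = err * out * (1 - out)  # gradient through the sigmoid
        # Nudge each knob against its error gradient
        w[0] -= lr * grad * x[0]
        w[1] -= lr * grad * x[1]
        b -= lr * grad

# After training, the neuron reproduces OR
for x, target in data:
    print(x, round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)))
```

Backpropagation proper applies this same gradient bookkeeping across several layers, propagating the error from the output layer back toward the input layer.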
You can have fun training a neural network by drawing here: https://quickdraw.withgoogle.com.
The story continues with the creation of convolutional networks (or CNNs, for "Convolutional Neural Networks"). These artificial neural networks were created by Yann Le Cun during his years at the fabulous "Bell Laboratories", a historic laboratory where many AI successes were recorded. Inspired by the architecture of the brain's visual cortex, CNNs are a type of multi-layered artificial neural network. Their particularity is to filter images by extracting different characteristics. Like the visual system, CNN neurons have receptive fields: each captures only a part of the image and filters it to produce a "smaller" image that is easier to process. This step is called convolution. Several filters, or kernels, exist, and each is specialized in recognizing a pattern. For example, a first filter can recognize outlines, a second the brightness, and so on. Neurons perform this convolution step for each filter, producing one new image per filter. The generated images are then processed by another mathematical operation, called the "pooling" step, which keeps the pixels with the highest values (to find out more: explanatory video: 8, convolutional networks demo: 9). The convolution and pooling steps thus alternate until the end of image processing. Unlike traditional artificial neural networks, this technique eliminates the need to process a multitude of pixels jointly. Recognizing more complex images (a dog photo rather than the letter "A") therefore becomes possible!
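The convolution and pooling steps described above can be sketched with NumPy. The 3x3 vertical-edge kernel and the toy 6x6 image are illustrative choices, not taken from the article.

```python
import numpy as np

# A minimal sketch of one convolution + max-pooling pass.
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Each output pixel sees only a small "receptive field"
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    # Keep only the highest value in each size x size window
    h, w = feature_map.shape
    return np.array([[feature_map[i:i + size, j:j + size].max()
                      for j in range(0, w - size + 1, size)]
                     for i in range(0, h - size + 1, size)])

image = np.zeros((6, 6))
image[:, 3:] = 1.0                    # an image with a vertical edge
kernel = np.array([[-1, 0, 1]] * 3)   # a vertical-edge-detecting filter
fmap = convolve2d(image, kernel)      # 4x4 feature map
pooled = max_pool(fmap)               # 2x2 "smaller" image
print(pooled)
```

The feature map responds strongly where the edge is, and pooling shrinks it while keeping that response: exactly the "filter, then reduce" alternation the paragraph describes.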
CNNs appeared in the late 1980s but were forgotten 10 years later because computers of the time were not powerful enough to apply the method. However, CNNs became famous again from 2012, after the overwhelming victory of a team using this method over another learning system (the "Support Vector Machine", or "SVM") in the ImageNet competition. In 2016, CNNs showed their effectiveness once more when the AlphaGo machine won at the game of Go against a professional player (Lee Sedol was beaten 4–1 by AlphaGo, a machine belonging to DeepMind). Victims of their success, CNNs are now used for image recognition as well as speech and natural language processing. They have enabled many applications such as machine translation, self-driving cars and medical image analysis systems.
Many learning systems other than neural networks and CNNs emerged during the second half of the twentieth century. This is the case of the "symbolic AI" current: a movement which reached its zenith from 1970 to 1980, during a period of disinterest in neural networks. Unlike the connectionist current, which starts from perception to build a more complex learning system, the symbolic current tries to mechanize thought processes. This "top-down" approach is known through "expert systems": systems that translate thought processes into rules. For example, to mechanize medical diagnosis, we begin with a rule-creation phase with doctors to define decision "protocols" (perform a certain blood test depending on the patient, or decide on a particular diagnosis depending on the results). Like the human brain, all of these business rules are processed and interpreted by a central system called the inference engine. Despite the success of these methods in the 1970s and 1980s, expert systems experienced a decline: reducing everything to a set of rules and tests remains complex and unreliable. These methods are little used today, and the general population keeps annoyed memories of the Windows paperclip.
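The "rules plus inference engine" idea can be sketched as a few condition/conclusion pairs and a forward-chaining loop. The medical rules below are invented for illustration, not real diagnostic protocols.

```python
# A toy expert system: business rules plus a tiny inference engine.
# Each rule is (set of required facts, fact to conclude).
rules = [
    ({"fever", "cough"}, "order_blood_test"),
    ({"order_blood_test", "high_white_cells"}, "suspect_infection"),
    ({"suspect_infection"}, "prescribe_antibiotics"),
]

def inference_engine(facts):
    # Forward chaining: keep firing rules until no new fact appears
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(inference_engine({"fever", "cough", "high_white_cells"}))
```

The engine chains the three rules to derive a prescription from the initial symptoms; the brittleness the paragraph mentions shows up as soon as a real case does not match any hand-written rule.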
With all these advances in AI, why is it still impossible to have a learning system as efficient as the human brain? Today, all of AI's dedicated efforts can only reproduce a process performed in less than 1 second by our brain. Indeed, artificial networks mimic the basic visual recognition of our central nervous system, from perception by the eyes to signal propagation in the visual cortex.
The answer to this question is the following: the machine needs data, and it is unable to learn in an unsupervised way by observing the world. Indeed, the data must be labeled for the machine to learn, whereas a baby learns by observing the world without needing the names of all objects. Humans, in fact, have 2 learning systems: a bottom-up system and a top-down system. The first is the one commonly used by machines: we learn from the data that surrounds us (sensory data, "bottom") and process these data in our central nervous system ("up"). Unlike the bottom-up system, the top-down system is an inference system. It is a Bayesian probabilistic system that allows us to make hypotheses about the world. These hypotheses are established according to each individual's experience. Thanks to these assumptions, our brain becomes predictive: a machine capable of anticipating each situation. If a situation is surprising and was not foreseen by our assumptions, we rectify our system of assumptions by integrating the error. For example, a baby will hypothesize that birds fly after seeing several of them flying. However, if he sees an ostrich, his learning system will warn him and modify the hypothesis created for birds (i.e., include in his hypothesis that some birds do not fly).
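The bird example above can be made concrete with a tiny Bayesian update: the belief "birds fly" as the posterior mean of a Beta-Bernoulli model, revised downward when the ostrich appears. The prior and the observation counts are illustrative numbers, not from the article.

```python
# A toy sketch of the "top-down" Bayesian idea: a belief about
# the world, updated with each observation (Beta-Bernoulli counts).
def updated_belief(flying_seen, not_flying_seen, prior_a=1, prior_b=1):
    # Posterior mean of a Beta(prior_a, prior_b) belief
    # after the observed evidence
    a = prior_a + flying_seen
    b = prior_b + not_flying_seen
    return a / (a + b)

# After seeing 10 flying birds, the hypothesis "birds fly" is strong
print(round(updated_belief(10, 0), 2))  # 0.92
# Seeing an ostrich (a surprise) forces the hypothesis to be revised
print(round(updated_belief(10, 1), 2))  # 0.85
```

The surprising observation does not destroy the hypothesis; it shifts the probability, exactly the error-integrating revision the paragraph describes.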
In addition, interesting discussions are emerging regarding the contribution of the innate to learning. During a public debate between Gary Marcus, a psychologist at New York University, and Yann Le Cun, the scientific head of Facebook's AI research laboratory, the following question was raised: how much innate structure should we put in artificial intelligence systems for intelligence to emerge? To answer this question, we can look at babies' brains. Is a baby's brain disorganized? Is a baby born with a blank neural system that must learn from scratch, like machines? It turns out not. Babies' brains already have well-established networks, in the form of brain areas specific to different cognitive functions (auditory area, visual area, tactile area, etc.). For example, if a baby listens to his mother tongue, the auditory information is transcribed in the language network, the same one as in adults. The brain therefore comes with a pre-wired architecture and remains flexible throughout life.
So, from the Bayesian inference system to the part played by the innate in intelligence, we are still a long way from intelligent machines capable of learning to learn.
Thank you for reading! If you liked my post, you can like it ;).