![](https://crypto4nerd.com/wp-content/uploads/2023/02/1PsaC5scGABCPgN6R8RspxQ.png)
These are some of my favourite research papers, and they inspired me to give it my all and learn a lot of new mathematics and physics. The natural sciences have always interested me, especially physics. (All sources are hyperlinked.)
“If the human brain were so simple that we could understand it, we would be so simple that we couldn’t.” — Emerson M. Pugh
What is Bayesian Neuroscience and How Does It Relate to Machine Learning?
The research on Bayesian inference in the brain is truly fascinating, as it provides a peek into how our brains process information and make decisions. By understanding the mechanics of Bayesian inference, we can glimpse how our brains learn and adapt to changing environments. This research can also provide valuable insights into how to improve decision-making, learning, and problem-solving, both in artificial intelligence and in our general understanding of human cognition.
First, we have to go back to organisms and the evolutionary objectives they face. What is the point of having a brain? And why is information important?
“Information as originally defined by Shannon is a reduction of uncertainty. Selection means the elimination of a number of possible variants or options, and therefore a reduction in uncertainty. Natural selection therefore by definition creates information: the selected system now “knows” which of its variants are fit, and which unfit” (Gershenson & Heylighen, 2003).
In other words, there is an evolutionary reward for creating and managing information. In one of my favourite research papers, H. Shimazaki demonstrates that even something as simple as how our retina reacts to different light intensities requires a non-linear response. Somehow, the primary visual cortex “knows” this and responds in a way that demands more complex inference than a linear response can provide. That the retina and primary visual cortex know this can be understood as an adaptation. Note also that we cannot consciously decide to adjust our retinae to this or that threshold for different light intensities. Instead, we experience it as unconscious inference (more on that later).
Machine learning frameworks can be applied to investigate biological brains in a variety of ways. For example, machine learning algorithms can analyse large datasets of brain activity. By analysing these scans, we can gain insights into how different regions of the brain interact and how they are connected to each other. Additionally, machine learning models can be used to identify patterns in brain activity.
And here is where Bayesian inference comes in. Bayesian inference in relation to brain models can be illustrated with an example of two events, A and B.
Posterior: the probability of event A, given B. Prior: what you know beforehand about event A. Likelihood: the probability of event B given event A, based on what you have experienced before.
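For reference, these three pieces are tied together by Bayes’ theorem in its standard form:

```latex
% Bayes' theorem: posterior = likelihood x prior / evidence
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```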
To develop this Bayesian inference framework further, we have to divide its mechanisms into smaller parameters, using Shimazaki’s notation.
x = neural population
w = brain structure
y = external stimuli
Put into simpler terms: x is how the neurons behave, indicating the activity of the neural population. w is the structure of the biological brain itself, which holds information. y is the external world, independent of our perceptions; what we can interpret is a sample of external stimuli. The letter p indicates the probability of a parameter being true.
And it must be noted here that the brain can only interpret a single sample of stimuli from the “true” external world. The two are not the same!
What the brain dynamics tell us is that the model distribution p(y|w) will converge towards the distribution of the external world. In other words, the inner workings of the brain will more and more resemble the external world. The model also treats x as a vector, where the i-th element describes the activity of the i-th neuron in the population.
In order to compare the distribution of what you are experiencing now (stimulus Y and likelihood) with what your brain has experienced before (brain structure and prior W), we apply the Kullback-Leibler divergence as a means of measuring the statistical distance between these two distributions (here noted as Y and W instead of the traditional P and Q).
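For reference, the standard definition of the Kullback-Leibler divergence between two discrete distributions, written generically as q and p, is:

```latex
% KL divergence: non-negative, and zero only when the two distributions coincide
D_{\mathrm{KL}}(q \,\|\, p) = \sum_{x} q(x)\, \log \frac{q(x)}{p(x)}
```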
Now, the brain parameter w will seek to optimise its marginal log-likelihood. The logarithm is used because it makes computing small numbers easier and turns the product of independent sample likelihoods into a sum; the likelihood is marginal because the unobserved neural activity x is summed out.
Using an arg max function, just as in machine learning, w is optimised by choosing the value with the highest likelihood from an array of candidates; the optimum is noted w*.
The model for learning: w* = arg max_w log p(Y_1:n | w)
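As a minimal runnable sketch of this learning rule; the Bernoulli observation model, the grid of candidate values for w, and the samples below are my own illustrative assumptions, not Shimazaki’s actual model:

```python
import numpy as np

# Illustrative i.i.d. samples y_1..y_n (binary stimuli); w is read here as the
# Bernoulli probability of observing y = 1 (an assumption for this sketch).
samples = np.array([1, 0, 1, 1, 0, 1, 1, 1])
candidate_w = np.linspace(0.05, 0.95, 19)

def log_likelihood(w, y):
    """log p(y_1:n | w): the log turns the product over independent
    samples into a sum, which is numerically easier to handle."""
    return np.sum(y * np.log(w) + (1 - y) * np.log(1 - w))

scores = [log_likelihood(w, samples) for w in candidate_w]
w_star = candidate_w[np.argmax(scores)]  # w* = arg max_w log p(y_1:n | w)
print(w_star)  # ~0.75, the empirical frequency of y = 1
```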
By this, Shimazaki demonstrates a generative model in the form of a joint probability function which describes the connection between neural activity and stimuli.
p(y|x,w) is noted as the observation model.
p(x|w) is referred to as spontaneous activity: no stimulus (y) is present.
p(y|w) is the sensory stimulus model: the probability of the stimulus under the brain’s model, with the neural activity x summed out.
p(y,x|w) = p(y|x,w) p(x|w)
Reading the equation out loud: the probability of the stimulus and the neural activity together, given the brain structure with its previous information, equals the probability of the stimulus given the neural activity and the brain structure, multiplied by the probability of the neural activity given the brain structure. Quite the mouthful!
Shimazaki develops this further as the concept of the generative model: it generates models of the outside world (Y).
Hence,
Generative model = observation model × spontaneous activity. It compares how the brain behaves during stimuli to how it behaves without stimuli.
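A toy numerical sketch of this factorisation, with invented discrete distributions over a binary neural state x and a binary stimulus y (the numbers are illustrative only):

```python
import numpy as np

# Spontaneous activity p(x|w): prior over the binary neural state x
p_x = np.array([0.4, 0.6])

# Observation model p(y|x,w): rows indexed by x, columns by the stimulus y
p_y_given_x = np.array([[0.8, 0.2],    # x = 0
                        [0.3, 0.7]])   # x = 1

# Generative model p(y,x|w) = p(y|x,w) * p(x|w)
p_yx = p_y_given_x * p_x[:, None]

# Sensory stimulus model p(y|w): marginalise the joint over x
p_y = p_yx.sum(axis=0)
print(p_y)  # [0.5 0.5]
```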
The neural activity initiated by a stimulus is treated as a sample from the posterior distribution. This means that we obtain a probability density of the neural activity x given the observation Y.
We can demonstrate this by putting in some easy numbers.
y = 0.35. We have a large degree of uncertainty about the stimulus from the outside world.
x = 0.60. We have some degree of certainty about how the neurons behave.
w = 0.80. We have a high degree of certainty about our prior knowledge.
Note: we had greater uncertainty about the stimulus from the outside world (y), but ended up with greater certainty in the posterior distribution.
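To make this concrete in runnable form, one way to read these numbers is as the probability mass on one state of a binary variable; that reading (and the values for the second state) is my own illustrative assumption:

```python
import numpy as np

# Probability mass on the two states of a binary variable (illustrative).
likelihood = np.array([0.35, 0.65])  # y: weakly informative outside evidence
prior      = np.array([0.20, 0.80])  # w: confident prior pointing the same way

# Posterior is proportional to likelihood x prior, then normalised
unnorm = likelihood * prior
posterior = unnorm / unnorm.sum()
print(posterior)  # ~[0.12, 0.88]: sharper than either input alone
```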
It must be added that probability is expressed more precisely as a distribution than as a single number. A distribution conveys more information through its shape: a mean (location parameter) telling us where the estimate sits, and a standard deviation (scale parameter) telling us how dispersed it is (Kruschke, John).
In addition, Shimazaki points out that a perfect inference would be very unlikely, so the posterior distribution for the stimulus-evoked neural activity is expressed as an approximation: q(x|Y) ≈ p(x|Y,w).
This approximation is called the recognition model: recognition happens when the stimulus (Y) is combined with the structure of the brain, which holds prior knowledge (w).
The researcher also presents another fascinating concept: the thermodynamics of the brain. How does the process of learning behave in itself? How do changes of state inside the brain manifest themselves through neural spike dynamics?
The dynamics of neural activity are framed in thermodynamic terms: the law of conservation applies through the state of spontaneous activity, and the second law, which states that entropy increases, manifests itself as the process of learning. These dynamics are implemented by modulating the gain of the interplay between feedforward signals from more primitive parts of the brain and feedback from higher cortical areas. Because this back-and-forth communication involves a time delay, it can be measured and detected.
“Similarly to the gain control in engineering systems, neural systems can realize the gain control by either feedforward or feedback connections”
“…We show that the delayed gain control of the stimulus response via recurrent feedback connections is modelled as a dynamic process of the Bayesian inference that combines the observation and top-down prior with time-delay” (H. Shimazaki).
Retaining and generating knowledge from new stimuli must involve an energy cost.
The recognition model q(x|Y) requires energy: it must be activated, which means the brain must change its initial state of spontaneous activity p(x|w), where prior information is stored. The same holds true for the observation model p(Y|x,w): this process also implies a change of state and therefore requires energy. On the other hand, the first law of thermodynamics formulates the conservation of energy, where the total energy in a closed system can neither be created nor destroyed. Accordingly, factors like the refractory period (which limits the number of action potentials a given nerve cell can produce per unit time), the limited repertoire of neural configurations, and the metabolic cost of firing itself put constraints on the neural activity.
Shimazaki then writes down the entropy of the stimulus-evoked neural activity.
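Assuming the standard Shannon entropy of the recognition model q(x|Y), which is the usual definition for a discrete distribution, this reads:

```latex
% Shannon entropy of the stimulus-evoked (recognition) distribution q(x|Y)
H[q] = -\sum_{x} q(x \mid Y)\, \log q(x \mid Y)
```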
The constraints are expressed as weighted biases: the neuron firing rate α and the gain control β between feedback and feedforward. The recognition model q(x|Y) is then constrained by the negative log of the prior and the negative log of the observation model. The author solves this using Lagrange multipliers in order to find a minimum of the free energy.
The result is a probability density under which the free energy is as close to zero as possible; under the constraints, the approximated recognition model q(x|Y) takes a weighted-Bayes form.
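Given the decision-theory formula quoted further below, the constrained solution plausibly takes the same weighted form; this is a reconstruction from that formula, not a verbatim copy of the paper’s equation:

```latex
% Recognition model under the firing-rate (alpha) and gain-control (beta) constraints
\log q(x \mid Y) = \beta \log p(Y \mid x, w) + \alpha \log p(x \mid w) + \mathrm{const.}
```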
While the second law of thermodynamics states that entropy in closed systems always increases, here it can be read as follows: neural activity, under the constraints on activity trajectories and firing rates, assembles itself into a state that creates and stores information. It has been demonstrated that the development of information (hence priors) is evident in the neural activity of animals: the spontaneous activity optimises itself by becoming more similar to stimulus-evoked activity as the animal grows.
Other, similar papers demonstrate how cue combination (Rescorla) can create new likelihoods and priors: when different stimuli occur simultaneously, such as smell, vision, bodily sensations, and coinciding events, they build up into new probability distributions.
This can also be applied in decision theory, for both biological and machine intelligence.
In this way, we can apply the same Bayesian framework to decision-making: we replace the neural activity probabilities with a hypothesis H as the prior and data D as the evidence. Biases enter as the weights α and β, changing the decision outcome.
log P(H|D) = β log P(D|H) + α log P(H) + const.
Are we forgetful? Then the weighted bias α of the prior is weaker.
Do we cling to old beliefs? Then the weighted bias α of the prior is stronger.
Are we being too flexible? Then the weighted bias β of the likelihood is stronger.
Are we being too rigid? Then the weighted bias β of the likelihood is weaker.
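A small runnable sketch of how the α and β knobs change the outcome; the prior and likelihood values are invented for illustration:

```python
import numpy as np

def weighted_posterior(prior, likelihood, alpha, beta):
    """log P(H|D) = beta * log P(D|H) + alpha * log P(H) + const,
    exponentiated and normalised over the hypotheses."""
    log_post = beta * np.log(likelihood) + alpha * np.log(prior)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    return post / post.sum()

prior = np.array([0.8, 0.2])        # confident prior belief in H1
likelihood = np.array([0.3, 0.7])   # new data favouring H2

print(weighted_posterior(prior, likelihood, 1.0, 1.0))  # standard Bayes: ~[0.63, 0.37]
print(weighted_posterior(prior, likelihood, 0.2, 1.0))  # "forgetful": ~[0.36, 0.64]
print(weighted_posterior(prior, likelihood, 1.0, 3.0))  # "too flexible": ~[0.24, 0.76]
```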
In the process of calibrating computer algorithms, the same types of questions could be used.
Despite its seemingly complex nature, Bayesian inference can also be used as a mental tool for everyday, relatable events, in order to make more informed decisions.
Summary of Shimazaki’s probabilities.
p(x|w) = introspection, thinking: spontaneous brain activity separated from stimuli.
p(y,x|w) = the probability that you are thinking something (x) combined with what you are experiencing (y), given what you have previously learned (w, the brain structure).
p(y|w) = the sensory stimulus model: what your sensory brain is doing when experiencing a stimulus, with neurons activated and occupied by sensory input (not structural).
p(y|x,w) = the observation model, where the active neural population (x) is combined with the brain structure and its established probability distributions (w).