Traditional neural networks (NNs) have historically been very good at single, isolated tasks. Even though NNs are loosely based on how the human brain works, they fall short of the human brain in many areas. One such area is multitask learning. Though deep learning has recently had breakthroughs in this field, we are just scratching the surface of what NNs are capable of in replicating more human-like behavior.
With our BeyondML project and associated research, we are working on bridging that gap between artificial intelligence (AI) and natural intelligence (NI) by enabling a single NN to perform multiple tasks, isolating specific subnetworks of artificial neurons to perform each task. To see how we achieve this, please refer to our blog post about it or our paper on arXiv.
While multitask learning in this way is itself beneficial, knowing which task the model should perform on input data is sometimes nontrivial. What happens when the model should not only perform prediction, but also tell you which task it thinks is the right one for this particular input data?
In this post, we will go over some of the research we have done to answer that exact question. First, though, we will answer an equally important question: why would this kind of automatic task identification even be useful?
So, why would automatic task identification be useful for a multitask NN? Let’s first think about this question in terms of NI. Right now, your brain is processing a ton of sensory information. You’re reading this article, which means your brain is processing visual stimuli from your eyes, identifying the words on the screen, recognizing those words, and processing them to understand their meaning. That itself is a lot of processing, but your brain is doing so much more than that! If you’re listening to music right now or have the television on in the background, your brain is also processing auditory stimuli. Your brain is also receiving stimuli from everything that is touching your body right now, from the clothes you are wearing to even the smallest changes in the air currents around you.
Now, if I had not brought up all of those other things your brain is processing right now and someone had asked what you were doing, you probably would have responded with “reading an article.” That is task identification. If someone had knocked on your door while you were reading this, however, your brain would have identified a more important (at least in the sense of time constraints) task for you to be doing: seeing who is at the door. In other words, your brain can focus and change tasks almost instantly. Traditional NNs, on the other hand, have a very hard time doing this.
Now imagine an NN that has been trained on two distinct image processing tasks. For one task, it identifies which of one set of classes an object in the image belongs to. For the other task, it identifies which of a completely different set of classes the object belongs to. Using BeyondML, we can create this network with no problem at all. But what if we don’t know in advance which task applies to a given input? In other words, what if we need the AI to not only identify the class itself, but also to identify which of the two sets of classes it thinks the image falls under? This kind of situation is exactly why we are doing this research.
For the full code that was run for this work, please visit the jacobrenn/TaskIdentification repository on GitHub (experiments for automatic task identification using BeyondML).
To address this problem, we have identified a methodology that allows a multitask NN to assess its own confidence about its predictions and determine, based on that confidence, whether it believes the input data belongs to one task or another. This is based on our hypothesis that an NN trained on multiple tasks across multiple data distributions will be more confident about its predictions for the distribution that the input data truly belongs to.
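To make the hypothesis concrete, here is a minimal sketch of the simplest possible version of the idea: compare the top softmax probability of two task heads and pick the more confident one. This is only an illustration of the intuition, not the method used in our experiments (which learns the decision with a separate model); the function names and the logit values are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def guess_task_by_confidence(task0_logits, task1_logits):
    """Return 0 or 1 per sample: the head whose top class probability is larger."""
    conf0 = softmax(task0_logits).max(axis=-1)
    conf1 = softmax(task1_logits).max(axis=-1)
    return (conf1 > conf0).astype(int)

# Hypothetical logits for two inputs and three classes per task.
# Input 0: head 0 is peaked, head 1 is nearly flat.
# Input 1: head 0 is nearly flat, head 1 is peaked.
task0_logits = np.array([[4.0, 0.0, 0.0], [0.1, 0.0, 0.2]])
task1_logits = np.array([[0.2, 0.1, 0.0], [0.0, 5.0, 0.0]])

task_guess = guess_task_by_confidence(task0_logits, task1_logits)
print(task_guess)  # [0 1]
```

A fixed rule like this has no way to correct for one head being systematically overconfident, which is one reason a learned decision can be preferable.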
We have tested our methodology in two separate use cases, each with a slightly different architecture for performing the kind of processing outlined above. For our first use case, we trained a single model on both the MNIST handwritten digit dataset and the Fashion MNIST clothing dataset. The class distribution output for each task is then processed by a feedforward “discerner” model, leaving us with a predicted likelihood for each of the two tasks. To further encourage the model to choose a single task more confidently, we then multiply each task’s class likelihoods by the predicted likelihood for that task and add the two weighted class distributions together to form a single class output. This leads to the model having three separate outputs: one identifying the predicted class, one identifying the confidence for task 0 (MNIST digit), and one identifying the confidence for task 1 (MNIST fashion). The last two outputs always sum to 1 for any single input image. The architecture of the model can be found in the following image, and the code used to load the training data and to create the model can be found below the image.
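As a rough illustration of the output stage described above (not the model code from the repository), the following NumPy sketch weights two per-task class distributions by discerner outputs and sums them. The discerner here is an untrained two-layer network with random weights; its width, the choice to concatenate the two class distributions as its input, and all variable names are assumptions made for the example.

```python
import numpy as np

NUM_CLASSES = 10   # MNIST and Fashion MNIST each have 10 classes
HIDDEN_UNITS = 16  # hypothetical discerner width

def softmax(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def discerner(probs0, probs1, w_hidden, w_out):
    """Feedforward net: both class distributions in, two task likelihoods out."""
    features = np.concatenate([probs0, probs1], axis=-1)
    hidden = np.maximum(features @ w_hidden, 0.0)  # ReLU
    return softmax(hidden @ w_out)                 # each row sums to 1

def combine(probs0, probs1, task_probs):
    """Weight each task's class distribution by its task likelihood and add."""
    return task_probs[:, :1] * probs0 + task_probs[:, 1:] * probs1

rng = np.random.default_rng(0)
batch = 4
# Stand-ins for the two task heads' class distributions.
probs0 = softmax(rng.normal(size=(batch, NUM_CLASSES)))
probs1 = softmax(rng.normal(size=(batch, NUM_CLASSES)))
# Random (untrained) discerner weights, for shape checking only.
w_hidden = rng.normal(size=(2 * NUM_CLASSES, HIDDEN_UNITS))
w_out = rng.normal(size=(HIDDEN_UNITS, 2))

task_probs = discerner(probs0, probs1, w_hidden, w_out)  # task 0 / task 1 confidence
combined = combine(probs0, probs1, task_probs)           # single class output
class_pred = combined.argmax(axis=-1)
task_pred = (task_probs[:, 1] >= 0.5).astype(int)
```

Because the two task likelihoods sum to 1 and each class distribution sums to 1, the combined output is itself a valid distribution over the 10 class indices. The weighted sum only works this directly because both tasks have the same number of classes.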
To test whether this method can be applied to more complex use cases, we used it to classify images in both the CIFAR-10 and CIFAR-100 datasets. For this experiment, we built a similar architecture for the model, but this time we used the ResNet-50 architecture with ImageNet weights to generate an embedding of the image. The architecture of this model can be found in the following image, and the code used to load the training data and to create the model can be found below the image.
Clearly, the images in these two datasets come from remarkably similar distributions, especially when compared with the widely different distributions of the MNIST and Fashion MNIST datasets. If our model is able to differentiate between these two tasks effectively, it would therefore confirm the applicability of the technique in complex use cases.
For the MNIST and Fashion MNIST model, a batch size of 128 was used during training. For the CIFAR-10 and CIFAR-100 model, a batch size of 256 was used. Early stopping was applied to all model training using a validation split of 20% of the training data. All model results were calculated using test data that was not used during training or validation.
In this section, we present the results of each model’s performance on test data. Because this experiment involves predicting not only the correct class but also the correct task, there are multiple facets to each model’s performance. We present the performance metrics of each model in ways that answer the following questions:
- How accurately did the model identify the correct task?
- How accurately did the model perform, regardless of task?
- How accurately did the model perform on each task, regardless of whether that task was predicted correctly?
- How accurately did the model perform on each task when the task was predicted correctly?
- How accurately did the model perform when the task was incorrectly predicted?
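The questions above map onto a handful of masked accuracy computations. The sketch below shows one way to compute them with NumPy; the function name, the dictionary keys, and the toy labels are hypothetical and are not taken from the repository.

```python
import numpy as np

def task_identification_report(true_task, pred_task, true_class, pred_class):
    """Compute task accuracy and class accuracy under several conditions."""
    true_task, pred_task = np.asarray(true_task), np.asarray(pred_task)
    class_ok = np.asarray(true_class) == np.asarray(pred_class)
    task_ok = true_task == pred_task

    def class_accuracy(mask):
        # Class accuracy over the selected samples (NaN if none are selected).
        return float(class_ok[mask].mean()) if mask.any() else float("nan")

    report = {
        "task_accuracy": float(task_ok.mean()),
        "class_accuracy_overall": float(class_ok.mean()),
        "class_accuracy_task_correct": class_accuracy(task_ok),
        "class_accuracy_task_incorrect": class_accuracy(~task_ok),
    }
    for t in np.unique(true_task):
        is_t = true_task == t
        report[f"class_accuracy_task{t}"] = class_accuracy(is_t)
        report[f"class_accuracy_task{t}_task_correct"] = class_accuracy(is_t & task_ok)
    return report

# Toy labels for six samples (three per task), purely for illustration.
report = task_identification_report(
    true_task=[0, 0, 0, 1, 1, 1],
    pred_task=[0, 0, 1, 1, 1, 0],
    true_class=[3, 5, 7, 2, 4, 6],
    pred_class=[3, 1, 7, 2, 4, 0],
)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Keeping correctness as boolean arrays means each question differs only in the mask that is applied, which makes it hard for the metrics to drift out of sync with one another.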
Naturally, this analysis is much more complex than a traditional analysis, as the multiple outputs of these models add complexity to any prediction the model provides.
Results for MNIST Digit and MNIST Fashion Model
In this section, we present the results for the MNIST Digit and MNIST Fashion model. To identify which classes were predicted for the multiclass cases, we selected the class with the highest predicted likelihood. For the task identification piece, we used a cutoff value of 0.5.
In terms of task identification, this model performed extremely well, achieving 99.94% accuracy on test data. Only 11 images of digits were incorrectly predicted as fashion items, and only one fashion item was incorrectly predicted as a digit. The confusion matrix and ROC curve for this task can be found below.
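As a quick check on the arithmetic, the error counts above can be turned back into a confusion matrix and an accuracy figure. This sketch assumes the standard test splits of 10,000 images each for MNIST and Fashion MNIST; the variable names are illustrative.

```python
# Assumption: standard test splits of 10,000 images per dataset.
n_digits, n_fashion = 10_000, 10_000
digits_as_fashion = 11   # digits misidentified as fashion items
fashion_as_digits = 1    # fashion items misidentified as digits

# Rows are the true task, columns the predicted task (0 = digit, 1 = fashion).
confusion = [
    [n_digits - digits_as_fashion, digits_as_fashion],
    [fashion_as_digits, n_fashion - fashion_as_digits],
]

correct = confusion[0][0] + confusion[1][1]
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"{accuracy:.2%}")  # 99.94%
```

Twelve errors out of 20,000 test images gives the 99.94% figure reported above.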
For the other metrics we collected, we have consolidated the accuracy metrics in the following table. For detailed outputs, we invite the reader to visit the outputs on GitHub here.
Results for CIFAR10 and CIFAR100 Model
In this section, we present the same performance metrics as previously presented for the MNIST Digit and MNIST Fashion model, but this time for the CIFAR-10 and CIFAR-100 model. Once again, for class prediction, we selected the class with the highest predicted likelihood, and for task identification, we used a cutoff value of 0.5.
For task identification, this model also performed well, achieving 83% accuracy on test data, with slightly better performance on identifying the CIFAR-10 task. The confusion matrix and the ROC curve for this task can be found below.
Once again, we also aggregated all additional performance metrics into a single table, shown below. For a more detailed view of the performance metrics, we invite the reader to visit the outputs on GitHub here.
Automatic task detection is a crucial aspect of any automated system, as it allows the system to identify and execute tasks based on specific triggers or input. This can greatly increase the efficiency and accuracy of the system, as it eliminates the need for manual intervention and ensures that tasks are completed correctly.
In this blog post, we created two multitask models using BeyondML with TensorFlow and tested their ability not just to perform multitask learning, but also to identify which task should be considered. While the performance on individual tasks was not state-of-the-art, that was not the goal of this research. Instead, we focused our work on testing a new method for predicting both the data label and the distribution, or task, that the data was taken from.
In this regard, we believe our methods proved extremely useful, as our models were able to achieve over 99% accuracy at discriminating between MNIST Digit and MNIST Fashion images and over 83% accuracy at discriminating between the much more similar CIFAR-10 and CIFAR-100 images, all while simultaneously performing class prediction within those tasks.
Overall, we are pleased with the results of our experiments, and believe that this work is an important step towards creating more advanced AI systems. We hope that our work will inspire further research and development in this area and that our methods will be applied in real-world automation scenarios. In the future, we plan to continue exploring and improving upon our multitask models and aim to achieve even higher levels of performance in automatic task detection.