What is learning in ‘deep learning’?

When an artificial neural network is trained, the general process is called deep learning. The adjective 'deep' refers to neural networks consisting of several layers with a large number of weights that are calibrated with input and output examples. But what does 'learning' mean here? The question I want to discuss is whether the learning in a deep learning network is learning in the genuine sense: does a deep learning algorithm learn in the way human beings do? And if a network learns, it must learn something; so what is it that a deep learning network learns?

Suppose you have an artificial neural network with a suitable architecture for image detection and you want to train it to detect cats in pictures. You have a training set of labelled pictures. You can train the network by backpropagating the error through the network and changing the weights between the nodes to minimize the output error. During the training process the parameters of each node are adjusted in much the same way as when fitting a (linear) regression model: the output of an error function is minimized by changing the weights, and this is done for each picture in the training set, for each node in the network, over a number of iterations. With a sufficiently large training set, the network will then be able to detect cats in pictures.
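The training loop described above can be sketched in miniature. The example below is a deliberate simplification: a single sigmoid neuron fitted by gradient descent on made-up data, standing in for the per-node weight updates that backpropagation performs throughout a real image network (the data, the "cat" labels and the learning rate are all invented for illustration).

```python
import numpy as np

# Toy stand-in for the training process: one sigmoid neuron fitted by
# gradient descent, the same principle backpropagation applies layer by
# layer in a deep network.
rng = np.random.default_rng(0)

X = rng.normal(size=(100, 4))             # 100 "pictures", 4 features each
true_w = np.array([1.5, -2.0, 0.5, 1.0])  # hidden rule generating the labels
y = (X @ true_w > 0).astype(float)        # hypothetical "cat" labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(4)
lr = 0.5
for _ in range(500):                      # iterate over the training set
    p = sigmoid(X @ w)                    # forward pass: current predictions
    grad = X.T @ (p - y) / len(y)         # gradient of the cross-entropy error
    w -= lr * grad                        # adjust weights to reduce the error

accuracy = np.mean((sigmoid(X @ w) > 0.5) == y)
```

After training, `w` points roughly in the direction of `true_w` and the neuron classifies the toy data well, yet nothing resembling a concept of "cat" exists anywhere in it; only four calibrated numbers.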

The question is: would it be correct to say that the structure of the network, with all its trained parameters, has somehow learned the basic properties of cats, such that it can detect them in pictures? And to take this one step further, would it be correct to say that the network with its trained weights and other parameters somehow knows what cats are?

It depends, of course, on what we understand by learning. If we define the result of learning as the ability to recognize cats in specific pictures, then it qualifies as learning. But if the result of learning is the ability to explain or to know what a cat is, then I don't think what the network does comes anywhere near this definition. A trained neural network calculates a complex function. There is no knowledge in or above the network about what cats are, at least not in a form that we can easily understand. Nowhere during the learning phase does a magical jump from data to knowledge occur; nowhere does the pile of sand become a heap. And yet the network is able to recognize cats.
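To make the point that a trained network is "just" a complex function concrete: once the weights are frozen, the network is nothing but a deterministic composition of matrix multiplications and simple nonlinearities. A minimal sketch, with random weights standing in for trained ones:

```python
import numpy as np

# A frozen network is a pure function: affine maps alternating with
# nonlinearities. The weights below are made up, not trained.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

def network(x):
    h = np.maximum(0.0, W1 @ x + b1)      # hidden layer with ReLU activation
    return float(W2 @ h + b2)             # scalar "cat score"

x = rng.normal(size=4)
assert network(x) == network(x)           # same input, same output: arithmetic only
```

There is no place in `W1`, `b1`, `W2`, `b2` where a description of a cat could be read off; the "knowledge" is smeared out over the numbers.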

What kind of learning is this? Perhaps a better word for the way in which artificial neural networks learn is that they are being conditioned in a certain manner. The basic mode of learning of artificial neural networks, I would say, is conditioning, not learning in a genuine sense (in which the learner would be able to explain what a cat is). In conditioning, an association is formed between action (or output) and reward (or desired output); this is similar to the way animals learn.

This mode of learning results in an ability without knowledge. And no general way has yet been found to reconstruct the knowledge that is tacitly present in the network. If an artificial network learns, then it learns in such a way that what is learned remains concealed from us (until we are somehow able to extract from the network the knowledge it has learned, but that would be our learning, not the network's).

By calling deep learning a form of learning we project a certain idea of learning onto the training process of neural networks. I think the reason for this is the apparent analogy between human brains (and natural neural networks in general) and artificial ones. The reasoning behind this analogy is: because a human brain is able to learn, and because an artificial neural network mimics the processes of the human brain, it is only logical to conclude that artificial neural networks are also able to learn. But there is a difference between human learning and the biological process that takes place during this learning. I myself am able to learn in the sense that I can explain what I have learned (I hope); my brain is conditioned during this learning.

From this perspective a neural network is similar to any non-human brain. During a learning phase it is conditioned for an ability. It does not acquire any knowledge during this process about what it has learned.

What is a black box?

In discussions about the use of machine learning and deep learning algorithms the issue is often raised that many of these algorithms are black boxes and that we do not know how they work. What do we mean by a black box in relation to machine learning algorithms? What is a black box in itself?

Literally, a black box is something that does not emit any light, so that we are unable to see what is going on inside. An initial definition of a black box might be: a process or algorithm with observable input and output where the causal mechanism between input and output is unknown. This lack of knowledge implies that we are unable to see and explain what happens within the process or how the algorithm works.

But that is not a formal and satisfying definition, for we can ask: to whom should this causal mechanism be known, and what constitutes this knowledge? Is it the not-knowing of an arbitrary individual, or the not-knowing of a group of experts in the matter at hand after a sufficient amount of time and resources? And is knowledge only that which is expressed in logic, mathematics and natural language, or do we also count something like intuition as knowledge?

Furthermore, in practice there are different levels of knowledge of a process or algorithm. You might be able to explain only a small part of the process or algorithm in isolation, but not the full process from input to output with all the interactions between its parts. You might be able to explain how one particular input resulted in an output, or how a set of related inputs led to their outputs. And you might be able to explain how changes in the input, the conditions of the process, or the parameters of the algorithm change the output (the causal mechanism behind changes in input and conditions). All of this constitutes some form of knowledge about an algorithm or process.

A definition of a black box based on this lack of knowledge experienced by someone is not a good idea. It depends on who is experiencing the black box. And more importantly, by defining a black box as the absence of something else we have not said what a black box in itself is. So in this way the definition of a black box remains hidden.

Another way to look at it is to see a complex deep learning algorithm as analogous to a very complex natural process, like a changing atmosphere, the motion of fluids, or a neurological process in a human brain. In these cases we observe input and output, but the internal mechanism depends on many factors and is extremely complex. It is not that we do not understand how these processes behave in general; for isolated parts we sometimes know precisely what happens. But because of the size and complexity of these processes, and the huge amount of input data that can determine the outcome, the causal relations between input and output are simply too complex to comprehend and to express in a succinct form. (I call it an analogy because we know that a deep learning network is deterministic, but for natural processes we do not know that.) We often have some knowledge and understanding of how certain simple processes behave in their isolated form, but when it comes to precise prediction many processes are too complex. We see the same in deep learning algorithms: large amounts of input data and several layers with incomprehensible weights.

In light of this analogy it is perhaps better to see a black box algorithm as something that is open for investigation, just like any natural process. What are the rules of a deep learning algorithm? How can we extract knowledge from a trained neural network? Of course the structure of these algorithms makes it particularly hard to extract knowledge, but it is not impossible. At the very least, we should not dismiss the problem altogether by calling a deep learning algorithm a black box and giving up on investigation.

And some progress has been made in this area; some deep learning algorithms do shed some light on their internal processes. We can try to generalize an algorithm and look at feature importance, we can use techniques such as LIME, and we can work backwards from the output to the input by backpropagation to learn which features the algorithm relies on. But this is just the beginning.
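As a rough sketch of the working-backwards idea mentioned above (gradient-based saliency, not LIME itself): one can propagate the output score backwards to the input and obtain, per input feature, a number indicating how strongly that feature moves the output. The tiny network and its random weights below are hypothetical stand-ins for a trained model.

```python
import numpy as np

# Gradient saliency sketch: backpropagate from the output score to the
# input. Random weights stand in for a trained network.
rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

def score_and_saliency(x):
    z = W1 @ x + b1
    h = np.maximum(0.0, z)                # forward pass (ReLU hidden layer)
    s = float(W2 @ h + b2)                # scalar output score
    dh = W2.flatten() * (z > 0)           # backward pass through the ReLU
    dx = W1.T @ dh                        # gradient of the score w.r.t. input
    return s, dx

score, saliency = score_and_saliency(rng.normal(size=4))
# |saliency[i]| indicates how sensitive the score is to input feature i
```

This yields a local explanation only: which features mattered for this one input, not what the network "knows" in general.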

We currently lack a proper terminology to describe the processes in deep neural networks. Terms like interpretability and explainability that have been introduced in the area of deep learning are simply not well defined and too vague to describe what is going on. We need a proper science of neural networks that is able to rationally reconstruct the knowledge that is hidden in the weights and other parameters of a deep learning network.

So let's change the definition of the term black box. Instead of an absence of knowledge, basically a form of nothingness, we should see a black box in a more positive sense, like nature before we understood (some of) her laws: open to be discovered. In the meantime, what do we do when we lack knowledge of a deep learning process? For me the answer lies in the analogy presented above: we should view the outcome as the outcome of a natural process. What that means is something for another blog.