In discussions about the use of machine learning and deep learning algorithms, the issue is often raised that many of these algorithms are black boxes and that we do not know how they work. But what do we mean by a black box in relation to machine learning algorithms? What is a black box in itself?
Literally, a black box is something that does not emit any light, so that we are unable to see what is going on inside. An initial definition of a black box might be: a process or algorithm with observable input and output, where the causal mechanism between input and output is unknown. This lack of knowledge implies that we are unable to see and explain what happens within the process, or how the algorithm works.
But that is not a formal and satisfying definition, for we can ask: by whom should this causal mechanism be known, and what constitutes this knowledge? Is it the not-knowing of an arbitrary individual, or the not-knowing of a group of experts in the matter at hand, after a sufficient amount of time and resources? And is knowledge only what can be expressed in logic, mathematics and natural language, or do we also count something like intuition as knowledge?
Furthermore, in practice there are different levels of knowledge of a process or algorithm. You might be able to explain only a small part of the process or algorithm in isolation, but not the full process from input to output with all the interactions between its parts. You might be able to explain how one particular input resulted in an output, or how a set of related inputs led to their outputs. And you might be able to explain how changes in the input, in the conditions of the process, or in the parameters of the algorithm change the output (the causal mechanism behind changes in input and conditions). All of this constitutes some form of knowledge about an algorithm or process.
A definition of a black box based on this lack of knowledge experienced by someone is not a good idea. It depends on who is experiencing the black box. And more importantly, by defining a black box as the absence of something else, we have not said what a black box in itself is. Defined this way, the black box itself remains hidden.
Another way to look at it is to see a complex deep learning algorithm as a very complex natural process, like a changing atmosphere, the motion of fluids, or a neurological process in a human brain. In these cases we observe input and output, but the internal mechanism depends on many factors and is extremely complex. It is not that we do not understand how these processes behave in general; for isolated parts we sometimes know precisely what happens. But because of the size and complexity of these processes, and the huge amount of input data that could determine the outcome, the causal relations between input and output are simply too complex to comprehend and to express in a succinct form. (I call it an analogy because we do know that a deep learning network is deterministic, whereas for natural processes we do not know that.) We often have some knowledge and understanding of how certain simple processes behave in isolation, but when it comes to any precise prediction, many processes are too complex. We see the same in deep learning algorithms: large amounts of input data and several layers with incomprehensible weights.
In light of this analogy it is perhaps better to see a black box algorithm as something that is open for investigation, just like any natural process is. What are the rules of a deep learning algorithm? How can we extract knowledge from a trained neural network? Of course the structure of these algorithms makes it particularly hard to extract knowledge, but it is not impossible. At the very least, we should not dismiss the problem altogether by calling a deep learning algorithm a black box and stopping the investigation.
And some progress has been made in this area; some deep learning algorithms already emit some light on their internal processes. We can try to generalize an algorithm and look at feature importance, we can use techniques such as LIME, and we can work backwards from the output to the input by back-propagation to learn the feature selection inside the algorithm. But this is just the beginning.
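To make the LIME idea concrete, here is a minimal sketch of its core mechanism: perturb an input, query the opaque model, and fit a locally weighted linear model whose coefficients act as local feature-importance scores. The "black box" below is a hypothetical stand-in (a tiny fixed network with random weights), not any particular trained model, and the function names are my own.

```python
import numpy as np

# Hypothetical stand-in for an opaque model: a tiny fixed two-layer
# network with random weights. We only observe its inputs and outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 1))

def black_box(X):
    """Opaque model: maps feature vectors (n, 4) to scores (n, 1)."""
    return np.tanh(X @ W1) @ W2

def local_surrogate(x, model, n_samples=500, scale=0.1):
    """LIME-style sketch: sample perturbations around x, query the
    model, and fit a proximity-weighted linear model. Its per-feature
    coefficients serve as local importance scores."""
    X = x + rng.normal(scale=scale, size=(n_samples, x.size))
    y = model(X).ravel()
    # Weight samples by proximity to x (Gaussian kernel).
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * scale ** 2))
    # Weighted least squares with an intercept column.
    A = np.hstack([X, np.ones((n_samples, 1))])
    Aw = A * w[:, None]
    coef, *_ = np.linalg.lstsq(Aw.T @ A, Aw.T @ y, rcond=None)
    return coef[:-1]  # drop the intercept; keep per-feature weights

x = np.zeros(4)
importance = local_surrogate(x, black_box)
print(importance)
```

The surrogate does not open the box; it only summarizes the model's behavior in a small neighborhood of one input, which is exactly the partial, local kind of knowledge described above.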
We currently lack a proper terminology to describe the processes in deep neural networks. Terms like interpretability and explainability that have been introduced in the area of deep learning are simply not well defined, and too vague to describe what is going on. We need a proper science of neural networks that is able to rationally reconstruct the knowledge that is hidden inside the weights and other parameters of the deep learning network.
So let’s change the definition of the term black box. Instead of an absence of knowledge, basically a form of nothingness, we should see a black box in a more positive sense, like nature before we understood (some of) her laws: open to be discovered. In the meantime, what do we do when we lack knowledge of a deep learning process? For me the answer lies in the analogy presented above: we should view the outcome of such an algorithm as the outcome of a natural process. What that means is something for another blog.