In a magnificent essay entitled “Clues: Morelli, Freud and Sherlock Holmes”, featured in The Sign of Three: Dupin, Holmes, Peirce (eds. Eco, U. and Sebeok, T., 1988), Carlo Ginzburg explores a series of ideas that touch deeply not only on the nature of semiotics and the divide between the natural sciences and the social and human sciences, but also on a number of other threads that interest me a great deal. In a series of posts, I will explore and discuss some of those threads, but anyone interested in engaging with a fascinating piece of writing should read the essay itself and work carefully through its rich ideas and complex structure.
The focus of Ginzburg’s essay is the art historian Morelli, who, under a pseudonym, published a work intended to help attribute paintings more accurately. Morelli’s method was simple, but a stroke of genius. Instead of focusing on the attributes that everyone associated with a master (e.g. the smile of the Mona Lisa and of other figures in Leonardo’s paintings), Morelli suggested that authorship might be attributed more precisely by looking at details in the paintings that nobody associated with the masters.
Ginzburg reproduces a series of Morelli’s collections of ears, fingernails and noses, which Morelli studied carefully and used to re-attribute a large number of works of art, often spectacularly. Morelli’s method, focusing on the involuntary details and clues left by the author of a work, signalled a shift in thinking toward symptoms. Ginzburg attributes this shift in part to Morelli’s background as a doctor, used to collecting symptoms and using them to diagnose an underlying phenomenon that cannot be observed directly.
The same method, of course, is also used in detective stories. As Ginzburg shows, Sherlock Holmes himself, in one story, uses the uniqueness of an ear to establish that the unknown victim of a crime, whose ear is presented to him, is in fact a blood relative of one of the people he encounters in the case. But back to the method. Morelli draws conclusions about complex artefacts from details and patterns that usually receive little attention. There is an interesting methodological similarity between this and what we can now do in data science. Let’s think about it.
For any problem, we can state that all the data pertaining to it can be divided into two sets. One set is the data we usually ask for when we try to analyze or solve the problem. This set, let’s call it the canon set, exists within a vast space of details and data that we usually do not associate with the problem; let’s call that surrounding space Morellian space. The canon set is arrived at through experience, theory and exploration of the problem at hand. We decide that the data in the canon set is important because it has proven to give us at least a certain rate of success in understanding, and perhaps solving, the problem. Morelli’s suggestion, then, is that for many problems something even more efficient can be found in Morellian space, outside the canon set. The problem in Morelli’s day was that exploring Morellian space required great cleverness and a conscious effort to think outside the canon set: you had to search deliberately for data in Morellian space that could outperform the canon set of data used to solve the problem. With new pattern recognition technologies, we may be able to search Morellian space far more extensively and identify data, or clusters of data, that allow us to solve problems more effectively.
Such a search would probably look something like this: first we identify the data accessible to us in a particular case; then we try to understand what the canon set looks like, in order to establish a reference case; finally, we try different combinations of data in Morellian space to see what could possibly outperform the canon set of data used for that particular problem.
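The three-step search described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Morelli’s method or any established algorithm: the features (smile, ear, nail), the master/workshop labels and the data are all invented, and the scoring function (leave-one-out accuracy of a majority-vote lookup table) is a deliberately simple stand-in for whatever model one would actually use.

```python
from collections import Counter
from itertools import combinations


def loo_accuracy(rows, label_key, features):
    """Leave-one-out accuracy of a majority-vote lookup table keyed on `features`.

    Stand-in scoring function (an assumption): any model/evaluation could be used.
    """
    correct = 0
    for i, row in enumerate(rows):
        others = [r for j, r in enumerate(rows) if j != i]
        key = tuple(row[f] for f in features)
        matches = [r[label_key] for r in others if tuple(r[f] for f in features) == key]
        # Fall back to the overall majority label when no other row shares this key.
        pool = matches or [r[label_key] for r in others]
        prediction = Counter(pool).most_common(1)[0][0]
        correct += prediction == row[label_key]
    return correct / len(rows)


def search_morellian_space(rows, label_key, canon, max_size=2):
    """Score the canon set, then try combinations of non-canon features against it."""
    # Step 1: the accessible data is every feature present in the rows.
    all_features = sorted(k for k in rows[0] if k != label_key)
    # Step 2: the canon set gives the reference score to beat.
    baseline = loo_accuracy(rows, label_key, canon)
    # Step 3: search combinations drawn only from Morellian space (non-canon features).
    morellian = [f for f in all_features if f not in canon]
    better = []
    for size in range(1, max_size + 1):
        for combo in combinations(morellian, size):
            score = loo_accuracy(rows, label_key, combo)
            if score > baseline:
                better.append((combo, score))
    # Best score first; among equal scores, prefer fewer features.
    better.sort(key=lambda item: (-item[1], len(item[0]), item[0]))
    return baseline, better


# Hypothetical toy data: feature names and labels are invented for illustration.
# Imitators copy the famous smile, but the involuntary ear shape gives the author away.
paintings = [
    {"smile": s, "ear": e, "nail": n, "author": "master" if e == "round" else "workshop"}
    for s in ("yes", "no")
    for e in ("round", "pointed")
    for n in ("short", "long")
]

baseline, better = search_morellian_space(paintings, "author", canon=["smile"])
print(baseline)  # 0.0
print(better)    # [(('ear',), 1.0), (('ear', 'nail'), 1.0)]
```

In this made-up data the canonical feature predicts nothing, while the neglected ear shape predicts the author perfectly. Exhaustive enumeration grows combinatorially with the number of features, so a realistic search through Morellian space would need something more economical than trying every combination.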
Seeing any particular problem as typically solved by a canonical set of data, yet surrounded by a Morellian space of neglected details and clues, is also a useful way to approach the structure of problems in general, and what happens as problems become more complex. That will be the subject of the next post.