Well, at any rate, he said, justice has some resemblance to holiness; for anything in the world has some sort of resemblance to any other thing. Thus there is a point in which white resembles black, and hard soft, and so with all the other things which are regarded as most opposed to each other; and the things which we spoke of before as having different faculties and not being of the same kind as each other—the parts of the face—these in some sense resemble one another and are of like sort. In this way therefore you could prove, if you chose, that even these things are all like one another. But it is not fair to describe things as like which have some point alike, however small, or as unlike that have some point unlike. [Plato, Protagoras, 331d-e]
I would like to add this text from the modern Plato, Nelson Goodman:
Similarity, I submit, is insidious. And if the association here with invidious comparison is itself invidious, so much the better. Similarity, ever ready to solve philosophical problems and overcome obstacles, is a pretender, an impostor, a quack. It has, indeed, its place and its uses, but is more often found where it does not belong, professing powers it does not possess. [Goodman, 1972: 437]
We need to be very careful with likeness, similarity, resemblance and other (similar?) ideas when doing anything conceptually, because it is so very easy to find similarities. If you aren’t careful, you will make inferences about the natural world based on your own dispositions; this is anthropomorphism – making the world in your own image – and it is what science must overcome to be science. It relies upon the ontological fallacy I have previously discussed.
What, if anything, is similarity? More importantly, what is a similarity relation? Given how much of natural classification depends upon it (or at least how much is claimed for it), we must ask these questions early on.
In taxonomy, and I gather also in semantics, the similarity of one thing to another is roughly the Euclidean distance between them when they are mapped onto a semantic space. By this I mean (or think they mean) that one takes all the variables in play and sets up a dimension for each of them. Then one takes the particular value each thing has on each variable (as represented, say, by measurement), and the ordered set of these values becomes the coordinates of that thing in the space constructed from the dimensions.
Obviously, the number of dimensions can be very high, so let us suppose there are only three variables. The “location” of A in that space is the ordered triplet <x, y, z>, where each variable has a value. This is a Cartesian coordinate. Now if you have another object B and you want to know how similar (or, inversely, how dissimilar) it is to A, you just measure the diagonal, that is, the Euclidean, distance between the two points: the square root of the sum of the squared differences on each dimension. When the characters are discrete, a related measure, the Hamming Distance, counts the number of characters on which the two objects differ. Sometimes many dimensions are collapsed into principal components and the distance is the summary distance between these, but that’s a matter of tractability and convenience only.
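To make the geometry concrete, here is a minimal sketch in Python; the objects, feature names and values are invented purely for illustration, and the 1/(1 + d) conversion to a similarity score is just one conventional choice.

```python
import math

# Hypothetical objects located in a three-dimensional feature space.
# Feature names and values are invented purely for illustration.
a = {"length": 4.2, "mass": 1.1, "hue": 0.30}
b = {"length": 3.9, "mass": 2.5, "hue": 0.35}
features = ["length", "mass", "hue"]

# Straight-line (Euclidean) distance: the square root of the sum of
# squared differences on each dimension.
distance = math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

# One conventional way to turn a distance into a similarity score.
similarity = 1.0 / (1.0 + distance)

print(f"distance = {distance:.3f}, similarity = {similarity:.3f}")
```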
Another way is to treat the axes as discrete, and the similarity/dissimilarity is the minimum number of steps it takes to get from one object to the other. These steps form a graph, and each coordinate is a node in that graph, for which reason I call it the node-edge definition; it is a close relative of the Hamming Distance version. It is also called the “nearest neighbour” metric or the “taxicab” or “city block” distance, none of which affects us here. This has the advantage of being easier to compute and represent (because you can draw the network graph in two dimensions), and of being more realistic about how people often estimate similarity relations, given that we tend to gather things into discrete classes. However, it is at best a psychologistic convention, and tells us nothing much about the natural similarity of things. This might indicate a problem with similarity itself, as Goodman pointed out.
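A toy sketch of the discrete version, again with invented character states; the Hamming count and the city-block step count are computed side by side.

```python
# Two objects scored on five discrete characters (invented states,
# coded as small integers); each character is an axis in the space.
a = [0, 2, 1, 3, 0]
b = [0, 1, 1, 0, 0]

# Hamming distance: how many characters the two objects differ on.
hamming = sum(1 for x, y in zip(a, b) if x != y)

# City-block ("taxicab") distance: how many unit steps along the axes
# it takes to get from one coordinate to the other.
city_block = sum(abs(x - y) for x, y in zip(a, b))

print(hamming, city_block)  # 2 and 4 for these values
```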
Ironically, since we are trying to understand natural similarity and not psychologistic similarity, one of the clearest expositions of the similarity relation comes from Amos Tversky and his collaborators, and was developed in the field of psychology. In trying to work out how people judge multivariate forms (faces, for example) or semantic notions like “fork” and “spoon” to be similar, Tversky worked out the following, which I describe very abstractly.
Take a set of properties for each object, a list of salient features. The similarity relation is then a function of the intersection of the two feature sets (what the objects share) and of what each has that the other lacks, according to this formalism (Tversky and Gati 1978):
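As it is standardly presented (I reconstruct it here; the exact symbols may differ from Tversky and Gati’s own notation), the contrast model reads:

S(a, b) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A)

where A and B are the feature sets of the objects a and b, f is a measure of the salience of a set of features, and θ, α and β are non-negative weights.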
In short, one weights each set (the Greek letters represent the weightings): the features of A that B lacks, the features of B that A lacks, and the features shared by both. Subtracting the first two weighted terms from the weighted shared term gives you the similarity between the two objects that A and B describe. The dissimilarity is the inverse of this: the two unshared sets minus the shared set. [I think I got that right. Tell me if I messed it up.] This is also called the feature contrast model of similarity, and as such it ties nicely into the contrastive account of explanation (Lipton 1990, 1991) that I favour.
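A minimal sketch of the feature contrast model in Python, assuming for simplicity that the salience measure is just the number of features in a set and that the weights are supplied by hand; the feature lists are invented:

```python
def tversky_similarity(a_features, b_features, theta=1.0, alpha=0.5, beta=0.5):
    """Feature contrast model: weighted shared features minus weighted
    features distinctive to each object. Here the salience measure f is
    simply the size of a feature set, which is one simple choice."""
    a, b = set(a_features), set(b_features)
    shared = len(a & b)   # f(A ∩ B)
    a_only = len(a - b)   # f(A − B)
    b_only = len(b - a)   # f(B − A)
    return theta * shared - alpha * a_only - beta * b_only

# Invented feature lists for illustration.
fork = {"metal", "handle", "tines", "eating utensil"}
spoon = {"metal", "handle", "bowl", "eating utensil"}
print(tversky_similarity(fork, spoon))  # 1.0*3 - 0.5*1 - 0.5*1 = 2.0
```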
Several interesting things follow from this, which is, I believe, the best general definition of similarity on the market. One is that the degree of similarity is a function of the choice of salient features or properties. As we know, there are infinitely many properties in common, or potentially in common, between any two objects. What we choose as salient will depend a lot upon us and our dispositions. Why, for example, is the square of the number of electrons in each object not used as a similarity metric? Because we do not have easy access to that information, and it is unclear how interesting that sort of similarity would be anyway. Still, that number, and its exponents, are facts about the objects. That we do not choose them as salient tells us more about ourselves than about the objects, and a Laplacean Demon might find that number, or class of numbers, very important indeed, in ways we cannot envisage.
The second thing is that Tverskyan similarity tells us nothing that we did not put into the measure in the first place. It helps us understand what humans are doing (or computers, if they employ this metric and method, say when searching text for semantic similarities), but it isn’t extra information.
Now I have been very careful not to mention phenetics hitherto, but now is the time to do so. This school of thought, which was very popular under the name “numerical taxonomy”, arose with the increasing availability of computers in the 1960s and 1970s. It aimed to deliver “theory-free” taxonomies by the mechanical application of (Hamming-like) distance algorithms to plain and atheoretical data. It seemed that objectivity was finally within our grasp. However, the methods, while mathematically rigorous and useful in many contexts, did not deliver the desired atheoretical taxa (which pheneticists called “operational taxonomic units”, or OTUs, to avoid prejudging ranks like species). Or rather, they delivered far too many; change the principal components and you got different taxa.
Moreover, clustering on a distance metric of this kind requires that you arbitrarily choose a threshold value to delimit the clusters. So bacteriologists, for example, tended to choose a 70% similarity (clustering) value, while other biologists selected a 90%, 95% or even 99% value. This arbitrariness again resolved down to our predilections.
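A toy sketch of how the choice of cut-off drives the result: with invented pairwise similarities and a simple single-linkage grouping, a 70% threshold yields one cluster where a 95% threshold yields three.

```python
def cluster_by_threshold(objects, similarity, threshold):
    """Single-linkage grouping: two objects end up in the same cluster
    whenever a chain of pairwise similarities above the threshold links them."""
    parent = {o: o for o in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            if similarity[frozenset((a, b))] >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for o in objects:
        clusters.setdefault(find(o), []).append(o)
    return list(clusters.values())

# Invented pairwise similarities between four hypothetical strains.
objects = ["s1", "s2", "s3", "s4"]
similarity = {
    frozenset(("s1", "s2")): 0.97,
    frozenset(("s1", "s3")): 0.82,
    frozenset(("s2", "s3")): 0.80,
    frozenset(("s1", "s4")): 0.72,
    frozenset(("s2", "s4")): 0.71,
    frozenset(("s3", "s4")): 0.75,
}

print(cluster_by_threshold(objects, similarity, 0.70))  # one cluster of four
print(cluster_by_threshold(objects, similarity, 0.95))  # {s1, s2} plus two singletons
```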
Nevertheless, while Hamming-style similarity was not a good way to identify taxa, it was and is a great methodology for identifying and analysing clusters in many kinds of data, such as sequence similarity in molecular genetics, and the algorithms are part of every taxonomist’s and bioinformatician’s toolkit today, quite rightly. What we need to understand is not what the metric is, but why it is useful and what it implies. “Phenetics” has become something of a dirty word these days in some circles, and that is a pity. It’s like saying that because we cannot find exact definitions in language, we have to impose them (which is, I fear, what some philosophers do indeed say when confronted with vagueness).
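For instance, a crude sketch of percent identity for two already-aligned sequences (the sequences are invented, and real pipelines align first and handle gaps and substitution models far more carefully):

```python
# Two short, already-aligned DNA sequences (invented for illustration).
seq1 = "ACGTACGTAC"
seq2 = "ACGTTCGAAC"

matches = sum(1 for x, y in zip(seq1, seq2) if x == y)
percent_identity = 100.0 * matches / len(seq1)

# Percent identity is just the complement of the normalised Hamming distance.
print(f"{percent_identity:.1f}% identical")  # 80.0% for these sequences
```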
Phenetic classification, like its analogues in other sciences, is classification by analogy; rather sophisticated analogy, to be sure, but analogical reasoning nonetheless. We select which analogies to employ, and so we have loaded our inferences from the beginning. When such inferences are called for, that is not problematic. But when we think we have discovered something about the natural world we did not already know, and all we have done is analyse our own dispositions, that is when the errors start to creep in.
Similarity is deductive. It does not license inductive projectibility. It is not the foundation for inferences about history unless we can find causal inheritance – that is to say, identity relations or conservation relations. I will next discuss how one philosopher, Elliott Sober, has made what I think is an error of just this kind.
Lipton, Peter. 1990. Contrastive Explanation. Royal Institute of Philosophy Supplements 27 (1): 247–266.
Lipton, Peter. 1991. Contrastive Explanation and Causal Triangulation. Philosophy of Science 58 (4): 687–697.
Tversky, Amos, and Itamar Gati. 1978. Studies of similarity. In Cognition and Categorization, edited by E. Rosch and B. B. Lloyd, 79–98. Hillsdale, NJ: Lawrence Erlbaum Associates.