Brandon in a comment to the last post in this series mentioned this text of Plato:

Well, at any rate, he said, justice has some resemblance to holiness; for anything in the world has some sort of resemblance to any other thing. Thus there is a point in which white resembles black, and hard soft, and so with all the other things which are regarded as most opposed to each other; and the things which we spoke of before as having different faculties and not being of the same kind as each other—the parts of the face—these in some sense resemble one another and are of like sort. In this way therefore you could prove, if you chose, that even these things are all like one another. But it is not fair to describe things as like which have some point alike, however small, or as unlike that have some point unlike. [Plato, Protagoras, 331d-e]

I would like to add this text from the modern Plato, Nelson Goodman:

Similarity, I submit, is insidious. And if the association here with invidious comparison is itself invidious, so much the better. Similarity, ever ready to solve philosophical problems and overcome obstacles, is a pretender, an impostor, a quack. It has, indeed, its place and its uses, but is more often found where it does not belong, professing powers it does not possess. [Goodman, 1972: 437]

We need to be very careful with likeness, similarity, resemblance and other (similar?) ideas when doing anything conceptually, because it is so very easy to find similarities. If you aren’t careful, you will make inferences based on your own dispositions about the natural world. This is anthropomorphism, making the world in your own image, and it is what science must overcome to be science. It relies upon the ontological fallacy I have previously discussed.

What, if anything, is similarity? More importantly, what is a similarity relation? Given how much of natural classification depends upon it, at least how much is claimed for it, we must ask these questions early on.

In taxonomy, and I gather also in semantics, the similarity of one thing to another is roughly the Euclidean distance between them when they are mapped onto a semantic space. By this I mean (or think they mean) that one takes all the variables in play and sets up a dimension for each. Then one takes the particular value of each thing on each variable, as represented (say, as measured), and makes the resulting ordered list of values the coordinate of that thing in the space constructed from the dimensions.

Obviously, the number of dimensions can be very high, so let us suppose there are only three variables. The “location” of A in that space is the ordered triple <x, y, z>, where each variable has a value. This is a Cartesian coordinate. Now if you have another object B and you want to know how similar (or inversely, how dissimilar) it is to A, you just measure the straight-line distance between the two points: the Euclidean distance, which is the square root of the summed squared differences along each axis. When the variables are discrete character states, the corresponding measure is the Hamming Distance, the number of positions at which the two differ, and I will use that name loosely for this whole family of geometric measures. Sometimes many dimensions are collapsed into Principal Components and the distance is the summary distance between these, but that’s a matter of tractability and convenience only.
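The three-variable case can be sketched in a few lines. This is only an illustration: the function name and the measurements for A and B are made up for the example, and the variables are assumed to be numeric and on comparable scales.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical measurements of two objects on three variables <x, y, z>
A = (1.0, 2.0, 2.0)
B = (4.0, 6.0, 2.0)

print(euclidean_distance(A, B))  # 5.0
print(euclidean_distance(A, A))  # 0.0 (an object is at zero distance from itself)
```

The smaller the distance, the more similar the two objects count as being, but only relative to the variables that were chosen as axes.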

Another way is to treat the axes as being discrete, and the similarity/dissimilarity is the minimum number of steps it takes to get from one to the other. These steps form graphs, and each coordinate is a node in that graph, for which reason I call it the node-edge definition; but it is a special case of the Hamming Distance version. It is also called the “nearest neighbour” metric or the “taxicab distance” or the “city block” distance, none of which affects us here. This has the advantage of being easier to compute and represent (because you can draw the network graph in two dimensions), and more realistic about how people often estimate similarity relations given that we tend to gather things into discrete classes. However, it is at best a psychologistic convention, and tells us nothing much about the natural similarity of things. This might indicate a problem with similarity itself, as Goodman pointed out.
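For discrete axes, two common ways of counting the steps can be sketched as follows. The codings for A and B are invented, and the assumption that character states are small integers is mine, made only so the two counts can be compared.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length codings differ."""
    if len(a) != len(b):
        raise ValueError("codings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def city_block_distance(a, b):
    """Sum of the absolute differences along each axis (taxicab distance)."""
    if len(a) != len(b):
        raise ValueError("codings must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical objects coded on three discrete characters
A = (0, 1, 3)
B = (2, 1, 0)

print(hamming_distance(A, B))     # 2: the first and third characters differ
print(city_block_distance(A, B))  # 5: two steps on the first axis, three on the third
```

When every character is binary (0 or 1), the two counts give the same number, since each differing position is exactly one step.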

Ironically, since we are trying to understand natural similarity and not psychologistic similarity, one of the clearest expositions of the similarity relation comes from Amos Tversky and his collaborators, developed in the field of psychology. In trying to work out how people identified multivariate forms, for example, faces, as similar, or semantic notions like “fork” and “spoon”, Tversky worked out the following, which I very abstractly describe.

Take a set of properties, a list of salient features. The similarity relation is the intersection of some subset of these properties that two objects have. This is a function of the mapping of parts of the sets onto each other, according to this formalism (Tversky and Gati 1978):
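In the notation usually used for Tversky's contrast model, the similarity of two objects a and b with feature sets A and B is written as below. The symbols are the conventional ones and are an assumption here, chosen to agree with the verbal description that follows: f is some measure of the salience of a set of features, and the Greek letters are non-negative weights.

```latex
S(a, b) = \theta \, f(A \cap B) \;-\; \alpha \, f(A - B) \;-\; \beta \, f(B - A),
\qquad \theta, \alpha, \beta \ge 0
```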


In short, one weights each of three sets (the Greek letters represent the weights): the properties A has that B lacks, the properties B has that A lacks, and the properties they share. Subtract the two weighted unshared sets from the weighted shared set, and this gives you the similarity between the two objects that A and B describe. The dissimilarity is the inverse of this: the two unshared sets minus the shared set. [I think I got that right. Tell me if I messed it up.] This is also called the feature contrast model of similarity, and as such ties nicely into the contrastive account of explanation (Lipton 1990, 1991) that I favour.
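A minimal sketch of the feature contrast calculation, assuming the simplest possible salience measure (a plain count of features) and arbitrary default weights. The feature lists for fork, spoon and knife are invented for the example, and none of the names below come from Tversky's papers.

```python
def contrast_similarity(A, B, theta=1.0, alpha=0.5, beta=0.5, f=len):
    """Weighted shared features minus the weighted features unique to each object."""
    A, B = set(A), set(B)
    shared = A & B   # features both objects have
    only_a = A - B   # features A has that B lacks
    only_b = B - A   # features B has that A lacks
    return theta * f(shared) - alpha * f(only_a) - beta * f(only_b)

# Hypothetical salient features
fork = {"metal", "handle", "cutlery", "tines"}
spoon = {"metal", "handle", "cutlery", "bowl"}
knife = {"metal", "handle", "cutlery", "blade", "sharp"}

print(contrast_similarity(fork, spoon))  # 2.0
print(contrast_similarity(fork, knife))  # 1.5

# With unequal weights the measure is no longer symmetric:
# "fork is like knife" and "knife is like fork" get different scores.
print(contrast_similarity(fork, knife, alpha=0.8, beta=0.2))
print(contrast_similarity(knife, fork, alpha=0.8, beta=0.2))
```

Every number that comes out depends on which features were listed and how they were weighted, which is the point taken up next.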

Several interesting things follow from this, which is, I believe, the best general definition of similarity on the market. One is that the degree of similarity is a function of the choice of salient features or properties. As we know, there are an infinitely large number of properties in common or potentially in common between any two objects. What we choose as salient will depend a lot upon us and our dispositions. Why, for example, is the square of the number of electrons in each object not used as a similarity metric? Because we do not have easy access to that information, and it is unclear how interesting that sort of similarity would be anyway. Still, the number, and its powers, are facts about the objects. That we do not choose them as salient tells us more about ourselves than about the objects, and a Laplacean Demon may find that number or class of numbers very important indeed, in ways we cannot envisage.
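The dependence on the choice of salient features is easy to see in a toy case. The objects, their properties and the two feature selections below are all invented for illustration; the measure is just a count of the salient properties on which two objects agree.

```python
def shared_count(a, b, salient):
    """Count the salient properties on which two objects agree."""
    return sum(a[p] == b[p] for p in salient)

# Hypothetical objects described by the same four properties
swan = {"colour": "white", "flies": True, "swims": True, "legs": 2}
duck = {"colour": "brown", "flies": True, "swims": True, "legs": 2}
snowball = {"colour": "white", "flies": False, "swims": False, "legs": 0}

# If only colour is salient, the swan is more like the snowball.
print(shared_count(swan, snowball, ["colour"]))  # 1
print(shared_count(swan, duck, ["colour"]))      # 0

# If locomotion and body plan are salient, the swan is more like the duck.
habits = ["flies", "swims", "legs"]
print(shared_count(swan, duck, habits))      # 3
print(shared_count(swan, snowball, habits))  # 0
```

Nothing about the objects changed between the two comparisons; only the list of properties we decided to care about did.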

The second thing is that Tverskyan similarity tells us nothing that we did not put into the measure in the first place. It helps us understand what humans are doing (or computers if they employ this metric and method; say, when searching text for semantic similarities), but it isn’t extra information.

Now I have been very careful not to mention phenetics hitherto, but now is the time to do so. This school of thought, which was very popular under the name “numerical taxonomy”, arose as computers became widely available in the 1960s and 1970s. It aimed to deliver “theory-free” taxonomies by the mechanical application of algorithms (Hamming-like) to plain and atheoretical data. It seemed like objectivity was finally in our grasp. However, the methods, while mathematically rigorous and useful in many contexts, did not deliver the desired atheoretical taxa (which they called “operational taxonomic units” or OTUs, to avoid prejudging ranks like species). Or rather, they delivered way too many; change the principal components and you got different taxa.

Moreover, the Hamming Distance metric requires that you arbitrarily choose a threshold value to delimit the clusters. So bacteriologists, for example, tended to choose a 70% similarity or clustering value, while other biologists selected a 90%, 95% or even 99% value. This arbitrariness again resolved down to our predilections.
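How much the threshold matters can be shown with a small sketch. Everything here is an assumption made for the example: the four ten-character codings are invented, similarity is taken to be the fraction of matching characters, and clusters are formed by single linkage (any pair at or above the threshold is joined). Real phenetic software offers many other coefficients and linkage rules.

```python
from itertools import combinations

def matching_similarity(a, b):
    """Fraction of characters on which two equal-length codings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster(items, threshold):
    """Single-linkage grouping: join any pair whose similarity meets the threshold."""
    names = list(items)
    parent = {n: n for n in names}  # each item starts in its own group

    def find(n):
        # Follow parent links to the representative of n's group
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if matching_similarity(items[a], items[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical OTUs coded on ten two-state characters
otus = {
    "otu1": "AAAAAAAAAA",
    "otu2": "AAAAAAAAAB",
    "otu3": "AAAAAAABBB",
    "otu4": "BBBBBBBBBB",
}

print(cluster(otus, 0.70))  # [['otu1', 'otu2', 'otu3'], ['otu4']]
print(cluster(otus, 0.90))  # [['otu1', 'otu2'], ['otu3'], ['otu4']]
print(cluster(otus, 0.95))  # [['otu1'], ['otu2'], ['otu3'], ['otu4']]
```

The same data yield two, three or four "taxa" depending only on the cut-off, and nothing in the data says which cut-off is the right one.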

Nevertheless, while Hamming similarity was not a good way to identify taxa, it was a great methodology for identifying and analysing clustering of various values, such as sequence similarity in molecular genetics, and the algorithms are part of every taxonomist’s and bioinformatician’s toolkit today, quite rightly. What we need to understand is not what the metric is, but why it is useful and what it implies. “Phenetics” has become something of a dirty word these days in some circles, and that is a pity. It’s like saying that because we cannot find exact definitions in language, we have to impose them (which is, I fear, what some philosophers do indeed say when confronted with vagueness).

Phenetic classification, like its analogues in other sciences, is classification by analogy; rather sophisticated analogies, to be sure, but analogical reasoning nonetheless. We select what analogies to employ, and so we have loaded our inferences from the beginning. When such inferences are called for, that is not problematic. When we think we have discovered something about the natural world we didn’t already know, and all we have done is analyse our own dispositions, that is when the errors start to creep in.

Similarity is deductive. It doesn’t license inductive projectibility. It is not the foundation for inferences about history unless we can find causal inheritance – that is to say, identity relations or conservation relations. I will next discuss how one philosopher, Elliott Sober, has made what I think of as an error, in just this way.


Lipton, Peter. 1990. Contrastive Explanation. Royal Institute of Philosophy Supplements 27 (1):247-266.

Lipton, Peter. 1991. Contrastive Explanation and Causal Triangulation. Philosophy of Science 58 (4):687-697.

Tversky, Amos, and Itamar Gati. 1978. Studies of similarity. In Cognition and categorization, edited by E. Rosch and B. B. Lloyd. Hillsdale, NJ: Lawrence Erlbaum Associates:79-98.


  1. Most interesting, John: “As we know, there are an infinitely large number of properties in common or potentially in common between any two objects.” Yes.

    As you know, I’ve been thinking about memes recently, and it looks like my view is that memes are something like cultural conventions about the dimensions along which similarity is to be assessed for a given class of objects. I seem to have settled into two paradigm cases, the phoneme system and Rhythm Changes. The phoneme system says that certain aspects of the speech stream are linguistically important while others are not. Two sounds are phonemically the same if they share the same features from among the selected set. Similarly, a tune is said to be based on Rhythm Changes if it has an appropriate set of features.

  2. An interesting post.

    The way that philosophy uses “similarity” bothers me, perhaps for reasons not unlike those you quoted from Nelson Goodman. It seems to me that philosophy has this backwards. According to philosophy, we use similarity in order to organize the world (arrange into categories for example). But it seems to me that we first organize the world as best we can, and then use that organization to decide when to consider things to be similar.

    • John S. Wilkins

      Nice. Thanks.

  3. You know, John, there’s just a whiff of idealism in all this. You seem in need of some being, whether a human or a Laplacean Demon, to observe the objects in the universe and remark on their properties. Is there a way of getting rid of that? After all, the universe does exist independently of observers, no?

    Could we, for example, say that two objects, A and B, are similar to one another with respect to object Z, if interactions between A and Z are identical to interactions between B and Z?

    • John S. Wilkins

      A philosopher cannot merely assert the existence of the observer-independent world, but in the philosophy of science we can presume that such a world exists and that it has structure, and leave the deeper question for the metaphysicians.

      I am concerned in this series to understand how natural classification is done in the sciences. If the world has structure, then natural classification has to reflect this, and aid us in making further inferences about that structure. The traditional way we have seen this enterprise is through hypothesis and testing; I am merely observing that there is another way to do this, through classification of phenomena in ways that are not immediately theoretical. The rest is a matter of detail.

      Since our epistemology must be fallibilistic, and we are limited in our capacities to acquire, process and order data, and yet we manage well enough in many areas, I don’t see that I have committed any idealistic mistakes here. Laplace’s Demon is merely there for contrast.

  4. Jim Thomerson

    As a learned and better known than me colleague remarked, “A similarity is only a similarity. A difference is really a difference.”

    • John S. Wilkins

      Did he know how to tell them apart? If so, let me know; for it’s the core problem here…

  5. And, of course, making similarity judgments is intimately bound up with the problem of describing things. And that’s a very interesting problem, one that I’ve got an active interest in, though not in the general case. I’m interested in how we describe literary texts and I’m convinced that we do it badly. In fact, we do it so badly that, if literary studies is to advance, we have to improve our descriptions. Nothing else much matters; without better descriptions the whole business is severely limited.

    What’s so difficult about describing texts? Things like rhyme schemes and meter, we can do that. Though we don’t do enough of it & even here there are subtleties to worry about.

    But, it’s not just the physical text, the ink on the page or the sound in the air, that I want to describe. It’s something a bit more elusive. It’s patterns of meaning. The problem there is to come up with descriptions that don’t go down the rat-hole of trying to figure out what the bloody thing means. Because that’s endless and, ultimately, subjective.

    And so forth and so on.

    I note that comparing texts can be useful, where you’re looking for similarities and differences. The act of comparison draws your attention to certain features. Depending on the comparison, you may be interested in the differences that emerge from the comparison, or the similarities. Thus, when comparing Tezuka’s Metropolis, a narrative in manga form, with Coleridge’s “Kubla Khan,” a 54-line lyric poem, it’s the similarities that come to the fore: 1) center symmetric construction, and 2) emblematic motif at the end. When comparing Greene’s Pandosto, a novella, with Shakespeare’s The Winter’s Tale, which is based on Pandosto, it’s the differences that come to the fore.

  6. sbej

    I’ve been using the ugly duckling theorem to detect the ‘pattern of meaning’ in narrative, although I had no idea it had a name. It can at least give some idea of how themes and motifs were classed, clustered together and interpreted in a given historic period.

    It also helps to identify how differences develop and new themes are introduced.

    Worrying that I was simply piecing together surface similarity has been a constant concern for me. It’s paid off and I can now make a solid identification, as I have a very clear long-term history; but in the early days I just had to live with the fact that it was high risk and might come to nothing.
