When Gould meets Galton

A.W.F. Edwards

  • The Mismeasure of Man by Stephen Jay Gould
    Norton, 352 pp, £9.95, May 1982, ISBN 0 393 01489 4

Modern evolutionary biology seems prone to idle argument and useless controversy, as if it had an urge to experience once again the exciting atmosphere of the Darwinists v. the Creationists, or the Mendelians v. the Biometricians; or perhaps a longing to experience the ecstasy of the physicist, the Maxwell or the Einstein, whose new theory so splendidly and triumphantly succeeds. The trouble probably stems from the fact that the really successful theories of biology in recent decades have been biochemical and molecular, leaving the natural historian who tries to move from a specific area of study to the construction of general hypotheses with little to argue about. So the population geneticists set to with fierce debate about whether the majority of mutations are neutral in their selective effect, the taxonomists classify themselves as cladists or evolutionary systematists and hurl insults at each other, and the evolutionary biologists split into sociobiologists and the rest – separate species, we may suppose, since intercourse between the two has not so far proved fruitful.

Stephen Gould’s contribution to this last debate is to open one or two coffins containing the scientific skeletons of the past with the purpose of nailing down the lids even more securely. From the point of view of the modern debate, they were better left undisturbed, but The Mismeasure of Man is fine history, and relevant to the present-day controversy insofar as it explains its social background: for example, I now appreciate far better than hitherto why it is that Americans, in particular, argue so passionately (and so rarely dispassionately) about IQ testing.

The first half of The Mismeasure of Man chronicles 19th-century anthropometry, especially craniometry, and the naive conclusions that were sometimes reached. It is not difficult to do a demolition job in this field, and Gould revels in it, examining the corpses in macabre detail. Did you know that some enthusiast actually got hold of Gauss’s brain and found it to weigh 1492 grams?

Gould submits some of the old data to a crushing re-examination, though – without wishing to condone the views to which the erroneous conclusions led – I myself incline to a more charitable view of those involved. They were caught up in the revolutionary introduction of statistical procedures into biological thinking, and not only were the available methods primitive, but, as Galton was quick to point out, they had been developed in the physical sciences where the statistical ‘error’ was something whose very existence was to be regretted and which the methods therefore sought to eliminate, whilst in the biological sciences the ‘error’ was the very variability under study. The same mathematics underlay different models: to the physicist, the Normal curve is a distribution of observational error; to the biologist, it is a distribution describing the variate under study – height, weight, even brain size.

It is indeed when Gould meets Galton that the reader first detects a prejudice on the part of the author. The section is headed ‘Francis Galton – apostle of quantification’, and it is worth giving almost in its entirety (save for the quotations from Galton, indicated by dots):

No man expressed his era’s fascination with numbers so well as Darwin’s celebrated cousin, Francis Galton (1822-1911). Independently wealthy, Galton had the rare freedom to devote his considerable energy and intelligence to his favourite subject of measurement. Galton, a pioneer of modern statistics, believed that, with sufficient labour and ingenuity, anything might be measured, and that measurement is the primary criterion of a scientific study. He even proposed and began to carry out a statistical inquiry into the efficacy of prayer! Galton coined the term ‘eugenics’ in 1883 and advocated the regulation of marriage and family size according to hereditary endowment of parents.

  Galton backed his faith in measurement with all the ingenuity of his idiosyncratic methods. He sought, for example, to construct a ‘beauty map’ of the British Isles in the following manner: ...

  With good humour, he suggested the following method for quantifying boredom: ...

  Quantification was Galton’s god, and a strong belief in the inheritance of nearly everything he could measure stood at the right hand. Galton believed that even the most socially-embedded behaviours had strong innate components: ... Constantly seeking new and ingenious ways to measure the relative worth of peoples, he proposed to rate blacks and whites by studying the history of encounters between black chiefs and white travellers: ...

  Galton’s major work on the inheritance of intelligence (Hereditary Genius, 1869) included anthropometry among its criteria, but his interest in measuring skulls and bodies peaked later when he established a laboratory at the International Exposition of 1884. There, for threepence, people moved through his assembly line of tests and measures, and received his assessment at the end. After the Exposition, he maintained the lab for six years at a London museum. The laboratory became famous and attracted many notables, including Gladstone: ...

  Lest this be mistaken for the harmless musings of some dotty Victorian eccentric, I point out that Sir Francis was taken quite seriously as a leading intellect of his time ...

That is surely a thoroughly tendentious portrait of Galton, down to the use of the past tense in the last sentence quoted. It might have been supposed that the author of a book full of references to correlation, regression (including multiple regression), principal component analysis, and factor analysis, who devotes some space to Galton, would have mentioned that this whole statistical apparatus stems from Galton’s penetrating work on the bivariate normal distribution and linear regression.

Gould’s enthusiasm for reworking other people’s data in the hunt for their statistical fallacies finally runs away with him over the question of brain size. Consider the awful possibility that big brains might harbour big intelligences: for one thing, women have, on average, smaller brains than men. Notwithstanding the fact that there is really no evidence that brain size is associated with any particular brain function, would it not be doubly safe to sever the second link in the chain of reasoning by demonstrating that male and female brain sizes did not, after all, differ? Gould writes:

Sizes of brains are related to the sizes of bodies that carry them: big people tend to have larger brains than small people. This fact does not imply that big people are smarter – any more than elephants should be judged more intelligent than humans because their brains are larger. Appropriate corrections must be made for differences in body size. Men tend to be larger than women; consequently, their brains are bigger. When corrections for body size are applied, men and women have brains of approximately equal size.

The data to be ‘explained’ were collected by Broca a hundred years ago. From autopsies in four Paris hospitals Broca assembled a series of 292 male brains with average weight 1325 grams and 140 female brains with average weight 1144 grams, a difference of 181 grams. Gould continues:

Since Broca recorded height and age as well as brain size, we may use modern statistical procedures to remove their effect. Brain weight decreases with age and Broca’s women were, on average, considerably older than his men at death. Brain weight increases with height, and his average man was almost a foot taller than his average woman. I used multiple regression, a technique that permits simultaneous assessment of the influence of height and age upon brain size. In an analysis of the data for women, I found that, at average male height and age, a woman’s brain would weigh 1212 grams. Correction for height and age reduces the 181 gram difference by more than a third to 113 grams.

The ghost of Galton to the rescue! But it is a poor sort of rescue that leaves two-thirds of the odious account unsettled. Undaunted, Gould continues with two paragraphs whose tenor is that further progress is impeded by the fact that ‘modern students of brain size have still not agreed on a proper measure to eliminate the powerful effect of body size’ (here there is a reference to Gould, 1975, which is unfortunately omitted from the bibliography). Then the triumphant conclusion: ‘Thus, the corrected 113 gram difference is surely too large; the true figure is probably close to zero and may as well favour [note the word] women as men. One hundred and thirteen grams, by the way, is exactly the average difference between a five-foot four-inch and a six-foot four-inch male in Broca’s data – and we would not want to ascribe greater intelligence to tall men. In short, Broca’s data do not permit any confident claim that men have bigger brains than women.’
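The arithmetic behind ‘more than a third’ and ‘two-thirds’ can be checked directly. The sketch below uses only the three figures quoted above from Gould’s reanalysis; the variable names are illustrative.

```python
# Figures quoted from Gould's reanalysis of Broca's data (grams).
male_mean = 1325
female_mean = 1144
female_adjusted = 1212  # predicted female brain weight at average male height and age

raw_diff = male_mean - female_mean           # unadjusted difference
adjusted_diff = male_mean - female_adjusted  # difference left after adjustment
removed = raw_diff - adjusted_diff           # part 'explained' by height and age

fraction_removed = removed / raw_diff
fraction_remaining = adjusted_diff / raw_diff

print(raw_diff, adjusted_diff, removed)
print(round(fraction_removed, 3), round(fraction_remaining, 3))
```

The adjustment removes 68 of the 181 grams, a little over a third, and leaves about 62 per cent of the original difference standing.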

Gould’s abuse of multiple regression to press home his attack on other people’s analyses removes, at a stroke, his mask of dispassionate observer. He is hoist with his own petard. He has failed to distinguish between the two different questions that a multiple regression analysis can help us answer. On the assumption that the male and female samples are reasonably representative of the wider populations from which they were drawn (and not, for example, selected by hat size!), and on the further assumption that height, age and brain size are, for each sex, linearly related, as near as makes no difference, the two questions are these:

  1. Is sex irrelevant in predicting the brain size of a person of given age and height drawn from the same population as the samples?
  2. Is it likely that the male and female populations from which the samples were drawn differed in mean brain size?

Now since ‘Broca’s women were, on average, considerably older than his men at death,’ it is reasonable to allow for the age differences in the samples by means of multiple regression if there is no reason to suppose that this difference is a reflection of the population at large: in other words, if, in respect of age, the samples were not representative. Of course women do live longer than men, but I charitably assume that ‘considerably older’ means that Broca’s difference was much greater than this natural difference, and that the samples were therefore biased for some reason. This argument applies to both questions 1. and 2. But height is a different matter, since it is not suggested that, in respect of height, either the male or the female samples were unrepresentative of men and women. True, the men were not representative of the women, or vice versa, but why should they be?

In answering question 1., however, it is nevertheless justifiable to use multiple regression to ‘eliminate’ the effect of height on brain size because of the very special nature of the question. Indeed, it might even have been that the heights of men and women were so different that you could tell the sex of a person from his height alone, in which case the additional information about the sex of the person would be redundant and the answer to question 1. would be ‘yes’.

But of course Gould’s real question is 2., not 1. For suppose that intelligence really were proportional to brain size; that, adjusted for height as in answering question 1., sex really were irrelevant to brain size; but that the women were shorter than the men. Then there would be no escaping the observation that women were, on average, less intelligent than men, and the fact that multiple regression had ‘explained’ this as being ‘due to’ their shortness would be neither here nor there, because shortness is an intrinsic characteristic of women. In seeking ‘a proper measure to eliminate the powerful effect of body size’, Gould is unconsciously seeking to eliminate the effect of sex itself. Thus he goes on: ‘Height is partly adequate, but men and women of the same height do not share the same body build’ (my italics).

This story is a perfect example of the deep water surrounding questions of statistical inference. In this case the conclusion about an individual (question 1.) is quite different from the conclusion about the population (question 2.). In the first answer we allow for the influence of height, even though it be highly correlated with sex, because the question is so framed as to force us to, but in the second answer it is ludicrous to ‘correct’ for the very distinction we are studying. As long ago as 1957, H. Fairfield Smith (in Biometrics) gave the standard riposte on adjusted means: ‘Such inferences seem analogous to saying that the difference between the observed heights of Mt Everest and Pike’s Peak is “due to” air density and is “exaggerated”, implying that it is false because the difference adjusted for correlation of altitude with atmospheric density would be negligible.’
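The distinction between the two questions can be made concrete with a small simulation. The sketch below is illustrative only: the slope, intercept, noise level, mean heights and sample sizes are invented, not taken from Broca’s data, and age is ignored. Brain weight is made to depend on height alone, so that sex is irrelevant at a given height (question 1), yet the population means still differ because the heights differ (question 2). The adjusted figure is obtained in the manner Gould describes, by fitting a line to the women and reading it off at the average male height.

```python
import random
import statistics

random.seed(0)

# Hypothetical parameters, chosen only for illustration (not Broca's data).
SLOPE = 8.0        # grams of brain per cm of height, the same for both sexes
INTERCEPT = -50.0  # grams
NOISE_SD = 100.0   # grams


def sample(n, mean_height, sd_height=6.0):
    """Draw heights and brain weights; brain depends on height only, never on sex."""
    heights = [random.gauss(mean_height, sd_height) for _ in range(n)]
    brains = [INTERCEPT + SLOPE * h + random.gauss(0.0, NOISE_SD) for h in heights]
    return heights, brains


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


men_h, men_b = sample(20000, mean_height=172.0)
women_h, women_b = sample(20000, mean_height=160.0)

# Question 2: do the two populations differ in mean brain size?
raw_diff = statistics.fmean(men_b) - statistics.fmean(women_b)

# Question 1: at a given height, does sex add anything?
# Fit the women's line and evaluate it at the average male height.
a_w, b_w = fit_line(women_h, women_b)
women_at_male_height = a_w + b_w * statistics.fmean(men_h)
adjusted_diff = statistics.fmean(men_b) - women_at_male_height

print(f"raw difference in means:        {raw_diff:6.1f} g")
print(f"difference adjusted for height: {adjusted_diff:6.1f} g")
```

By construction the raw difference should come out close to the slope times the 12 cm height gap, about 96 grams, while the adjusted difference should be close to zero. Both answers are correct; they simply answer different questions, and the near-zero adjusted figure does nothing to abolish the difference between the population means.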

The second half of The Mismeasure of Man deals with IQ testing, first the American experience and then ‘the Real Error of Cyril Burt’. The American experience (‘the hereditarian theory of IQ is a home-grown American product’) is a harrowing reminder of the dangers of allowing half-baked social theorising to influence policy, whilst the study of Cyril Burt and factor analysis is not only intriguing history but an excellent introduction to the mysterious manipulations of the factor analysts. This time the petard does not explode, and I find Gould’s case that these multivariate manipulations produce nothing but artefacts convincing, even allowing that his is a one-sided account.

In a final chapter Gould tries to salvage ‘A Positive Conclusion’ from the wreckage he has wrought. But positive conclusions require data, and Gould offers none save a reference to a paper by Lewontin (not listed in the bibliography) which indicates ‘that the overall genetic differences between human races are astonishingly small.’ On this basis, Gould concludes his book with a number of assertions about the minimal influence of genetical as compared with cultural factors in explaining recent human progress and the variety of man. Such a conclusion is far better arrived at, however, from a study of the facts (see Genetics, Evolution and Man by W.F. Bodmer and L.L. Cavalli-Sforza) than from the demolition of old analyses: we learn other lessons from that activity.

The proneness of evolutionary biology to profitless controversy, apparently due, as I have noted above, to the lack of strong new hypotheses of widespread generality, is well illustrated by The Mismeasure of Man. For what is the scientific argument really about? It is simply not the case that there are two credible hypotheses with which to explain ourselves – nature and nurture. No scientist, Gould included, believes that. Nor is it even the case that there is a general mixed hypothesis of wide applicability which allows us to argue about the degrees of nature and nurture. The truth seems to be unquantifiable in such general terms, and each characteristic in each organism, from human intelligence to drosophila bristle-counts, requires its own analysis, its own theory. Darwin and Mendel provided as much general theory as we are going to see.

I have a parting worry about human intelligence. ‘I do not doubt,’ writes Gould, ‘that natural selection acted in building our oversized brains.’ Nor do I. But if it did (and it did not take very long either), there must have been a substantial reservoir of genetical variability on which it acted, for without such variability selection cannot cause change. Where has all this variability gone?