A decade ago, L.S. Hearnshaw’s Cyril Burt, Psychologist (1979) apparently resolved one of recent psychology’s most publicised controversies. Previously at issue had been the question of whether some discredited findings in a study of separated identical twins by the eminent psychologist Sir Cyril Burt (1883-1971) had been the product of simple carelessness in an ageing but honest investigator, or of deliberate fraud. After carefully examining Burt’s private papers, Hearnshaw concluded in favour of fraud, and most of the psychological community quickly accepted his judgment. Now, however, Robert Joynson has re-examined the case and decided, in the words of The Burt Affair’s dust-jacket, ‘that the accusations are ill-founded and that Burt must be exonerated.’
The case began when Burt, during an active retirement following a distinguished career at University College London, periodically reported on an ever-growing study of identical twins said to have been separated from birth. In 1966 his sample had grown to 53 twin pairs, the largest in the literature. More significantly still, Burt also claimed then that his twins had been reared in totally uncorrelated socio-economic environments. In all other studies, the ‘separated’ twins had usually been raised in similar foster homes, often two branches of the same adopting family. Thus they had shared roughly similar environments along with their identical genes, so their often-striking resemblances as adults could be at least partly attributed to their nurture. But when Burt reported intelligence test correlations close to +0.80 for his large sample of randomly-placed twins – a correlation of +1.0 would indicate a perfect resemblance – this seemed powerful evidence indeed for the great heritability of intelligence, and its relative imperviousness to environmental differences.
Shortly after Burt’s death, however, the American psychologist Leon Kamin scrutinised the whole series of Burt’s twin reports from their inception in 1943, and found numerous ambiguities and inadequacies in his descriptions both of the twins and the tests used to measure them. Kamin further noted that some of Burt’s reported correlations remained identical to the third decimal place even as the samples they were based on increased in size – a virtual statistical impossibility. When erstwhile Burt admirer Arthur Jensen forthrightly confirmed and elaborated Kamin’s revelation, most psychologists agreed with his concession that Burt’s results were ‘useless for hypothesis testing’. Furious debate ensued, however, after Oliver Gillie alleged in a 1976 Sunday Times article that ‘J. Conway’ and ‘M. Howard’ – two women whom Burt had frequently cited as his collaborators and co-authors – were actually pseudonyms for Burt himself. Gillie went on explicitly to suggest that Burt’s studies may have been the result of deliberate fraud rather than of an old man’s unintentional carelessness – a thought that had previously crossed a few minds but had not yet been expressed in print.
Then Hearnshaw’s biography reported that Burt’s private papers contained no evidence of any contact with twins, or with anyone named Conway or Howard, during his retirement. One diary entry did indicate, however, that Burt had spent an entire week ‘calculating data on twins’ in response to a simple request for some raw data from his 1966 study: this suggested that he had had to create scores from scratch that would conform to his published correlations. Equally damaging, Hearnshaw presented voluminous evidence suggesting that Burt had behaved dishonestly in several ways during the latter part of his career: that he had altered manuscripts of students and junior co-authors so as to make them more favourable to himself upon publication; that he had systematically misrepresented the history of the statistical technique known as ‘factor analysis’ so as to minimise the contributions of his early mentor Charles Spearman while falsely accentuating his own; and that he had systematically abused his position as editor of the British Journal of Statistical Psychology. Hearnshaw concluded that Burt might have been an honourable person in his prime, but that he had unquestionably ‘gone bad’ in old age. All except a handful of Burt’s former students now accepted that he had cheated.
Now, however, Joynson has taken up and elaborated upon these few loyalists’ arguments in The Burt Affair. Essentially a detailed commentary on Hearnshaw, this book will not be easily followed by readers unfamiliar with the 1979 biography. Still, it is good to have such a lengthy defence on the record, for Joynson rightly argues that Burt was not around to answer his accusers himself, and that his small band of supporters received less opportunity to speak out than his detractors. Joynson’s arguments are varied and often complexly detailed, so different readers will undoubtedly evaluate the overall success of his case differently.
For this reader, Joynson succeeds best in challenging or correcting Hearnshaw on some points of detail regarding the history of factor analysis. Both agree that Burt played a genuinely important role in that history by proposing the existence of ‘group factors’ in intelligence – clusters of intercorrelated abilities intermediate in generality between the all-pervasive ‘general intelligence’ and the highly particular ‘specific factors’ originally postulated by Spearman. But Hearnshaw argues that Burt’s work clearly derived from Spearman, and brands as blatantly false Burt’s own late accounts of its origin. After 1947, Burt repeatedly claimed that the true mathematical foundations of factor analysis lay not in Spearman but in some mathematical ideas of Karl Pearson, as first applied by Burt himself to psychological variables. Noting that Burt did not even mention Pearson in his early papers, Hearnshaw interprets Burt’s historical claim as a self-serving lie.
Joynson, by contrast, accepts as plausible Burt’s autobiographical statement that he first obtained Pearson’s idea from a lecture rather than a published paper, which would account for the lack of formal citation. Upon closely analysing Burt’s early papers, Joynson finds hints of mathematical techniques that – contrary to Hearnshaw’s representations of them – do in fact resemble Pearson’s and deviate from Spearman’s. While conceding that Burt perhaps ‘improves upon what actually happened, to make the connections clearer ... and so tidy up his story’, Joynson still insists that ‘this does not make Burt a deliberate liar.’ I believe that Joynson here raises at least a reasonable doubt, and on the principle of presumed innocence would now tone down the charge that Burt deliberately and dishonestly misrepresented the history of factor analysis.
On a related issue, however, substantial evidence of culpability remains. Most of Burt’s historical writing on factor analysis – and on much else besides – appeared in the British Journal of Statistical Psychology, an organ of the British Psychological Society with whose editorship he was entrusted in 1947. Joynson agrees with Hearnshaw that Burt frequently published his own papers here under assumed names, including Conway and Howard. He does not challenge Hearnshaw’s statement that more than half of the named authors of notes, reviews and letters appearing in the journal were pseudonyms for Burt himself. But Joynson seems to minimise the importance of this behaviour, quoting one of Burt’s loyal students to the effect that he was merely being ‘shy and modest’ about the extent of his own contributions. Joynson also quotes Burt himself: ‘No one thinks the worse of Daniel, Job or Ecclesiastes because they were not written by their ostensible authors.’
But consider just two typical examples of the way Burt really operated. After publishing a lengthy 1958 paper as ‘J. Conway’ on ‘The Inheritance of Intelligence and its Social Implications’, he received a short critique of the work’s hereditarian thesis from the environmentalist A.H. Halsey. This Burt published in early 1959 as ‘Class Differences in Intelligence I: A Reply to Miss Conway’. Immediately following Halsey’s four-page paper, he printed ‘Class Differences in Intelligence II: A Reply to Dr Halsey’ – a ten-page rejoinder ostensibly by Conway. Next came ‘Class Differences in Intelligence III’, 19 further pages of rebuttal, only this time under Burt’s name.
The second example dates from 1963, when John McLeish’s book The Science of Behaviour charged (almost a decade before Kamin) that Burt had inadequately described his twin studies, and called his research methodology ‘shocking’. Burt promptly ran a scathing and nitpicking review, three times longer than standard reviews in the journal, which did not fail to include several gratuitous comments about his own importance in the history of factor analysis. He signed the review ‘M. Howard’.
Pseudonymous papers such as these hardly compare with the books of Daniel, Job or Ecclesiastes – nor are they the works of a shy and modest man. Given the importance of archival journals both for their disciplines and for the careers of authors who publish (or fail to publish) in them, their editors are entrusted with great responsibility. Burt grossly and dishonestly abused that responsibility, and Hearnshaw’s general point that he behaved dishonourably on important projects other than his twin studies seems amply confirmed.
Regarding the twin study, Joynson asserts that Hearnshaw’s charge of fraud rests mainly on inconclusive negative evidence: the failure to find positive indications of an ongoing study. He dismisses one apparent positive sign of fraud – the entire week Burt says he spent ‘calculating data on twins’ – by suggesting that Burt ‘might have had to look through many sheets of material’ to obtain his figures. But Hearnshaw surely might reply that it shouldn’t have taken that many sheets, and ask why a simple request for the raw IQ scores and socio-economic ratings for 106 twins should have required ‘calculation’ if the data had already been legitimately collected and recorded.
Joynson accepts that Burt in retirement collected no new data, and had no contact with Conway or Howard. He does believe (and Hearnshaw concurs) that the two women had been real students of Burt much earlier, however, and his scenario for exoneration has the retired Burt finding some previously lost and unanalysed twin data collected by them before the war. Presumably misplaced by his secretary, whose feelings Burt tried to spare by ambiguously wording his published descriptions of the twins, these rediscovered cases supposedly got added to Burt’s original sample in a genuine enlargement. But while some such explanation might account for a single expansion in Burt’s reported sample size, it defies plausibility that several caches of mislaid data would be successively uncovered, leading to Burt’s reported sample sizes of 21 pairs in 1955, ‘over thirty’ in 1957, 42 in 1958 and 53 in 1966. Thus while the case for fraud in the twin study may indeed rest partially on negative and circumstantial evidence, it still strikes me as considerably more plausible than the ‘innocent’ alternative suggested by Joynson.
A final word is in order regarding the quality of Burt’s twin studies as he presented them, quite independently of the issue of fraud. Joynson acknowledges that these ‘are open to many legitimate criticisms’, so that it would ‘be unwise to place too much confidence in Burt’s conclusions’. He adds, in partial exoneration, that since most of the actual data collection occurred during an early and scientifically unsophisticated time, Burt’s study ‘reflects many of the drawbacks of pioneering work’.
But this excuse fails to convince, because Burt was well-acquainted with another, very excellent separated-twin study that had been published long before his own first report, and that remains a classic today. At Chicago in the Thirties, Horatio Newman, Frank Freeman and Karl Holzinger laboriously collected a sample of 19 separated twin pairs, and reported detailed findings in their 1937 book, Twins: A Study of Heredity and Environment. This study used several pioneering tests and techniques that have long since been superseded, including the relatively primitive 1916 version of the Stanford-Binet Intelligence Scale. But the study still retains extraordinary interest and value, for a simple reason: it provides detailed and fully documented case-histories for each twin pair, including background on their foster homes, their degrees of contact with each other prior to the study, and even their photographs. Besides providing fascinating reading in their own right, these descriptions enable curious readers to probe considerably beyond the statistical summaries of results. For example, while the authors reported a healthy IQ correlation of +0.67 for the twins, inspection of the full case-studies shows that those few pairs who had been most fully separated – i.e. reared in genuinely dissimilar and unrelated families – and who had had little or no contact with each other while growing up, showed the largest differences in their IQs. This strongly suggests that the average IQ differences would have been larger and the correlation lower in a scientifically ‘ideal’ study, where all of the genetically identical twins had been reared in truly independent environments.
Burt’s twin reports never contained such useful details, and for that reason they were largely ignored by experts until 1966. Thus when James Shields published his major study in 1962, Monozygotic Twins Brought Up Apart and Brought Up Together, he discussed the Chicago study at length while dismissing Burt’s work in a few words. He called Burt’s results ‘comparable with those of Newman’, but completely omitted Burt’s sample from his statistical summary of earlier twin work because of the lack of information available about it. Shields’s own study resembles Newman’s in the interest and richness of its detail; and since the ‘separation’ of his twins was as varied as for Newman’s, the interpretation of his IQ correlation of +0.77 was equally ambiguous. Thus even as Burt-Conway-Howard was reporting increasing sample sizes, there was little reason for anyone to pay particular attention to his study. His reported statistical results were in the same range as the others, and since he provided no further information about his sample, there was nothing there to interest anyone else. In the absence of detail, Burt’s readers naturally assumed that his twins had the same non-random placement as Newman’s and Shields’s.
This abruptly changed with his 1966 claim – again in a bare statistical table with no flesh-and-blood detail – that his twins, unlike everyone else’s, had been nearly ideally separated for a test of nature versus nurture. Now other investigators began asking him for details, including the raw IQs and socio-economic ratings that he had to spend a week ‘calculating’. He answered most other requests evasively or not at all. Given the inherent improbability of his claim – since one expects and indeed hopes that adoption agencies would place twins in similar and related families if at all possible – questions about the scientific value of Burt’s study had begun to circulate privately even before Kamin called attention to the recurrent correlations. They would undoubtedly have increased even without his revelations, and Burt’s study would eventually have been dismissed. Had Burt not made his startling claim in 1966, therefore, his study would probably have sunk into richly deserved scientific oblivion after his death.