Why statistics tend not only to describe the world but to change it

Lorraine Daston

  • The Politics of Large Numbers: A History of Statistical Reasoning by Alain Desrosières, translated by Camille Naish
    Harvard, 368 pp, £27.95, October 1998, ISBN 0 674 68932 1

Is the Gross Domestic Product real? How about the unemployment rate? Or the population of the United Kingdom? These are entities that hover between the realms of the invented and the discovered. On the one hand, they are creatures of classification and calculation, of conventions of coding, modelling and sampling. It is the artifice of definition that makes them cohere – or unravel. Depending on the method by which GDP is reckoned (e.g. sum total of all rents, wages, profits, interest and dividends as against national expenditure on goods and services), or how unemployment is defined (without a job? without a job and actively seeking one?), different numbers result. More disturbingly for common-sense realism, these entities come into being (and sometimes pass away) in specific historical circumstances that don’t satisfy the usual criteria of solidity and permanence for bona fide things in the world. Why did the category of ‘unemployment’ supersede that of ‘poverty’ around the turn of this century? And why are international population and medical statistics so hard to standardise? On the other hand, statistical entities are robust and vigorous: GDP and the unemployment rate guide government planning and affect the outcome of elections. Actuarial statistics, which correlate the risk of car accidents primarily with horsepower in Germany but with the age and sex of the driver in the United States, set the price of policies and are the foundation of vast fortunes for insurance companies. The categories of national labour statistics are only approximately translatable, even among the homogenised economies of the European Community: to be a Beamter or cadre or manager is not just to assume a label, but to take on a distinct persona. If they are not part of the durable furniture of the world, the same everywhere and always, statistical entities nonetheless change the world. They are, as Alain Desrosières puts it, ‘things that hold’.
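The dependence of the unemployment rate on its definition can be made concrete with a small sketch. Every figure below is invented for illustration, and the `Person` record and `unemployment_rate` function are hypothetical names, not drawn from Desrosières or from any statistical office. The same hundred people yield two different rates depending on whether 'unemployed' means 'without a job' or 'without a job and actively seeking one'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    has_job: bool
    seeking_work: bool


def unemployment_rate(people, require_seeking):
    """Share of the reference population coded as unemployed under one rule."""
    if require_seeking:
        # Narrow rule: only jobless people who are looking for work count,
        # and the jobless who are not looking drop out of the denominator.
        unemployed = [p for p in people if not p.has_job and p.seeking_work]
        reference = [p for p in people if p.has_job or p.seeking_work]
    else:
        # Broad rule: anyone without a job counts.
        unemployed = [p for p in people if not p.has_job]
        reference = list(people)
    return len(unemployed) / len(reference)


# Invented population: 80 employed, 10 jobless and seeking, 10 jobless and not seeking.
population = (
    [Person(True, False)] * 80
    + [Person(False, True)] * 10
    + [Person(False, False)] * 10
)

broad = unemployment_rate(population, require_seeking=False)
narrow = unemployment_rate(population, require_seeking=True)
print(f"without a job: {broad:.1%}")
print(f"without a job and seeking one: {narrow:.1%}")
```

Nothing about the hundred people changes between the two calls; only the coding rule does.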

Desrosières’s book is a philosophical and sociological reflection on the history of statistics, the most matter-of-fact of all disciplines. He draws liberally on the dernier cri sociology of science (in the mode of Bruno Latour and Michel Callon), on medieval philosophy (he is especially enamoured of the 14th-century debates between realists and nominalists), and on standard political histories of Britain, France, Germany and the US in order to retell the convoluted story of the science and practice of modern statistics, and of how the qualitative descriptions used by early modern states to keep track of their subjects and wealth eventually merged with the mathematical theory of probability to bring it about. Desrosières himself is the kind of hybrid that perhaps only the French system of education, with its emphasis on philosophy and mathematics, could have produced. He is Administrateur at the Institut National de la Statistique et des Etudes Economiques (INSEE) in Paris, i.e. a practitioner of the arcana of government statistics, as well as the author of several historical and sociological studies analysing the conceptual and political preconditions for doing government statistics.

Desrosières has no intention of debunking the tools of his trade. Rather, he wants to understand what he calls the ‘paradox’ of things that are simultaneously real and conventional. He is rightly puzzled by (but also admiring of) the techniques of categorising and manipulating data that have overcome their often bizarre and controversial origins to become reliable and unassailable. His history is honestly presented as mostly a synthesis of the work of those scholars – Ian Hacking, Marie-Noëlle Bourguet, Theodore Porter, Stephen Stigler, Mary Morgan and others – who have, since the 1980s, richly documented the development of probability and statistics. What is original about Desrosières is that he narrates the story from the point of view of a statistician-cum-metaphysician.

Although his history has an 18th-century prologue and a post-World War Two epilogue, the real action takes place c.1835-1935, when governments in Europe and the US established official statistical bureaux, when there was quantitative data-gathering on an unprecedented scale, when social reformers looked to statistics to diagnose and even cure the ills of industrial cities, and when descriptive statistics met and eventually married mathematical probability. It is this long and hesitant courtship and the subsequent careers of its offspring that supply Desrosières’s main plot.

The life and works of the Belgian statistician Adolphe Quetelet (1796-1874) may serve as an epitome. After studying mathematics, Quetelet persuaded the Belgian Government to send him to Paris to study astronomy with Pierre-Simon de Laplace, the acknowledged doyen of the discipline. There he imbibed not only Laplace’s celestial mechanics but also his treatises on probability theory and his enthusiasm for criminal statistics – which the French Government had begun to collect systematically in the 1820s. When Quetelet returned to Brussels he was rewarded with the directorship of the new observatory there, but after the succès fou of Sur l’homme et le développement de ses facultés, ou Essai d’une physique sociale (1835), he devoted his career to gathering, interpreting and organising – indefatigably and internationally – social statistics.

Quetelet was not only the spiritual forefather of government statisticians; his attempts to apply probability mathematics to data on nearly everything – the ratio of male to female births, the annual suicide rate, the height of military recruits, even literary genius – furnish some of the earliest and oddest examples of the way statistical entities become real (and, in this case, later slide back into unreality).

Quetelet and his contemporaries were forcibly struck by the apparent regularity of statistics – demographic, criminal, medical – gathered for large populations. Eighteenth-century writers like John Arbuthnot and Johann Süssmilch had chalked up the remarkable stability of the ratio of girl-to-boy babies born each year to divine providence, but to the judicial statisticians of the early 19th century it seemed less clear what interest God might have in holding the annual number of murders steady. Quetelet’s explanation involved the construction of an arithmetical fiction, the ‘average man’, who embodied the regularities that operated at the level of whole societies rather than of individuals. Take the height of military recruits. If you plot the number of men against height (the Napoleonic practice of mass recruitment and measurement supplied the raw data) you get a rough approximation of an upside-down U-shaped curve: a few are very tall or very short, but most are bunched more or less tightly around some intermediate value. From his astronomical training Quetelet recognised the curve as similar to the one used to measure error in astronomical observations: Laplace and other mathematicians had reasonably assumed that, for a set of observations made on a heavenly body, a few would be way below the true value and a few way above, but the true value would also be the most probable observation. Quetelet leapt to the conclusion that nature had been aiming at a true or ideal height (perhaps the classical proportions of the Apollo Belvedere) in the case of the military recruits, symmetrically distributing her errors along the same normal curve that described the observational errors of the astronomers. A picaresque novel could be written about the fortunes of the normal curve thereafter, as it wandered from astronomy to social statistics to physics (the statistical theory of gases) to psychology (the measurement of intelligence) on the strength of ever stranger analogies.

But Desrosières fixes his attention on Quetelet’s technique of taking averages, which linked the objective mean (lots of measurements taken of the same object) with the subjective mean (the central tendency of measurements taken of many objects as inferred from the hump in the normal curve), for both moral and physical traits. Quetelet’s contemporaries were fascinated and appalled by the implications of his averages for human free will; what interests Desrosières, however, is how these aggregates became real, and immune (at least briefly) from the charge of mixing up apples and pears, i.e. of ignoring critical individual differences. Quetelet’s averages were artificial natural kinds, a pattern that Desrosières sees repeated over and over again in the history of statistics: what he calls ‘classes of equivalence’ must somehow be established, the members of which statisticians are henceforth licensed to treat as identical arithmetical units. This always entails some ‘sacrifice’ of fine-grained local information, and hence often encounters stiff opposition from specialists accustomed to assaying every detail with a jeweller’s balance: doctors who balk at neglecting the clinical peculiarities of a case in favour of statistically evaluated therapies; criminologists who resist national statistics because they suspect that the conditions that foster malfeasance in Corsica are not the same as those in Paris; academic social scientists who are indignant when their own opinions and publications are subjected to the same quantitative treatment they routinely apply to the subjects of their surveys.

The conditions of equivalence can be political as well as conceptual. Even an exercise as apparently straightforward as conducting a national census presupposes classes of equivalence. When, for example, the first US Census, mandated by the Constitution at ten-year intervals to apportion seats among the States in the House of Representatives, was conducted in 1790, slaves were counted as three-fifths of a free man. In Ancien Régime France it was not obvious that a national census should equate members of the first, second and third estates as identical units when calculating the national population. The inevitability of establishing classes of equivalence is deeply embedded in social and political traditions, as Desrosières makes vivid by comparing the development of government statistics in Germany, France, Britain and the US.

Desrosières’s most interesting chapters are those on the mathematical techniques of coding, sampling, correlation and modelling, techniques that are now the hallmark of every modern statistical undertaking and of which Desrosières has hands-on experience. Coding – the assignment of an individual case to a class of equivalence – is usually treated as a mechanical task, farmed out to unskilled, badly-paid minions in statistical offices. But Desrosières reveals its complexities and importance: this is the juncture at which the critical work of abstraction that anchors statistical classes to the world must be done, over and over again. When physicians today register causes of death according to the scheme devised by the World Health Organisation, they willy-nilly rehearse a controversial history that opposed aetiology to symptoms. Knowledge of the initial cause was the more useful epidemiological criterion, but it was easier for doctors simply to record the nature and site of symptoms. For over a century, from the first International Congress of Statistics organised in Brussels at Quetelet’s instance in 1853 to WHO’s assumption of responsibility for the International Classification of Diseases in 1955, statisticians and physicians have wrangled over this choice. Although the principle of coding by initial cause eventually carried the day, critics continued to warn that medical progress had rendered aetiologies obsolete before and could do so again, thus jeopardising the comparability of data over time.

The quandaries of coding did not end with the adoption of the protocol of immediate, intermediary and initial causes. Picture the predicament of the physician called to the scene of a fatal motorcycle accident: the immediate cause of death may be, unambiguously, a massive brain haemorrhage, but intermediary and initial causes potentially leave lots of latitude: An oncoming truck? Failure to wear a helmet? The rashness of youth? These taxonomic puzzles may have legal as well as medical and epidemiological consequences: consider the possibilities for the intermediary and initial causes of a death attributed to the immediate cause of lung cancer. There is nothing automatic about the encoding that sustains statistical classes of equivalence. Yet despite the arguments that raged over their creation and the perplexities that often attend their application, neither classes nor coding practices come apart; they are ‘things that hold’.

Desrosières draws similar lessons from the postponed success story of sampling techniques in statistics. Laplace had already in the late 18th century suggested sampling methods to estimate the French population, but government statistical offices had been wary: how do we know, they argued, that the conditions which prevail for the sample can be safely generalised? What if the phenomenon under study – rates of birth, marriage, suicide – is not homogeneously distributed throughout the population? Once again, experts were reluctant to abandon detailed, local knowledge for the smooth amalgamations presupposed by statistical techniques. The most professional statisticians of the 19th century prided themselves on conducting censuses by complete enumeration. Desrosières links the slow rise of sampling methods in the first decades of the 20th century not only to mathematical advances but also to the replacement of local polities and markets with national ones. Methods of representative sampling, purposive selection and stratification gained respectability among government statisticians because key phenomena such as poverty were no longer conceived of as having local causes and cures: the part – carefully selected, to be sure – could now plausibly stand for the whole. As prominent statisticians themselves recognised, their results could only be obtained from and used for a population that trusted the reliability and veracity of statistical methods. Sampling, with its associations of partiality in every sense of the word, was perhaps the most difficult technique to sell to a public persuaded of its own diversity.
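The worry about heterogeneity, and the answer that stratification offers, can be sketched numerically. What follows is an illustration under invented assumptions, not a method taken from Desrosières's book: the two strata, their poverty rates and the function names are all hypothetical. The sketch draws many samples of 100 households from a population in which poverty is very unevenly distributed, once by simple random sampling and once with the sample split between the strata in proportion to their size, and compares how widely the two kinds of estimate scatter.

```python
import random


def simple_estimate(population, n, rng):
    """Mean of a simple random sample of n units drawn without replacement."""
    sample = rng.sample(population, n)
    return sum(sample) / n


def stratified_estimate(strata, n, rng):
    """Proportionally allocated stratified estimate of the population mean."""
    total = sum(len(units) for units in strata.values())
    estimate = 0.0
    for units in strata.values():
        k = max(1, round(n * len(units) / total))  # sample size for this stratum
        sample = rng.sample(units, k)
        # Weight each stratum's sample mean by the stratum's share of the population.
        estimate += (len(units) / total) * (sum(sample) / k)
    return estimate


def spread(values):
    """Standard deviation of repeated estimates around their own mean."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5


# Invented population: 1 marks a poor household, 0 a household that is not poor.
strata = {
    "city": [1] * 800 + [0] * 200,        # hypothetical 80% poverty rate
    "countryside": [1] * 50 + [0] * 950,  # hypothetical 5% poverty rate
}
population = [u for units in strata.values() for u in units]
true_rate = sum(population) / len(population)

rng = random.Random(0)  # fixed seed so that repeated runs agree
simple_runs = [simple_estimate(population, 100, rng) for _ in range(1000)]
stratified_runs = [stratified_estimate(strata, 100, rng) for _ in range(1000)]

print(f"true rate: {true_rate:.3f}")
print(f"spread of simple estimates: {spread(simple_runs):.4f}")
print(f"spread of stratified estimates: {spread(stratified_runs):.4f}")
```

The more the strata differ from one another, the more the stratified estimates should tighten relative to the simple ones. The part stands for the whole only once someone has decided which divisions of the whole matter.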

This book is not always easy going. The chronology zigzags, national comparisons are uneven and the explanation of technical points condensed or non-existent, which is especially regrettable, given that Desrosières is so well qualified by training and turn of mind to make these things intuitively intelligible and that techniques are in many ways the protagonists of his story. But it’s well worth persevering through the occasional repetitions and obscurities, for the sake of the panoramic view he provides of one of the most profound, influential and largely invisible transformations of modern thought and the modern state. Statistics can make and break arguments, regimes, even realities as sturdy as wealth and poverty. No amount of muttering about how data can be massaged and inferences distorted has subverted the power of statistics in the modern polity. As an insider, Desrosières is all too aware of the fragility of statistical categories and the contingencies of statistical techniques. There is nothing inevitable about either, as his book shows in considerable detail. But he is not an apostate. Statistics works in and on the world, simultaneously describing and remaking. It straddles the chasm between the invented and the discovered, the real and the constructed – oppositions that have structured an increasingly sterile debate about the nature of science among historians, philosophers, sociologists and scientists. The great merit of Desrosières’s study is that it points the way beyond this impasse by showing how statistical entities are simultaneously real and constructed, invented and discovered.

Desrosières himself likes to appeal to the precedent of the 14th-century schoolmen who claimed that universals abstracted by the mind were at least as real as the particulars of experience. But this analogy does not do full justice to the ontological vigour of statistical entities, which live up to Marx’s programme for philosophy: they do not just describe the world, they change it. The most creative metaphysicians these days may not be philosophers, poets or physicists but government statisticians.