by Ted Goertzel
Adapted and condensed from: Ted Goertzel and Joseph Fashing, "The
Myth of the Normal Curve: A Theoretical Critique and Examination
of its Role in Teaching and Research," Humanity and Society 5:14-31
(1981), reprinted in Readings in Humanist Sociology (General Hall,
1986).
Surely the hallowed bell-shaped curve has cracked from top to
bottom. Perhaps, like the Liberty Bell, it should be enshrined
somewhere as a memorial to more heroic days.
- Earnest Ernest, Philadelphia Inquirer, 10 November 1974.
The myth of the bell curve has occupied a central place in the
theory of inequality (Walker, 1929; Bradley, 1968). Apologists
for inequality in all spheres of social life have used the theory
of the bell curve, explicitly and implicitly, in developing moral
rationalizations to justify the status quo. While the misuse of the
bell curve has perhaps been most frequent in the field of
education, it is also common in other areas of social science and
social welfare. When Abraham de Moivre made the first recorded
discovery of the normal curve of error (to give the bell curve its
proper name) in 1733, his immediate concern was with games of
chance. The normal distribution, which is nothing more than the
limiting case of the binomial distribution resulting from random
operations such as flipping coins or rolling dice, was a natural
discovery for anyone interested in the mathematics of gambling. De
Moivre was unhappy, however, with the lowly origins of his
discovery. He proceeded to raise its status by attributing to it an
importance beyond its literal meaning. In his age, this could best
be done by claiming that it was a proof of the existence of God. He
announced:
And thus in all cases it will be found, that although Chance
produces irregularities, still the Odds will be infinitely
great, that in process of Time, those irregularities will bear
no proportion to the recurrency of that Order which naturally
results from Original Design .... (Walker, 1929:17).
De Moivre's discovery of the bell curve did not attract much
attention. Gamblers are perhaps better served with discrete
distributions. Theologians, for their part, no doubt preferred to
base their case for God's existence on less probabilistic grounds.
Serious interest in the distribution of errors on the part of
mathematicians such as Laplace and Gauss awaited the early
nineteenth century when astronomers found the bell curve to be a
useful tool to take into consideration the errors they made in
their observations of the orbits of the planets.
Further developments in the myth of the bell curve were left
not to the astronomers or theologians but to the early quantitative
social scientists. Systematic collection of population statistics
began in the late eighteenth and early nineteenth centuries as a
response to the social upheavals of the time and the consequent
concern with understanding the dynamics of mass behavior. These
early sociologists were not concerned with theology, but they were
seeking proof of the orderliness of society. Relying on the
justifiably great prestige of Laplace and Gauss as mathematicians,
they took the bell curve as proof of the existence of order in the
seemingly chaotic social world. Unfortunately, the early
social scientists often had a poor understanding of the fact that
the mathematical formulas of Gauss and Laplace were based on
assumptions not often met in the empirical world. As Fisher (1922,
Vol. 1:181) points out:
the Gaussian error law came to act as a veritable
Procrustean bed to which all possible measurements
should be made to fit. The belief in authority so typical of
modern German learning and which has also spread to America
was too great to question the supposed generality of the law
discovered by the great Gauss.
The mathematicians, on the other hand, did not feel that it
was their domain to check whether or not the empirical world
happened to fit their postulates. The bell curve came to be
generally accepted, as M. Lippmann remarked to Poincaré (Bradley,
1968:8), because "...the experimenters fancy that it is a theorem
in mathematics and the mathematicians that it is an experimental
fact."
Adolphe Quetelet, the father of quantitative social science,
was the first to claim that the bell curve could be applied not only
to random errors but also to the distributions of social phenomena
(Landau and Lazarsfeld, 1968; Wechsler, 1935:30-31). The myth of
the bell curve was part of Quetelet's theory of the Average Man
(Quetelet, 1969). He assumed that nature aimed at a fixed point in
forming human beings, but made a certain frequency of errors. The
mean in any distribution of human phenomena was to him not merely
a descriptive tool but a statement of the ideal. Extremes in all
things were undesirable deviations. His doctrine was a
quantification of Aristotle's doctrine of the Golden Mean, and it
is susceptible to the same criticisms. While there may be traits
where the average can reasonably be considered to be the ideal, the
argument's application is severely limited. One might argue, for
example, that average vision is ideal, whereas nearsightedness and
farsightedness are undesirable deviations. But is this true of
physical strength or of mental abilities, or even of physical
stature (one variable for which there is actually substantial
evidence of an approximately normal distribution)? Quetelet, like
Aristotle, exempted mental abilities, arguing that those who were
superior to the average in intelligence were mere forerunners of a
new average that was to come.
Quetelet's doctrine of the Average Man was ill suited to a
society that was more in need of a rationalization for inequality
than a glorification of the common man. The bell curve itself,
however, proved useful as part of the social Darwinist ideology that
was emerging as a justification for the inequities of laissez-faire
capitalism.
The myth of the bell curve found its most enthusiastic and
effective champion in Francis Galton and the eugenics movement of
which he was a major founder. The importance that he attributed to
the bell curve can be illustrated by the following quotation
(Galton, 1889:66):
I know of scarcely anything so apt to impress the imagination
as the wonderful form of cosmic order expressed by the "Law of
Frequency of Error." The law would have been personified by
the Greeks and deified, if they had known of it. It reigns
with serenity and in complete self-effacement amidst the
wildest confusion. The huger the mob, the greater the apparent
anarchy, the more perfect is its sway. It is the supreme law
of Unreason. Whenever a large sample of chaotic elements are
taken in hand and marshalled in the order of their magnitude,
an unsuspected and most beautiful form of regularity proves to
have been latent all along. The tops of the marshalled row
form a flowing curve of invariable proportions; and each
element, as it is sorted into place, finds, as it were, a
preordained niche, accurately adapted to fit it.
Galton went beyond Quetelet not only in his enthusiasm
for the bell curve but also in his attempt to gather data to
demonstrate its general applicability. He obtained data on a
number of physical traits that he was interested in improving, such
as height, weight, strength of the arms and of the grip, swiftness
of the blow, and keenness of eyesight. The variables tended to be
approximately normally distributed, but the fit was not perfect.
He consequently converted his data into a type of standard score
and averaged the standard scores together (Galton, 1889:201).
These average scores fit the normal curve very well, as might be
expected, since he had averaged together a number of
largely unrelated variables and created a mean score that reflected
little more than random error.
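Galton's artifact is easy to reproduce. The sketch below (Python; the
skewed variables are invented stand-ins for his anthropometric
measurements) standardizes several unrelated and individually
non-normal variables and averages them. By the central limit theorem,
the composite comes out far closer to normal than any component:

    import numpy as np

    rng = np.random.default_rng(1)
    n_people, n_traits = 10_000, 8

    # Unrelated, strongly skewed "traits": none is normally distributed.
    traits = rng.exponential(scale=1.0, size=(n_people, n_traits))

    # Convert each trait to a standard score, then average across
    # traits, as Galton did with his composite scores.
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0)
    composite = z.mean(axis=1)

    def skewness(x):
        return np.mean(((x - x.mean()) / x.std()) ** 3)

    print(skewness(traits[:, 0]))  # about 2.0 for an exponential trait
    print(skewness(composite))     # shrinks toward 0, roughly 2/sqrt(n_traits)

The near-normality of the composite reflects the averaging itself, not
any normality in the traits being averaged.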
Karl Pearson (best known today for the invention of the
product-moment correlation coefficient) was Galton Professor of
Eugenics at the University of London and Galton's biographer. He
accepted the ideology of the eugenics movement and was preoccupied
with curing social problems by creating a race of superior blue-eyed
and golden-haired people (Pearson, 1912). He was, however, too
good a statistician to repeat Galton's methodological errors or to
accept the Gaussian model on the basis of authority. He used his
newly developed chi-square test to check how closely a number of
empirical distributions of supposedly random errors fitted the bell
curve. He found that many of the distributions that had been cited
in the literature as fitting the normal curve were actually
significantly different from it, and concluded that "the normal
curve of error possesses no special fitness for describing errors
or deviations such as arise either in observing practice or in
nature" (Pearson, 1900: 174).
The Myth in Testing Theory
Pearson's conclusions were not sufficient to stop the
application of the normal curve of error as a norm in assigning
classroom grades or in psychological testing. Most objective tests
that are in practical use today rely on summated scaling
techniques. This means that the person taking the tests answers a
large number of items and receives a total score corresponding to
the number of items that he or she answers correctly. This type of
measurement, which is also used in Likert-scaling in sociological
research, has an inherent bias toward the normal distribution in
that it is essentially an averaging process, and the central limit
theorem shows that distributions of means tend to be normally
distributed even if the underlying distribution is not (if the
means are based on large random samples). This inherent
bias is most likely to be realized if the responses to the test
items are poorly intercorrelated (i.e., if the test or scale is
poorly constructed to measure a central factor).
If a large number of people fill out a typical multiple choice
test such as the Scholastic Aptitude Test (or a typical
sociological questionnaire with precoded responses such as
"strongly agree, agree") at random using a perfect die, the scores
are very likely to be normally distributed. This is true because
many more combinations of responses give a sum that is close to the
theoretical mean than give a score that is close to either extreme.
This characteristic of the averaging process is useful in
calculating probable errors in random sampling and is consequently
discussed in elementary statistics books (e.g., Blalock,
1960:138-141). When averaging is used in testing or measurement,
however, it means that the greater the amount of error present, the
greater the likelihood of a normal distribution of scores, even if
the variable being measured is not normally distributed.
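A quick simulation makes the point. In the sketch below (Python; the
100-item, five-option test is a hypothetical example), every respondent
answers every item completely at random, so the test measures nothing
at all, yet the total scores trace out a bell:

    import numpy as np

    # 50,000 respondents guess on 100 items with 5 options each,
    # so each item is answered correctly with probability 0.2.
    rng = np.random.default_rng(3)
    n_respondents, n_items, p_correct = 50_000, 100, 0.2
    scores = rng.binomial(n=n_items, p=p_correct, size=n_respondents)

    # A crude text histogram: the bell emerges around the mean of 20.
    for k in range(10, 31, 2):
        count = np.sum(scores == k)
        print(f"score {k:3d}: {'#' * (count // 200)}")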
All objective tests contain a certain amount of error in that
the chance of a respondent's getting a given item right depends not
only on the central factor being measured but also on other general
factors and on characteristics idiosyncratic to that item (not to
mention the element of luck). Thus it is not surprising that
summated scaling devices tend to give normal distributions. The
problem comes when this tendency is interpreted not as a result of
unavoidable error, but as a confirmation of a preconceived idea
that the variable being measured is in fact normally distributed.
The early developers of standardized intelligence tests were
pleased to find that their distributions of scores were
approximately normal, although they were disturbed by the fact that
perfect normal distributions were rarely, if ever, achieved.
Thorndike (1927:521-555) went so far as to average together scores
achieved by the same respondents on eleven different intelligence
tests in order to achieve a more normal distribution. He thus
repeated Galton's mistake by averaging together somewhat diverse
measures and then assuming that the resultant distribution was due
to the normality of the underlying variable rather than to the
increased measurement error. (The importance of this, of course,
depends on how different the various tests were.) He also
discounted the fact that the intelligence tests themselves
were standardized in such a way as to give a normal distribution.
Despite the efforts of prominent psychometricians such as
David Wechsler (1935:34) to counter it, the myth of the bell curve
was widely disseminated in psychological texts (Goodenough,
1949:148-149; Vernon, 1940:16-17; Anastasi, 1968:27) and is widely used
as a criterion for test construction. More modern texts usually
recognize that there is no theoretical justification for the use of
the normal curve, but justify using it as a convenience (Cronbach,
1970:99-100).
The clear assertion by prominent psychologists such as
Wechsler and Cronbach that psychological phenomena are not somehow
inherently normally distributed is a clear advance over the type of
indoctrination that students of educational psychology typically
received in the 1930s and 1940s. This methodological advance
coincided with a general trend in the social sciences away from
sociobiological arguments. The close tie between methodological
presuppositions and ideological concerns is illustrated by the fact
that the myth of the bell curve has recently been reactivated
precisely as part of an attempt to reassert racist arguments about
the biological determinants of human abilities. In his highly
controversial article on genetics and I.Q., Arthur Jensen (1969)
went to considerable length in an attempt to demonstrate that I.Q.
scores are approximately normally distributed.
In 1994, Richard Herrnstein and Charles Murray used the phrase
"The Bell Curve" as the title of their widely reviewed book on
Intelligence and Class Structure in American Life. While their
book presents elaborate statistical justifications for most of its
assertions, the claim that intelligence is normally
distributed is defended on common sense grounds. Herrnstein and
Murray (1994: 557) simply assert that "it makes sense that most
things will be arranged in bell-shaped curves. Extremes tend to be
rarer than averages." They note that the bell curve "has a close
mathematical affinity to the meaning of the standard deviation," a
concept which they use extensively in the book, and remark that:
It is worth pausing a moment over this link between a
relatively simple measure of spread in a distribution and the
way things in everyday life vary, for it is one of nature's
more remarkable uniformities.
In reality, there is nothing remarkable about the fact that
measures which contain a good deal of random variation will fit a
curve designed to describe random variation.
The question whether intelligence is or is not normally
distributed is actually irrelevant to the thesis that observed
differences in I.Q. scores between racial groups reflect innate
biological differences. Jensen, Herrnstein, and Murray apparently
introduce the topic of the normality of I.Q. score distributions
because readers who have been led to accept the myth of the normal
curve in other contexts may assume that a normal distribution
proves that the measurement was valid. If the normal distribution
were properly understood as nothing more than a distribution of
random errors, it would not lend any weight to their arguments.
The Myth of the Bell Curve in Grading
The myth of the normal bell curve also lives on in educational
institutions, where students and faculty often casually refer to
"grading on the curve" or "curving the grades." Many
administrators resemble the superintendent of schools in "Elmtown"
(Hollingshead, 1961) in assuming that a normal distribution of
scores indicates that a good job of grading was done. Often,
instructors are expected to turn in an approximately normal
distribution of grades, and any substantial deviations must be justified.
In a 1970-1972 dispute at a large state university, conflict over
grading and other issues led to a situation in which all but one of
the full-time junior faculty members were fired, denied tenure, or
resigned under pressure (Goertzel and Fashing, 1981).
The initial controversy arose when some administrators became
concerned about the tendency toward "grade inflation" on campus, an
issue that has been of some national concern as well (Jencks and
Riesman, 1968). The dean of the college distributed statistics
showing that the mean grade point average had been increasing over
time and in comparison to other institutions. There was also
considerable difference in the average grades given out by
departments on campus. The Sociology Department was particularly
singled out for its high average grades, and pressure was put on
the department chair to bring his faculty members into line.
One junior faculty member was told that he must use "common
sense" standards in grading that would result in a "more or less
normal distribution" of grades. The teaching assistants in the
chairman's introductory sociology class were given more explicit
instructions: the combined average grade for each of their four
classes was not to exceed 2.6 (or a low B-). Five teaching
assistants were summarily dismissed after they refused to sign a
document declaring their willingness to carry out the intent of the
chairman's directive.
The issue became a major focus of conflict on campus, leading
the dean and other senior faculty and administrators to enunciate
assumptions which are not often stated so clearly. They made it
clear that their concern went beyond the question of the "average"
or mean grade. They were also concerned that the number of A's be
relatively small. Indeed, they insisted that the usual distribution
of grades should approximate a normal distribution in that most
grades should be clustered around the mean (or C) with relatively
few at the extremes. Most of the spokesmen who supported a normal
distribution said they thought that such a distribution was the
"usual," "natural" or "common sense" result to be obtained from
correct grading procedures.
In a more traditional view of grading as representing
objective academic standards, instructors should grade papers
according to their intrinsic merit and give out whatever grades
result even if the distribution results in a lot of A's or F's. On
tests, an instructor should know, before looking at the results,
what score will be required for each grade. This practice,
however, may be administratively inconvenient for several reasons.
Enrollments may drop if too many students fail. Admissions to
elite programs may be too large if too many students receive high
grades. The myth of the bell curve serves administrative
convenience by assuring that a predictable proportion of students
can be channeled into each stratum of the educational and
occupational system.
The Bell Curve in Theory and Research
The use of the myth of the bell curve in research serves to
reinforce some persistent biases, as well as to disguise sloppy
research practices. These biased research findings may then be used
to justify the assumption that abilities and talents are normally
distributed and that grades and other social rewards should be
distributed according to the bell curve.
The assumption that social phenomena should be normally
distributed is consistent with pluralist or other multicausal
theoretical models, since a large number of unrelated and
equipotent causes lead to a normal distribution. Indeed, the early
pluralists in political science expected political attitudes to be
normally distributed, since they believed them to be caused by
numerous, equipotent independent factors (Rice, 1928:72).
Similarly, if social status is determined by a number of
independent factors, we would expect it to be normally distributed.
If, as Marxists and others argue, it is largely determined by a
single variable, such as the relationship to the means of
production, there would be no reason for this to be the case.
In point of fact, income is not normally distributed in the
United States or in any other known society; this is well accepted,
since income can be measured directly in monetary units. A graph of the
income distribution in the United States can even be found in
Herrnstein and Murray's book (1994:100), and it is not a bell
curve. Other measurements used by social scientists, however,
provide only a rough index of the underlying trait. If sufficient
error is present in these measuring instruments, a normal
distribution may well result.
Lundberg and Friedman (1943), for example, compared three
measures of socioeconomic status in a rural community. These tests
measured social status by arbitrarily assigning points to the
furniture and other objects observed in the respondents' living
rooms. After applying several tests to the same families and
plotting the resulting distributions, the authors noted:
assuming that in a random sample, socioeconomic status is
normally distributed, the distortion of the normality of the
distribution by the Guttman version of the Chapin scale
suggests the presence of spurious factors ....
In other words, the bell curve was used as a standard for
deciding which test was valid.
The commentators on the article (Knupfer and Merton,
1943) were quick to point out that this was an unjustified
assumption. Income, property, education, and occupational status
are not normally distributed; why should socioeconomic status as
measured by a summated scale of the paraphernalia in the
respondents' living rooms be? Yet the assumption that distributions
should be normal is widely used, perhaps in the absence of any
other criterion to demonstrate that a good job of measurement has
been done. A U.S. Forest Service Report (1973:24a), for example,
reports with satisfaction that scores on an index of the wilderness
quality of roadless areas were quite normally distributed. There
is no reason why this should be the case except that the Forest
Service has averaged together a number of possibly unrelated
variables (scenic character, isolation, variety). (In
fact, the distribution found by the Forest Service deviates
significantly from normality; but, as is often the case, they did
not check the goodness of fit.) The use of normality as a
criterion reinforces sloppiness in scale construction, since a
sloppy scale has more error and is thus more likely to approximate
a normal distribution.
The myth of the bell curve is also consistent with theories
that assume that social behavior is a reflection of individual
differences (provided, also, that it is assumed that individual
differences are normally distributed). Stuart Dodd (1942:251-262),
for example, used the bell curve in developing his theory of social
problems. A social problem, to Dodd, consisted in a deficit of
some characteristic that is socially desirable. The 2% of the
population that falls below two standard deviations from the mean
on a desirable characteristic are the "minimals," and they
constitute the social problems. These "minimals" include
divorcees, prostitutes, illegitimates; the sick, blind, crippled,
or insane; the poor and unemployed; criminals and political
refugees; inferior races such as Bushmen and Pygmies; the
illiterate or ignorant; the overworked and underprivileged; the
offensively vulgar; atheists; foreign language minorities; hermits
and social isolates.
Dodd was certainly aware that not all phenomena are normally
distributed, and he realized that the two percent figure may not
always be appropriate. Yet, only the assumption of normality led
him to even suggest this figure; otherwise, what possible reason
could there be for suggesting that the divorce rate, poverty rate,
unemployment rate, to say nothing of the proportion of foreign
language minorities, should fall at 2%?
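For reference, Dodd's figure is simply the area in the lower tail of
the standard normal curve, which is closer to 2.3 percent; a one-line
check (using scipy):

    from scipy import stats

    # Share of a normal population more than two standard deviations
    # below the mean: the source of Dodd's "2%" for the "minimals."
    print(stats.norm.cdf(-2.0))  # approximately 0.0228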
Dodd also used the bell curve to estimate the possible range
of human characteristics, determining that it was unlikely for the
range to exceed 12.5 standard deviations (Dodd, 1942:261-262). He
noted, however, that the range of incomes in our "capitalistic
culture" exceeded 2000 standard deviations. His suggestion that the
variance in incomes should be limited to correspond to the variance
in abilities is perhaps a good one, but more rigorous data show
that the assumption of normality cannot be used in determining the
range of these abilities. Wechsler (1935) shows, on the basis of
much better data, that the range of human traits rarely exceeds a
ratio of 3:1 (the range ratio of Binet Mental Age scores is
2.30:1).
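Dodd's 12.5 figure is roughly what the bell curve itself predicts: by
a standard extreme-value approximation, the largest of n normal
observations sits near sqrt(2 ln n) standard deviations above the
mean, so even the entire human population would span only about a
dozen standard deviations. A minimal check of that approximation (the
population sizes are arbitrary choices):

    import math

    # Approximate expected range of n normal observations: the maximum
    # lies near sqrt(2 * ln(n)) standard deviations above the mean, the
    # minimum symmetrically below, so the range is about twice that.
    for n in (1_000, 1_000_000, 8_000_000_000):
        approx_range = 2 * math.sqrt(2 * math.log(n))
        print(f"n = {n:>13,}: range = {approx_range:.1f} standard deviations")

A 2000-standard-deviation range in income is thus flatly incompatible
with normality, whatever one thinks of Dodd's proposal.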
Nothing in this paper should be taken as questioning the use
of the normal distribution where it is appropriate (e.g., in
estimating confidence intervals from random samples). To make
this correct usage clear, it might be wise to revert to the
earlier phrase, "normal curve of error." This would make it clear
that the normal bell curve is "normal" only if we are dealing
with random errors. Social life, however, is not a lottery, and
there is no reason to expect sociological variables to be normally
distributed. Nor is there any reason to expect psychological
variables to be if they are influenced by social factors.
Certain physiological traits, such as length of the extremities,
are often approximately normally distributed within homogeneous
populations. Other traits, such as weight, which are affected by
social behaviors, are not. Indeed, if a phenomenon is found to be
normally distributed, this is very likely an indication that it
is caused by random individual variations rather than by social
forces.
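As an illustration of the legitimate use mentioned above, here is a
minimal sketch (Python; the data are simulated) of a 95 percent
confidence interval for a sample mean, where the normal curve
describes the random error of sampling rather than the underlying
variable:

    import numpy as np

    # Draw a random sample from a decidedly non-normal population
    # (incomes, say, are closer to lognormal than normal).
    rng = np.random.default_rng(4)
    sample = rng.lognormal(mean=10.0, sigma=1.0, size=400)

    # The sampling error of the mean is approximately normal even
    # though the population is not, so the usual 95% interval applies.
    mean = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    print(f"mean = {mean:,.0f}, 95% CI = ({mean - 1.96*se:,.0f}, {mean + 1.96*se:,.0f})")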
The myth that social variables are normally distributed has
been shown to be invalid by those methodologists who have taken
the trouble to check it out. Its persistence in the folklore and
procedures of social institutions is a reflection of
institutionalized bias, not scientific rigor.
References
Anastasi, A.
1968 Psychological Testing. New York: Macmillan.
Blalock, H.
1960 Social Statistics. New York: McGraw-Hill.
Bohrnstedt, E. and C. Bohrnstedt
1972 "How One Normally Constructs Good Measures."
Sociological Methods and Research 1, 3-12.
Bradley, J.V.
1968 Distribution-free Statistical Tests. Englewood Cliffs,
N.J.: Prentice-Hall.
Cronbach, L.
1970 Essentials of Psychological Testing. New York:
Harper & Row.
Dodd, S.
1942 Dimensions of Society. New York: Macmillan.
Fisher, A.
1922 The Mathematical Theory of Probability. New
York: Macmillan.
Forest Service, U.S.D.A.
1973 Roadless and Undeveloped Areas Within National Forests.
Springfield, Va.: National Technical Information Service.
Galton, F.
1889 Natural Inheritance. London: Macmillan.
Goertzel, T. and J. Fashing
1981 "The Myth of the Normal Curve: A Theoretical Critique
and Examination of its Role in Teaching and Research"
Humanity and Society 5: 14-31.
Goodenough, F.
1949 Mental Testing. New York: Rinehart.
Herrnstein, R. and C. Murray
1994 The Bell Curve: Intelligence and Class Structure in
American Life. New York: Free Press.
Hollingshead, A.
1961 Elmtown's Youth. New York: Wiley.
Hoyt, D.P.
1965 "The Relationship Between College Grades and Adult
Achievement." Iowa City: American College Testing Program,
Research Report No. 7.
Jencks, C. and D. Riesman
1968 The Academic Revolution. New York: Doubleday.
Jencks, C., et al.
1972 Inequality. New York: Basic Books.
Jensen, A.
1969 "How Much Can We Boost I.Q. and Scholastic Achievement?"
Harvard Educational Review 39, 1-123.
Knupfer, G. and R. Merton
1943 "Discussion." Rural Sociology 8, 236-239.
Landau, D. and P.F. Lazarsfeld
1968 "Adolphe Quetelet." In Vol. 13 of International
Encyclopedia of the Social Sciences. New York:
Macmillan and Free Press.
Lundberg, G. and P. Friedman
1943 "A Comparison of Three Measures of SocioEconomic Status."
Rural Sociology 8, 227-236.
Pearson, K.
1912 Social Problems: Their Treatment, Past, Present and
Future. London: Dulau.
1900 "On the Criterion That a Given System of Deviations From
the Probable in the Case of a Correlated System of
Variables Is Such That It Can Be Reasonably Supposed to
Have Arisen from Random Sampling." The London, Edinburgh
and Dublin Philosophical Magazine and Journal of Science
50, 157-175.
Quetelet, L.A.J.
1969 A Treatise on Man. Gainesville, Fla.: Scholars'
Facsimiles and Reprints.
Rice, S.
1928 Quantitative Methods in Politics. New York: Knopf.
Thorndike, E.L., et al.
1927 The Measurement of Intelligence. New York: Columbia
University Press.
Thurstone, L.L.
1959 The Vectors of the Mind. Chicago: University of
Chicago Press.
Vernon, P.
1940 The Measurement of Abilities. London: University of
London Press.
Walker, H.
1929 Studies in the History of Statistical Method.
Baltimore: Williams and Wilkins.
Wechsler, D.
1935 The Range of Human Abilities. Baltimore: Williams and
Wilkins.