by Ted Goertzel
Adapted and condensed from: Ted Goertzel and Joseph Fashing, "The
Myth of the Normal Curve: A Theoretical Critique and Examination
of its Role in Teaching and Research," Humanity and Society 5:14-31
(1981), reprinted in Readings in Humanist Sociology (General Hall,
1986).
Surely the hallowed bell-shaped curve has cracked from top to
bottom. Perhaps, like the Liberty Bell, it should be enshrined
somewhere as a memorial to more heroic days.
- Earnest Ernest, Philadelphia Inquirer, 10 November 1974.
The myth of the bell curve has occupied a central place in the
theory of inequality (Walker, 1929; Bradley, 1968). Apologists
for inequality in all spheres of social life have used the theory
of the bell curve, explicitly and implicitly, in developing moral
rationalizations to justify the status quo. While the misuse of the
bell curve has perhaps been most frequent in the field of
education, it is also common in other areas of social science and
social welfare. When Abraham de Moivre made the first recorded
discovery of the normal curve of error (to give the bell curve its
proper name) in 1733, his immediate concern was with games of
chance. The normal distribution, which is nothing more than the
limiting case of the binomial distribution resulting from random
operations such as flipping coins or rolling dice, was a natural
discovery for anyone interested in the mathematics of gambling. De
Moivre was unhappy, however, with the lowly origins of his
discovery. He proceeded to raise its status by attributing to it an
importance beyond its literal meaning. In his age, this could best
be done by claiming that it was a proof of the existence of God. He
announced:
And thus in all cases it will be found, that although Chance
produces irregularities, still the Odds will be infinitely
great, that in process of Time, those irregularities will bear
no proportion to the recurrency of that Order which naturally
results from Original Design .... (Walker, 1929:17).
De Moivre's discovery of the bell curve did not attract much
attention. Gamblers are perhaps better served with discrete
distributions. Theologians, for their part, no doubt preferred to
base their case for God's existence on less probabilistic grounds.
Serious interest in the distribution of errors on the part of
mathematicians such as Laplace and Gauss awaited the early
nineteenth century when astronomers found the bell curve to be a
useful tool to take into consideration the errors they made in
their observations of the orbits of the planets.
Further developments in the myth of the bell curve were left
not to the astronomers or theologians but to the early quantitative
social scientists. Systematic collection of population statistics
began in the late eighteenth and early nineteenth centuries as a
response to the social upheavals of the time and the consequent
concern with understanding the dynamics of mass behavior. These
early sociologists were not concerned with theology, but they were
seeking proof of the orderliness of society. Relying on the
justifiably great prestige of Laplace and Gauss as mathematicians,
they took the bell curve as proof of the existence of order in the
seemingly chaotic social world. Unfortunately, the early
social scientists often had a poor understanding of the fact that
the mathematical formulas of Gauss and Laplace were based on
assumptions not often met in the empirical world. As Fisher (1922,
Vol. 1:181) points out:
the Gaussian error law came to act as a veritable
Procrustean bed to which all possible measurements
should be made to fit. The belief in authority so typical of
modern German learning and which has also spread to America
was too great to question the supposed generality of the law
discovered by the great Gauss.
The mathematicians, on the other hand, did not feel that it
was their domain to check whether or not the empirical world
happened to fit their postulates. The bell curve came to be
generally accepted, as M. Lippmann remarked to Poincaré (Bradley,
1968:8), because "...the experimenters fancy that it is a theorem
in mathematics and the mathematicians that it is an experimental
fact."
Adolphe Quetelet, the father of quantitative social science,
was the first to claim that the bell curve could be applied not only
to random errors but also to the distributions of social phenomena
(Landau and Lazarsfeld, 1968; Wechsler, 1935:30-31). The myth of
the bell curve was part of Quetelet's theory of the Average Man
(Quetelet, 1969). He assumed that nature aimed at a fixed point in
forming human beings, but made a certain frequency of errors. The
mean in any distribution of human phenomena was to him not merely
a descriptive tool but a statement of the ideal. Extremes in all
things were undesirable deviations. His doctrine was a
quantification of Aristotle's doctrine of the Golden Mean, and it
is susceptible to the same criticisms. While there may be traits
where the average can reasonably be considered to be the ideal, the
argument's application is severely limited. One might argue, for
example, that average vision is ideal, whereas nearsightedness and
farsightedness are undesirable deviations. But is this true of
physical strength or of mental abilities, or even of physical
stature (one variable for which there is actually substantial
evidence of an approximately normal distribution)? Quetelet, like
Aristotle, exempted mental abilities, arguing that those who were
superior to the average in intelligence were mere forerunners of a
new average that was to come.
Quetelet's doctrine of the Average Man was ill suited to a
society that was more in need of a rationalization for inequality
than a glorification of the common man. The bell curve itself,
however, proved useful as part of the social Darwinist ideology that
was emerging as a justification for the inequities of laissez-faire
capitalism.
The myth of the bell curve found its most enthusiastic and
effective champion in Francis Galton and the eugenics movement of
which he was a major founder. The importance that he attributed to
the bell curve can be illustrated by the following quotation
(Galton, 1889:66):
I know of scarcely anything so apt to impress the imagination
as the wonderful form of cosmic order expressed by the "Law of
Frequency of Error." The law would have been personified by
the Greeks and deified, if they had known of it. It reigns
with serenity and in complete self-effacement amidst the
wildest confusion. The huger the mob, the greater the apparent
anarchy, the more perfect is its sway. It is the supreme law
of Unreason. Whenever a large sample of chaotic elements are
taken in hand and marshalled in the order of their magnitude,
an unsuspected and most beautiful form of regularity proves to
have been latent all along. The tops of the marshalled row
form a flowing curve of invariable proportions; and each
element, as it is sorted into place, finds, as it were, a
preordained niche, accurately adapted to fit it.
Galton went beyond Quetelet not only in his enthusiasm
for the bell curve but also in his attempt to gather data to
demonstrate its general applicability. He obtained data on a
number of physical traits that he was interested in improving, such
as height, weight, strength of the arms and of the grip, swiftness
of the blow, and keenness of eyesight. The variables tended to be
approximately normally distributed, but the fit was not perfect.
He consequently converted his data into a type of standard score
and averaged the standard scores together (Galton, 1889:201).
These average scores fit the normal curve very well, as might be
expected, since he had averaged together a number of
largely unrelated variables and created a mean score that reflected
little more than random error.
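Galton's artifact is easy to reproduce. The sketch below (Python; the
skewed variables are invented stand-ins for his anthropometric
measurements) standardizes several unrelated and individually
non-normal variables and averages them. By the central limit theorem,
the composite comes out far closer to normal than any component:

    import numpy as np

    rng = np.random.default_rng(1)
    n_people, n_traits = 10_000, 8

    # Unrelated, strongly skewed "traits": none is normally distributed.
    traits = rng.exponential(scale=1.0, size=(n_people, n_traits))

    # Convert each trait to a standard score, then average across
    # traits, as Galton did with his composite scores.
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0)
    composite = z.mean(axis=1)

    def skewness(x):
        return np.mean(((x - x.mean()) / x.std()) ** 3)

    print(skewness(traits[:, 0]))  # about 2.0 for an exponential trait
    print(skewness(composite))     # shrinks toward 0, roughly 2/sqrt(n_traits)

The near-normality of the composite reflects the averaging itself, not
any normality in the traits being averaged.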
Karl Pearson (best known today for the invention of the
product-moment correlation coefficient) was Galton Professor of
Eugenics at the University of London and Galton's biographer. He
accepted the ideology of the eugenics movement and was preoccupied
with curing social problems by creating a race of superior blue-eyed
and golden-haired people (Pearson, 1912). He was, however, too
good a statistician to repeat Galton's methodological errors or to
accept the Gaussian model on the basis of authority. He used his
newly developed chi-square test to check how closely a number of
empirical distributions of supposedly random errors fitted the bell
curve. He found that many of the distributions that had been cited
in the literature as fitting the normal curve were actually
significantly different from it, and concluded that "the normal
curve of error possesses no special fitness for describing errors
or deviations such as arise either in observing practice or in
nature" (Pearson, 1900: 174).
The Myth in Testing Theory
Pearson's conclusions were not sufficient to stop the
application of the normal curve of error as a norm in assigning
classroom grades or in psychological testing. Most objective tests
that are in practical use today rely on summated scaling
techniques. This means that the person taking the tests answers a
large number of items and receives a total score corresponding to
the number of items that he or she answers correctly. This type of
measurement, which is also used in Likert-scaling in sociological
research, has an inherent bias toward the normal distribution in
that it is essentially an averaging process, and the central limit
theorem shows that distributions of means tend to be normally
distributed even if the underlying distribution is not (if the
means are based on large random samples). This inherent
bias is most likely to be realized if the responses to the test
items are poorly intercorrelated (i.e., if the test or scale is
poorly constructed to measure a central factor).
If a large number of people fill out a typical multiple choice
test such as the Scholastic Aptitude Test (or a typical
sociological questionnaire with precoded responses such as
"strongly agree, agree") at random using a perfect die, the scores
are very likely to be normally distributed. This is true because
many more combinations of responses give a sum that is close to the
theoretical mean than give a score that is close to either extreme.
This characteristic of the averaging process is useful in
calculating probable errors in random sampling and is consequently
discussed in elementary statistics books (e.g., Blalock,
1960:138-141). When averaging is used in testing or measurement,
however, it means that the greater the amount of error present, the
greater the likelihood of a normal distribution of scores, even if
the variable being measured is not normally distributed.
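A quick simulation makes the point. In the sketch below (Python; the
100-item, five-option test is a hypothetical example), every respondent
answers every item completely at random, so the test measures nothing
at all, yet the total scores trace out a bell:

    import numpy as np

    # 50,000 respondents guess on 100 items with 5 options each,
    # so each item is answered correctly with probability 0.2.
    rng = np.random.default_rng(3)
    n_respondents, n_items, p_correct = 50_000, 100, 0.2
    scores = rng.binomial(n=n_items, p=p_correct, size=n_respondents)

    # A crude text histogram: the bell emerges around the mean of 20.
    for k in range(10, 31, 2):
        count = np.sum(scores == k)
        print(f"score {k:3d}: {'#' * (count // 200)}")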
All objective tests contain a certain amount of error in that
the chance of a respondent's getting a given item right depends not
only on the central factor being measured but also on other general
factors and on characteristics idiosyncratic to that item (not to
mention the element of luck). Thus it is not surprising that
summated scaling devices tend to give normal distributions. The
problem comes when this tendency is interpreted not as a result of
unavoidable error, but as a confirmation of a preconceived idea
that the variable being measured is in fact normally distributed.
The early developers of standardized intelligence tests were
pleased to find that their distributions of scores were
approximately normal, although they were disturbed by the fact that
perfect normal distributions were rarely, if ever, achieved.
Thorndike (1927:521-555) went so far as to average together scores
achieved by the same respondents on eleven different intelligence
tests in order to achieve a more normal distribution. He thus
repeated Galton's mistake by averaging together somewhat diverse
measures and then assuming that the resultant distribution was due
to the normality of the underlying variable rather than to the
increased measurement error. (The importance of this, of course,
depends on how different the various tests were.) He also
discounted the fact that the intelligence tests themselves
were standardized in such a way as to give a normal distribution.
Despite the efforts of prominent psychometricians such as
David Wechsler (1935:34) to counter it, the myth of the bell curve
was widely disseminated in psychological texts (Goodenough,
1949:148-149; Vernon, 1940:16-17; Anastasi, 1968:27) and is widely used
as a criterion for test construction. More modern texts usually
recognize that there is no theoretical justification for the use of
the normal curve, but justify using it as a convenience (Cronbach,
1970:99-100).
The clear assertion by prominent psychologists such as
Wechsler and Cronbach that psychological phenomena are not somehow
inherently normally distributed is a clear advance over the type of
indoctrination that students of educational psychology typically
received in the 1930s and 1940s. This methodological advance
coincided with a general trend in the social sciences away from
sociobiological arguments. The close tie between methodological
presuppositions and ideological concerns is illustrated by the fact
that the myth of the bell curve has recently been reactivated
precisely as part of an attempt to reassert racist arguments about
the biological determinants of human abilities. In his highly
controversial article on genetics and I.Q., Arthur Jensen (1969)
went to considerable length in an attempt to demonstrate that I.Q.
scores are approximately normally distributed.
In 1994, Richard Herrnstein and Charles Murray used the phrase
"The Bell Curve" as the title of their widely reviewed book on
Intelligence and Class Structure in American Life. While their
book presents elaborate statistical justifications for most of its
assertions, the claim that intelligence is normally
distributed is defended on common sense grounds. Herrnstein and
Murray (1994: 557) simply assert that "it makes sense that most
things will be arranged in bell-shaped curves. Extremes tend to be
rarer than averages." They note that the bell curve "has a close
mathematical affinity to the meaning of the standard deviation," a
concept which they use extensively in the book, and remark that:
It is worth pausing a moment over this link between a
relatively simple measure of spread in a distribution and the
way things in everyday life vary, for it is one of nature's
more remarkable uniformities.
In reality, there is nothing remarkable about the fact that
measures which contain a good deal of random variation will fit a
curve designed to describe random variation.
The question whether intelligence is or is not normally
distributed is actually irrelevant to the thesis that observed
differences in I.Q. scores between racial groups reflect innate
biological differences. Jensen, Herrnstein, and Murray apparently
introduce the topic of the normality of I.Q. score distributions
because readers who have been led to accept the myth of the normal
curve in other contexts may assume that a normal distribution
proves that the measurement was valid. If the normal distribution
were properly understood as nothing more than a distribution of
random errors, it would not lend any weight to their arguments.
The Myth of the Bell Curve in Grading
The myth of the normal bell curve also lives on in educational
institutions, where students and faculty often casually refer to
"grading on the curve" or "curving the grades." Many
administrators resemble the superintendent of schools in "Elmtown"
(Hollingshead, 1961) in assuming that a normal distribution of
scores indicates that a good job of grading was done. Often,
instructors are expected to turn in an approximately normal
distribution of grades, and any substantial deviations must be justified.
In a 1970-1972 dispute at a large state university, conflict over
grading and other issues led to a situation in which all but one of
the full-time junior faculty members were fired, denied tenure, or
resigned under pressure (Goertzel and Fashing, 1981).
The initial controversy arose when some administrators became
concerned about the tendency toward "grade inflation" on campus, an
issue that has been of some national concern as well (Jencks and
Riesman, 1968). The dean of the college distributed statistics
showing that the mean grade point average had been increasing over
time and in comparison to other institutions. There was also
considerable difference in the average grades given out by
departments on campus. The Sociology Department was particularly
singled out for its high average grades, and pressure was put on
the department chair to bring his faculty members into line.
One junior faculty member was told that he must use "common
sense" standards in grading that would result in a "more or less
normal distribution" of grades. The teaching assistants in the
chairman's introductory sociology class were given more explicit
instructions: the combined average grade for each of their four
classes was not to exceed 2.6 (or a low B-). Five teaching
assistants were summarily dismissed after they refused to sign a
document declaring their willingness to carry out the intent of the
chairman's directive.
The issue became a major focus of conflict on campus, leading
the dean and other senior faculty and administrators to enunciate
assumptions which are not often stated so clearly. They made it
clear that their concern went beyond the question of the "average"
or mean grade. They were also concerned that the number of A's be
relatively small. Indeed, they insisted that the usual distribution
of grades should approximate a normal distribution in that most
grades should be clustered around the mean (or C) with relatively
few at the extremes. Most of the spokesmen who supported a normal
distribution said they thought that such a distribution was the
"usual," "natural" or "common sense" result to be obtained from
correct grading procedures.
In a more traditional view of grading as representing
objective academic standards, instructors should grade papers
according to their intrinsic merit and give out whatever grades
result even if the distribution results in a lot of A's or F's. On
tests, an instructor should know, before looking at the results,
what score will be required for each grade. This practice,
however, may be administratively inconvenient for several reasons.
Enrollments may drop if too many students fail. Admissions to
elite programs may be too large if too many students receive high
grades. The myth of the bell curve serves administrative
convenience by assuring that a predictable proportion of students
can be channeled into each stratum of the educational and
occupational system.
The Bell Curve in Theory and Research
The use of the myth of the bell curve in research serves to
reinforce some persistent biases, as well as to disguise sloppy
research practices. These biased research findings may then be used
to justify the assumption that abilities and talents are normally
distributed and that grades and other social rewards should be
distributed according to the bell curve.
The assumption that social phenomena should be normally
distributed is consistent with pluralist or other multicausal
theoretical models, since a large number of unrelated and
equipotent causes lead to a normal distribution. Indeed, the early
pluralists in political science expected political attitudes to be
normally distributed, since they believed them to be caused by
numerous, equipotent independent factors (Rice, 1928:72).
Similarly, if social status is determined by a number of
independent factors, we would expect it to be normally distributed.
If, as Marxists and others argue, it is largely determined by a
single variable, such as the relationship to the means of
production, there would be no reason for this to be the case.
In point of fact, income is not normally distributed in the
United States or in any other known society; this is well accepted,
since income can be measured directly in monetary units. A graph of the
income distribution in the United States can even be found in
Herrnstein and Murray's book (1994:100), and it is not a bell
curve. Other measurements used by social scientists, however,
provide only a rough index of the underlying trait. If sufficient
error is present in these measuring instruments, a normal
distribution may well result.
Lundberg and Friedman (1943), for example, compared three
measures of socioeconomic status in a rural community. These tests
measured social status by arbitrarily assigning points to the
furniture and other objects observed in the respondents' living
rooms. After applying several tests to the same families and
plotting the resulting distributions, the authors noted:
assuming that in a random sample, socioeconomic status is
normally distributed, the distortion of the normality of the
distribution by the Guttman version of the Chapin scale
suggests the presence of spurious factors ....
In other words, the bell curve was used as a standard for
deciding which test was valid.
The commentators on the article (Knupfer and Merton,
1943) were quick to point out that this was an unjustified
assumption. Income, property, education, and occupational status
are not normally distributed; why should socioeconomic status as
measured by a summated scale of the paraphernalia in the
respondents' living rooms be? Yet the assumption that distributions
should be normal is widely used, perhaps in the absence of any
other criterion to demonstrate that a good job of measurement has
been done. A U.S. Forest Service Report (1973:24a), for example,
reports with satisfaction that scores on an index of the wilderness
quality of roadless areas were quite normally distributed. There
is no reason why this should be the case except that the Forest
Service has averaged together a number of possibly unrelated
variables (scenic character, isolation, variety). (In
fact, the distribution found by the Forest Service deviates
significantly from normality; but, as is often the case, they did
not check the goodness of fit.) The use of normality as a
criterion reinforces sloppiness in scale construction, since a
sloppy scale has more error and is thus more likely to approximate
a normal distribution.
The myth of the bell curve is also consistent with theories
that assume that social behavior is a reflection of individual
differences (provided, also, that it is assumed that individual
differences are normally distributed). Stuart Dodd (1942:251-262),
for example, used the bell curve in developing his theory of social
problems. A social problem, to Dodd, consisted in a deficit of
some characteristic that is socially desirable. The 2% of the
population that falls below two standard deviations from the mean
on a desirable characteristic are the "minimals," and they
constitute the social problems. These "minimals" include
divorcees, prostitutes, illegitimates; the sick, blind, crippled,
or insane; the poor and unemployed; criminals and political
refugees; inferior races such as Bushmen and Pygmies; the
illiterate or ignorant; the overworked and underprivileged; the
offensively vulgar; atheists; foreign language minorities; hermits
and social isolates.
Dodd was certainly aware that not all phenomena are normally
distributed, and he realized that the two percent figure may not
always be appropriate. Yet, only the assumption of normality led
him to even suggest this figure; otherwise, what possible reason
could there be for suggesting that the divorce rate, poverty rate,
unemployment rate, to say nothing of the proportion of foreign
language minorities, should fall at 2%?
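For reference, Dodd's figure is simply the area in the lower tail of
the standard normal curve, which is closer to 2.3 percent; a one-line
check (using scipy):

    from scipy import stats

    # Share of a normal population more than two standard deviations
    # below the mean: the source of Dodd's "2%" for the "minimals."
    print(stats.norm.cdf(-2.0))  # approximately 0.0228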
Dodd also used the bell curve to estimate the possible range
of human characteristics, determining that it was unlikely for the
range to exceed 12.5 standard deviations (Dodd, 1942:261-262). He
noted, however, that the range of incomes in our "capitalistic
culture" exceeded 2000 standard deviations. His suggestion that the
variance in incomes should be limited to correspond to the variance
in abilities is perhaps a good one, but more rigorous data show
that the assumption of normality cannot be used in determining the
range of these abilities. Wechsler (1935) shows, on the basis of
much better data, that the range of human traits rarely exceeds a
ratio of 3:1 (the range ratio of Binet Mental Age scores is
2.30:1).
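Dodd's 12.5 figure is roughly what the bell curve itself predicts: by
a standard extreme-value approximation, the largest of n normal
observations sits near sqrt(2 ln n) standard deviations above the
mean, so even the entire human population would span only about a
dozen standard deviations. A minimal check of that approximation (the
population sizes are arbitrary choices):

    import math

    # Approximate expected range of n normal observations: the maximum
    # lies near sqrt(2 * ln(n)) standard deviations above the mean, the
    # minimum symmetrically below, so the range is about twice that.
    for n in (1_000, 1_000_000, 8_000_000_000):
        approx_range = 2 * math.sqrt(2 * math.log(n))
        print(f"n = {n:>13,}: range = {approx_range:.1f} standard deviations")

A 2000-standard-deviation range in income is thus flatly incompatible
with normality, whatever one thinks of Dodd's proposal.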
Nothing in this paper should be taken as questioning the use
of the normal distribution where it is appropriate (e.g., in
estimating confidence intervals from random samples). To make
this correct usage clear, it might be wise to revert to the
earlier phrase, "normal curve of error." This would make it clear
that the normal bell curve is "normal" only if we are dealing
with random errors. Social life, however, is not a lottery, and
there is no reason to expect sociological variables to be normally
distributed. Nor is there any reason to expect psychological
variables to be if they are influenced by social factors.
Certain physiological traits, such as length of the extremities,
are often approximately normally distributed within homogeneous
populations. Other traits, such as weight, which are affected by
social behaviors, are not. Indeed, if a phenomenon is found to be
normally distributed, this is very likely an indication that it
is caused by random individual variations rather than by social
forces.
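As an illustration of the legitimate use mentioned above, here is a
minimal sketch (Python; the data are simulated) of a 95 percent
confidence interval for a sample mean, where the normal curve
describes the random error of sampling rather than the underlying
variable:

    import numpy as np

    # Draw a random sample from a decidedly non-normal population
    # (incomes, say, are closer to lognormal than normal).
    rng = np.random.default_rng(4)
    sample = rng.lognormal(mean=10.0, sigma=1.0, size=400)

    # The sampling error of the mean is approximately normal even
    # though the population is not, so the usual 95% interval applies.
    mean = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(len(sample))
    print(f"mean = {mean:,.0f}, 95% CI = ({mean - 1.96*se:,.0f}, {mean + 1.96*se:,.0f})")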
The myth that social variables are normally distributed has
been shown to be invalid by those methodologists who have taken
the trouble to check it out. Its persistence in the folklore and
procedures of social institutions is a reflection of
institutionalized bias, not scientific rigor.
References
Anastasi, A.
1968 Psychological Testing. New York: Macmillan.
Blalock, H.
1960 Social Statistics. New York: McGraw-Hill.
Bohrnstedt, E. and C. Bohrnstedt
1972 "How One Normally Constructs Good Measures."
Sociological Methods and Research 1, 3-12.
Bradley, J.V.
1968 Distribution-free Statistical Tests. Englewood Cliffs,
N.J.: Prentice-Hall.
Cronbach, L.
1970 Essentials of Psychological Testing. New York:
Harper & Row.
Dodd, S.
1942 Dimensions of Society. New York: Macmillan.
Fisher, A.
1922 The Mathematical Theory of Probability. New
York: Macmillan.
Forest Service, U.S.D.A.
1973 Roadless and Undeveloped Areas Within National Forests.
Springfield, Va.: National Technical Information Service.
Galton, F.
1889 Natural Inheritance. London: Macmillan.
Goertzel, T. and J. Fashing
1981 "The Myth of the Normal Curve: A Theoretical Critique
and Examination of its Role in Teaching and Research"
Humanity and Society 5: 14-31.
Goodenough, F.
1949 Mental Testing. New York: Rinehart.
Herrnstein, R. and C. Murray
1994 The Bell Curve: Intelligence and Class Structure in
American Life. New York: Free Press.
Hollingshead, A.
1961 Elmtown's Youth. New York: Wiley.
Hoyt, D.P.
1965 "The Relationship Between College Grades and Adult
Achievement." Iowa City: American College Testing Program,
Research Report No. 7.
Jencks, C. and D. Riesman
1968 The Academic Revolution. New York: Doubleday.
Jencks, C., et al.
1972 Inequality. New York: Basic Books.
Jensen, A.
1969 "How Much Can We Boost I.Q. and Scholastic Achievement?"
Harvard Educational Review 39, 1-123.
Knupfer, G. and R. Merton
1943 "Discussion." Rural Sociology 8, 236-239.
Landau, D. and P.F. Lazarsfeld
1968 "Adolphe Quetelet." In Vol. 13 of International
Encyclopedia of the Social Sciences. New York:
Macmillan and Free Press.
Lundberg, G. and P. Friedman
1943 "A Comparison of Three Measures of SocioEconomic Status."
Rural Sociology 8, 227-236.
Pearson, K.
1912 Social Problems: Their Treatment, Past, Present and
Future. London: Dulau.
1900 "On the Criterion That a Given System of Deviations From
the Probable in the Case of a Correlated System of
Variables Is Such That It Can Be Reasonably Supposed to
Have Arisen from Random Sampling." The London, Edinburgh
and Dublin Philosophical Magazine and Journal of Science
50, 157-175.
Quetelet, L.A.J.
1969 A Treatise on Man. Gainesville, Fla.: Scholars'
Facsimiles and Reprints.
Rice, S.
1928 Quantitative Methods in Politics. New York: Knopf.
Thorndike, E.L., et al.
1927 The Measurement of Intelligence. New York: Columbia
University Press.
Thurstone, L.L.
1959 The Vectors of the Mind. Chicago: University of
Chicago Press.
Vernon, P.
1940 The Measurement of Abilities. London: University of
London Press.
Walker, H.
1929 Studies in the History of Statistical Method.
Baltimore: Williams and Wilkins.
Wechsler, D.
1935 The Range of Human Abilities. Baltimore: Williams and
Wilkins.