
Class Notes for Methods and Techniques of Social Research, Spring 2002 - part one of the course

These notes are not intended as a substitute for attending class. They include links to WEB sites discussed in class and an outline of class discussions and in-class activities. If you miss class, check the class notes for information on class discussions and on in-class assignments that should be submitted to WEBCT by those who miss class. These notes will be used on the screen in class, and will be edited during the class. Be sure to reload this file each time you check it, to make sure you have the latest version.

January 23: The goals and organization of the course. Discussion of WEBCT and how to do online assignments. Installation of the Microcase software from the CD-ROM and disk included at the back of the workbook packaged with your text.

Jan 25: Use of the WEBCT system, including how to submit assignments. We will work on the first assignment, and go over to the computer center for individual help as needed - computer lab 108/109. Use of the Microcase Software for the Introductory Assignment.

January 28: Today we will discuss the nature and uses of science and social science. How does social science differ from other ways of thinking: poetry, philosophy, theology, physical science? We will discuss several documents: Three approaches to knowledge. W.H. Auden's poetry. For a sample of a new concept, click on virtropy. Is this a good concept? Why or why not? Census Document on Racial and Ethnic Categories. Brazilian Racial Categories. Other concepts we can consider are: poverty, power, crime, murder, race, IQ, liberalism/conservatism, homelessness. Or we could look at Personality Types as defined by Carl Jung and measured by Isabel Briggs Myers. There are also techniques such as concept mapping that can be used to develop concepts. Example: data on the Bureau of Justice Statistics WEB site.

January 30: Use of Microcase, overview of the Introductory Exercise.

February 1: Discussion of designing research projects. How do we decide what to study? Supplementary reading in Trochim on the structure of research. You may prefer his "hourglass" metaphor to the circular one on page 14 of our textbook.

Selecting a topic. Typical motives include:

Finding out something we don't know. This may be something local (e.g., what do people in Camden think about the new Governor's actions?), something left unresolved by earlier research, something that hasn't been studied because it is new, etc. This is what the authors of your book mean when they say "research always starts with wondering."
Another purpose that motivates research is proving to other people that what we "know" is true really is true. This is "advocacy" research, and it can be very one-sided and lead to sloppy work. Often this involves causal arguments, proving "why" something happens. This kind of research may not start with "wondering" but with "arguing."
Answering a question posed to us by our employer or by a client (applied research). Here someone else really chooses the topic.

Formulating a Research Question. This means formulating a "statement" which will involve variables. We have an argument or story in mind at this point.
Defining the Concepts. Usually not a lot of time goes into this stage of empirical research, but some people do write articles focusing on this, e.g., what does "race" or "poverty" mean, or what is the difference between "sex" and "gender"?
Operationalizing the Concepts. A lot of effort goes into this. Quantitative research means you have to measure your variables and a lot depends on having good measurement. Sometimes this is difficult, e.g., measuring "intelligence" or "liberalism-conservatism" or "mental illness" or "crime rates (various kinds)". Often we use standard measures created by the government agencies that collect statistics.
Formulating Hypotheses. This is usually pretty easy. There is a distinction between "null hypotheses" and regular hypotheses, which is explained on page 13. The null hypothesis states that your hypothesis is not true, that is, that there is no relationship between the variables. Thus, you hope to "reject the null hypothesis" rather than "accept the (regular, not-null) hypothesis". The opposite of the null is usually called the alternative (or research) hypothesis. Type One Error: rejecting the null hypothesis when it is true, that is, accepting that a relationship exists when it doesn't. Type Two Error: failing to reject the null hypothesis when it is false, that is, rejecting a relationship when it really does exist.
Making observations. This is a major step unless we just get the observations from someone who already did the work.
Analyzing the Data. This is "number crunching": running data through the computer. Of course, one can also analyze qualitative data from interviews or observations, but today even that tends to get quantified (content analysis).
Assessing the results. This is really part of the analysis. If the hypothesis doesn't work out, researchers often go back and change the hypotheses and pretend they knew all along what was going to happen.
Publishing the findings. This assumes that you are doing "scientific" or "pure" research; much applied research is actually distributed only within the organization that paid for it. This may be done in person, with a "power point" presentation. Refereed publications: your paper is sent to other specialists for review to decide if it should be published ("refereed journal"). Press release. Publication can be online as well as on paper. You publish the research so you can get credit, see your name in print, get promoted, and also so that you can inform others, and perhaps most important, so that other people can criticize or attempt to
Replicate it. Usually people replicate research in the hope of overthrowing it; if you just find the same thing as before, there is less interest. This cancels out a lot of the bias in social research, since there is usually someone with the opposite bias to correct it.

Here are some samples we can look at: Papers presented at the 2000 ASA meetings in Washington, a Study of Tire-Crash Patterns (Word Format with Excel File Used to Reproduce Graphs), and some controversial examples, including research in criminal justice on abortion and crime and on gun control and crime rates. I recently published a paper called "Myths of Murder and Multiple Regression" criticizing a number of studies. Another example is the role of research in the controversy over welfare reform. Margaret Mead's classic work Coming of Age in Samoa, which was extremely influential and, many people now believe, wrong. Star Wars: Is Astrology Sociology? Another example is the book The Bell Curve, which generated tremendous controversy and claims that it should never have been published. The controversy over a study on the effects of sex abuse. Compstat in the NYC and Philadelphia Police Departments. N.J. Crime Rate Lowest in last 3 decades. N.J. Crime Statistics. The origin and development of the project on South Jersey's Identity that we worked on last semester. Results are on my home page.

Feb 4 - Percentages and Expected Frequencies

Overview of Assignment 2b, primarily the use of crosstabs to compute percentages.

Looking at the frequencies for GOVMED, we see that there are 1849 respondents and 896 thought the government should help. To make that a percent we divide 896 by 1849, getting .4846. Move the decimal point two places to the right and we get 48.5%. Percents always add to 100%; you don't know what they mean if you don't know how they add to 100%. The base of the percent is the 100%; in this case 1849 people are 100% of the respondents. The phrase "of the" tells you the base of the percent.

When I ask you what percent is something, you should say: Percent of what?
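The GOVMED arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function name `percent` and the variable names are my own, not anything from Microcase.

```python
def percent(frequency, base):
    """Return frequency as a percentage of base (the 'of what?' number)."""
    if base == 0:
        raise ValueError("the base of a percent cannot be zero")
    return 100.0 * frequency / base

govmed_help = 896     # respondents who thought the government should help
govmed_total = 1849   # all respondents to GOVMED: this is the base (100%)

govmed_percent = percent(govmed_help, govmed_total)
print(f"{govmed_percent:.1f}% of the respondents")  # 48.5% of the respondents
```

Forcing yourself to pass the base as a separate argument is the code version of asking "percent of what?"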

Taking the number of people who are liberal and favor Government Help, we could ask three questions:
1. Column percent: What percent of the liberals think that the government should help?
2. Row percent: What percent of those who think the government should help are liberals?
3. Total Percent: What percent of the respondents are liberals who think the government should help?

Row, column and total percentages. A percentage is a ratio between a frequency and a base. In a sentence, the base follows the word "of." For example, if I say, what percent of the voters voted for Bush, the base is the number of voters. This is the denominator in the calculation. The numerator is the number voting for Bush.

For example, assume that:

75 men voted for Bush
95 women voted for Bush
35 men voted for Gore
125 women voted for Gore.

We could put this into a Table:

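          Men            Women           Total

Bush      75 (56.65)     95 (113.3)      170 (.515)

Gore      35 (53.35)     125 (106.7)     160 (.485)

Total     110            220             330

In each cell, the observed frequency comes first and the expected frequency follows in parentheses.
In the Total column, the number in parentheses is that row's proportion of the whole sample.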
The chi-square statistic tells us whether the difference between the observed and expected frequencies is greater than we would expect to get by random chance. If p < .05, we say there is a significant relationship between the variables.

The expected frequencies are a form of "null hypothesis." The null hypothesis is that there is no relationship between gender and voting. If this were the case, what percent of the men would we expect to vote for Bush? The answer is: the same percent as voted for Bush in the total sample. The same goes for the women. What percent of the sample voted for Bush? 170/330, which is 51.5%. As a proportion it is .515. Using this proportion, we can compute the "expected frequencies": the frequencies we would "expect" under the null hypothesis that there is no difference between the genders. To compute those, we take the PROPORTION voting for Bush and multiply it by the number of men and then by the number of women. .515 * 110 = 56.65 men. This is not a percent, it is a frequency. .515 * 220 = 113.3 women. For Gore the proportion is 160/330 = .485, so .485 * 110 = 53.35 men and .485 * 220 = 106.7 women.

Another way to compute expected frequencies is rt * ct/gt where rt = row total, ct = column total and gt = grand total. This is probably easier, but it doesn't help you to understand what the number "means".
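Here is a minimal sketch of the rt * ct/gt shortcut applied to the Bush/Gore example. The function and variable names are my own. Because it does not round the proportion to .515 first, it gives 56.67 rather than 56.65 for the men expected to vote for Bush; the small difference is only rounding.

```python
# Observed frequencies from the example: rows are the vote, columns are gender.
observed = {
    "Bush": {"Men": 75, "Women": 95},
    "Gore": {"Men": 35, "Women": 125},
}

def expected_frequencies(table):
    """Expected frequency for each cell = row total * column total / grand total."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())
    return {
        r: {c: row_totals[r] * col_totals[c] / grand_total for c in cols}
        for r in rows
    }

expected = expected_frequencies(observed)
for vote, cells in expected.items():
    print(vote, {gender: round(value, 2) for gender, value in cells.items()})
```

Note that the expected frequencies in each column still add up to the column total (110 men, 220 women), just as the observed frequencies do.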

On the first test, you will be asked to answer specific questions such as:

How many men were there in the sample? 110
How many women were there in the sample? 220
How many respondents are there? 330
How many respondents voted for Bush? 170
How many respondents voted for Gore? 160
On the null hypothesis that there is no relationship between gender and vote, how many women would we expect to vote for Bush? .515 * 220 = 113.3

What percentage of the men voted for Bush? (column percent) 75/110 = .682 or 68.2%.

What percentage of the Bush voters were men? (row percent) 75/170 = 44.1%

What percentage of the voters were men who voted for Bush? (total percent)
75/330 = 22.7%

What percentage of the women voted for Bush? (column percent) 95/220 = 43.2%
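The three kinds of percent differ only in the base. The sketch below, with names of my own choosing, computes all three for the men who voted for Bush, assuming the table layout used in the example (vote in the rows, gender in the columns).

```python
observed = {
    "Bush": {"Men": 75, "Women": 95},
    "Gore": {"Men": 35, "Women": 125},
}

def table_percents(table, row, col):
    """Return the column, row, and total percent for one cell of a crosstab."""
    cell = table[row][col]
    row_total = sum(table[row].values())              # e.g., all Bush voters
    col_total = sum(r[col] for r in table.values())   # e.g., all men
    grand_total = sum(sum(r.values()) for r in table.values())
    return {
        "column": 100.0 * cell / col_total,   # percent OF THE men
        "row": 100.0 * cell / row_total,      # percent OF THE Bush voters
        "total": 100.0 * cell / grand_total,  # percent OF THE whole sample
    }

result = table_percents(observed, "Bush", "Men")
for kind, value in result.items():
    print(f"{kind} percent: {value:.1f}%")
```

The same cell (75 men voting for Bush) gives 68.2%, 44.1%, or 22.7% depending on which base you divide by.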

The Chi Square Statistic tells us whether the difference between the "expected" frequencies and the "observed" frequencies is statistically significant. The first few pages of this Chi Square lesson by Amar Patel explain the meaning of "expected" frequencies. The lesson also goes on to explain the computation of chi square. An Example: Alleged Racial Profiling by the San Diego Police. Here is a one page summary of what we need to know about chi square. Have some data you want to test with chi square? Use the WEB Chi Square Calculator.
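As a rough sketch of the computation, the code below works out chi square for the Bush/Gore table by summing (observed - expected) squared, divided by expected, over the four cells. The names are my own. It applies no continuity correction, so a statistics package that does apply one may report a somewhat smaller value. The p-value line uses a shortcut that is valid only for a 2 x 2 table (1 degree of freedom).

```python
import math

# Rows: Bush, Gore. Columns: men, women.
observed = [[75, 95], [35, 125]]

def chi_square(table):
    """Pearson chi-square statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic

chi2 = chi_square(observed)
# With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi square = {chi2:.2f}, significant at .05: {p_value < 0.05}")
```

For this table chi square comes out a little over 18, far above the 3.84 needed for p < .05 with 1 degree of freedom, so we would reject the null hypothesis of no relationship between gender and vote.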

Feb 6 Levels of Measurement.

Variables vs. Constants - variables are characteristics or aspects that take different values among the things being studied (the units of analysis). The task of social science is to explain variation; metaphysics deals with constants.

This follows from conceptualization - that is how we think of the "thing" being measured. For example, race. What is it? Glenn C. Loury defines it as: "a cluster of inheritable bodily markings carried by a largely endogamous group of individuals, markings that can be observed by others with ease, that can be changed or misrepresented only with great difficulty, and that have come to be invested in a particular society at a given historical moment with social meaning." The Anatomy of Racial Inequality, p. 20.

Suppose we were to measure race, how would we do it? Our goal is to conduct a census of the US population and determine how many people there are of each race. We ask people to check a box indicating their race. What are the boxes? Today the official categories are:

white
African American, black or Negro
Asian
Native Hawaiian or other Pacific Islander
American Indian or Alaskan Native

Furthermore, instead of asking "what is this person's race?" we ask "what is this person's race, mark one or more." Making a choice often becomes a political decision.

In Brazil, the categories are different, as Jennifer Roth Gordon reports: black, white, brown, yellow, also native Brazilian. The people look the same. Conceptions of race in Brazil challenge Americans to go beyond fixed, blood or biologically based categories and see race as fluid, often individually defined, and even seasonal. When race is defined as skin color (which is much of the time in Brazil), being tan from the sun can affect the way you identify. Though the Brazilian census asks individuals to identify as black (negro), white (branco), brown (pardo), or yellow (amarelo), many Brazilians commonly refer to themselves as moreno (or brown). While blackness is both socially and economically stigmatized, extreme whiteness is also considered marked (though not economically disadvantageous). Caught between American Brazilianists who argue for all mixed Brazilians to discover their blackness and other Brazilianists who would rather not see race in Brazil in black and white, Brazilians negotiate fluid racial categories which do not preclude a Brazilian kind of racism.
Hishram Aidi reports that: A DNA study by Brazilian scientists found that 80 percent of the population has at least some African ancestry, and fully half of the nation's 165 million inhabitants consider themselves to be of African descent. ...Myriad racial categories also hamper Afro-Brazilians' ability to mobilize. A 1974 census presented 134 categories, ranging from "bem-branca" (real white) to "bailano" (ebony). In the most recent census only 6 percent of Brazilians classified themselves as black, while 40 percent preferred the term "pardo" ("brown") — and others chose one of the 100 different terms to describe their skin tone: "criolo," "moreno," "mulato", "mestico."

So it is difficult to know, in many cases, exactly what we are talking about. This is true of other sociological terms such as "social class." Are there distinct social classes? In 16th century France, society was still very much divided into the three traditional estates: those who pray (the church), those who fight (the nobility), and those who work (everyone else). In The Communist Manifesto: The history of all hitherto existing society is the history of class struggles. Freeman and slave, patrician and plebeian, lord and serf, guild-master and journeyman, in a word, oppressor and oppressed, stood in constant opposition to one another, carried on an uninterrupted, now hidden, now open fight, a fight that each time ended, either in a revolutionary reconstitution of society at large, or in the common ruin of the contending classes.
But what about in American society today? What classes are there? Upper, Middle, Lower? Upper, Middle, Working, Lower? UU LU UM LM UL LL? How would we sort people into these categories? In the book Yankee City, Lloyd Warner and his associates broke each of the three classes into an upper and a lower section. The top, or upper-upper class, is composed of the wealthy old families, who have long been socially prominent, and who have had money long enough for people to have forgotten when and how the fortune was acquired. For example, how many of you know how Joseph Kennedy, the father of President John Kennedy, made his fortune? The lower-uppers may have as much money, but have not had it as long, and their family has not been socially prominent as long. The upper-middle class includes most of the successful business and professional persons, generally of "good" family background and comfortable income. The lower-middle class includes clerks, other white-collar workers and semiprofessionals, and possibly some foremen and top craftsmen. The upper-lower class consists mainly of the steadily employed workers, and is often described as the "working class" by those who feel uncomfortable about applying the term "lower" to working individuals. The lower-lower class includes the irregularly unemployed, the unemployable, migrant laborers, and those living more or less permanently on public assistance.
Or do we have continuous stratification? Should we talk about "socio-economic status" (SES) instead? In that case we would average together indicators such as Income, Years of Education and Occupational Status. We would have a continuous numerical distribution, not sharp categories.
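One way to sketch an SES index in code is shown below. Since income, education and occupational status are measured in different units, this sketch first converts each indicator to standard (z) scores and then averages them. The standardizing step is my assumption about how the averaging would be done, and the four respondents and their scores are made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical respondents: (income in dollars, years of education,
# occupational status score). All values are invented for this example.
respondents = {
    "A": (25_000, 10, 30),
    "B": (48_000, 14, 55),
    "C": (90_000, 18, 80),
    "D": (37_000, 12, 45),
}

def z_scores(values):
    """Convert raw scores to standard scores (mean 0, standard deviation 1)."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

def ses_index(data):
    """Average the standardized indicators for each respondent."""
    names = list(data)
    indicator_columns = list(zip(*data.values()))   # one tuple per indicator
    standardized = [z_scores(col) for col in indicator_columns]
    return {
        name: mean(col[i] for col in standardized)
        for i, name in enumerate(names)
    }

ses = ses_index(respondents)
for name, score in sorted(ses.items(), key=lambda item: item[1]):
    print(name, round(score, 2))
```

The result is a continuous score for each person rather than a class label; any cut points between "classes" would have to be imposed afterward.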

This gets us into the topic of levels of measurement. No matter what we are measuring, we have to choose a level of measurement. I will discuss six categories (two of which are not in the book):

dichotomy - binary measurement. Two and only two categories. With dichotomies we can use statistics that usually require interval measurement, such as correlation, regression, scatterplots. "Dummy" variables are what you get when you take a number of categories and reduce them to a series of dichotomies.
nominal - a number of categories. This includes dichotomies. The key thing is that each unit of analysis (person or family) goes into one and only one category. If a unit fits more than one category, add a category. With this kind of data we can do percentages and cross-tabulations (assignment 2b). We tend to treat survey data as nominal or ordinal even if it is inherently interval, e.g., income.
ordinal - the categories are in order from the lowest to the highest. Poor, middle class, upper class. LL UL LM UM LU UU. Strongly Agree, Agree, Undecided, Disagree, Strongly Disagree.
rank order - ordinal with no ties permitted. Used in evaluation, e.g., law schools, military.
interval - precise distances between units are measured, e.g., height in inches, income in dollars, test scores. Different statistics apply: mean, standard deviation, correlation, scatterplot. Exercise 2a. Rates, ratios.
ratio - the same as interval, plus it has a meaningful zero point. Zero means the absence of something.
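The idea of "dummy" variables mentioned under dichotomy can be sketched as follows. The categories and the five respondents are hypothetical, and the function name is my own. Each nominal category becomes its own yes/no (1/0) variable.

```python
def dummy_code(values, categories):
    """Turn one nominal variable into a series of 0/1 dichotomies."""
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]

# Hypothetical nominal variable: how five respondents voted.
votes = ["Bush", "Gore", "Other", "Gore", "Bush"]
categories = ["Bush", "Gore", "Other"]

dummies = dummy_code(votes, categories)
for vote, row in zip(votes, dummies):
    print(vote, row)
```

Because each respondent falls into one and only one category, exactly one dummy in each row equals 1. In a regression one of the dummies is usually left out to serve as the comparison group.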

Friday, February 8.

Units of Analysis - you cannot necessarily generalize findings from one unit, such as a state or other ecological entity, to another, such as the people who live in it. If you find that states with more money have a higher rate of alcohol consumption, you cannot say that it is the wealthier people in the state who drink. This error is called the "ecological fallacy."
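To see how the ecological fallacy can arise, here is a small made-up example. All the numbers are invented: two states with three people each, where the richer state drinks more on average, yet within each state the wealthier people drink less.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of numbers."""
    mx, my = mean(xs), mean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return covariation / (ss_x * ss_y) ** 0.5

# Hypothetical individuals: (income in thousands, drinks per week).
states = {
    "Poorer state": [(20, 3), (30, 2), (40, 1)],
    "Richer state": [(50, 6), (60, 5), (70, 4)],
}

# State-level (ecological) data: the averages for each state.
state_means = {
    name: (mean(inc for inc, _ in people), mean(dr for _, dr in people))
    for name, people in states.items()
}

# Individual-level data: the correlation within each state.
within_r = {
    name: pearson_r([inc for inc, _ in people], [dr for _, dr in people])
    for name, people in states.items()
}

print(state_means)  # richer state has the higher average consumption
print(within_r)     # but within each state, income and drinking go opposite ways
```

The state-level pattern and the individual-level pattern point in opposite directions, which is why findings about states cannot simply be applied to the people in them.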

Our main topic today is the Quality of Measures. How do we evaluate and measure the quality?

Reliability means Consistency. Between raters, between testings, between forms of a test, between halves of a test, or between the items of a test. With questionnaires, we measure inter-item and item-total correlations. Cronbach's alpha is a widely used statistical measure of inter-item reliability.
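Cronbach's alpha can be sketched with the usual formula: alpha = k/(k-1) * (1 - sum of the item variances / variance of the total scores), where k is the number of items. The function name and the three-item, five-respondent data set below are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]   # each respondent's total
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical answers of five respondents to three 1-to-5 agreement items.
item1 = [4, 5, 2, 3, 1]
item2 = [5, 5, 1, 3, 2]
item3 = [4, 4, 2, 2, 1]

alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 3))
```

When the items rise and fall together, as they do in this invented data, alpha is close to 1. Items that are unrelated to each other would push it toward 0.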

Validity is a much more difficult concept. It asks whether the variable measures the concept it is supposed to measure. This is a philosophical question: what does something really "mean"? This is problematic with concepts such as "intelligence" that are unclear in themselves. There are several criteria.

Face Validity. Does it seem right? Sometimes variables work but lack face validity, e.g., a measure of "creditworthiness" that includes whether a person moved or not in the last year.
Convergent Validity. Do a number of different measures correlate with each other?
Criterion or Predictive Validity. This is generally considered the most rigorous, but it depends on having a measure of the criterion. If you can measure the criterion anyway, why use a test? Perhaps you can only measure it later in time, e.g., a measure of "likelihood to violate parole". We will know in a year or two how well it worked. The same goes for admissions tests, employment tests, etc. Or take a test for Alzheimer's disease: we can tell for sure if someone has it by doing an autopsy on the brain, but by then it is too late for treatment purposes.
Construct Validity. The most difficult to understand: "the extent to which a particular measure relates to other measures consistent with theoretically derived hypotheses concerning the concepts (or constructs) that are being measured." In other words, does it work as your theory says it should. This is a bit circular, as in the Binet example given in the text. The fact that results are normally distributed and correlated with age does not mean it is measuring an innate, inborn capacity for learning.

As an example of a construct validity study, consider research done by this class on the construct validity of a measure of UFO Abduction. Here we had a measure that met the usual standards of reliability. But establishing its validity was difficult. The authors thought it had face validity, and it correlated with some other measures, so it could be said to have convergent validity. We could not use criterion validity since we have no "true" measure of whether or not someone was abducted. Instead, we formulated a theoretical hypothesis: that a measure of "experienced anomalous trauma" should correlate with other supposed measures of abduction but NOT with measures of the tendency to have other kinds of unusual mental experiences. As I said in the published paper: In this case, there are at least two alternative theories which can explain why the measure is internally consistent. One is that the respondents are consistently reporting on similar experiences as UFO abductees. The other is that the individuals who score high on the scale share a psychological tendency to have false memories. Flournoy (1911) referred to this phenomenon as cryptomnesia. We concluded that the measure was actually a measure of a tendency to have false memories. Later the National Enquirer got ahold of it and wrote it up as saying just the opposite.

Cross-Case Compatibility is another measure of quality. Do the measures work the same with different populations? A measure may be valid and reliable with one population but not with another, because of differences in culture, language, or life experience.

February 11 - Today we will look at the varieties of measures used in sociology and criminal justice, with a focus on Questionnaire Construction.

Question Content. First of all, what do we want to know? Why do we need the information? Can the respondent supply it? Is the question unbiased? Do we need a number of questions to get at a more general trait?

You should have a hypothesis for each question, a finding that you expect to get with it. This can be simply univariate frequencies, e.g., I expect over 70% will favor the death penalty, or bivariate, e.g., I expect more men than women will favor the death penalty.
Distinguish between questions about behavior and questions about attitudes. Attitudes are not necessarily translated into behavior. Behavior is usually more important.

Question Placement. Next, we can think of the questionnaire as a conversation. We usually begin with an item designed to make the respondent feel at ease, and to give him or her a good idea of what the interview will be about.
Structured and Unstructured Formats. Think of the type of items you want to use. It is confusing to the respondent to switch back and forth between open-ended and closed-ended, or between different sets of answers, e.g. agree-disagree vs. favor-oppose or more-less.
Question Wording. Now we are ready to write the actual items. We want them to be clear, colloquial, unambiguous.
Scales and Indexes. To get at more abstract concepts, we use a number of items to measure something. Usually we just average the scores together; this is called Likert scaling or summated scaling. Some scales attempt to get true interval or ordinal measurement. Guttman scaling is mentioned in the book and uses ordinal measurement. Your book does not even mention the interval techniques, often called Thurstone scaling, which are very rarely used in sociology or criminal justice. For a Likert or summated scale to work, the items have to be highly correlated with each other, which we can test with reliability measures such as Cronbach's alpha. An example: Zick Rubin's scale of romantic love.
Rates and aggregate measures are based on averaging together data for many individuals in the same unit. For example, infant mortality, crime rates, etc. Each is based on an individual event, but we divide the number of events by the total population that qualified. E.g., murder rate per 100,000 people in the population, or infant mortality per 1,000 live births.
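Converting a raw count to a rate is a one-line calculation. The sketch below uses a function name of my own and made-up numbers for a hypothetical state.

```python
def rate(events, population, per=100_000):
    """Number of events per `per` members of the qualifying population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * per / population

# Hypothetical state: 150 murders in a population of 2,500,000.
murder_rate = rate(150, 2_500_000)             # per 100,000 people
# Hypothetical state: 280 infant deaths among 40,000 live births.
infant_mortality = rate(280, 40_000, per=1_000)  # per 1,000 live births

print(murder_rate, infant_mortality)  # 6.0 7.0
```

The denominator matters as much as the numerator: the murder rate uses the whole population, while infant mortality uses only live births, the population "that qualified."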

In criminal justice, the two most important measuring instruments are the National Crime Victimization Survey and the Uniform Crime Reports. There are also some surveys of criminals. Reliability and Validity of Each Approach. Another example is a survey done by the Lansing, Michigan, police department as part of a community policing endeavor. There are tens of thousands of surveys done every year. Some examples are: Gallup Poll. How Polls Are Conducted. Harris Interactive. Harris Poll Online.

February 13 -

Use of the "collapse" option in Microcase, pages 51-52 in the Workbook. The purpose here is to eliminate columns where we have too few cases. We can also combine columns, which allows us to reconceptualize our data, e.g., combine all minorities into a single "nonwhite" category.
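Outside of Microcase, the same idea of collapsing categories can be sketched in Python. This is not how Microcase does it internally; the function name, the mapping and the counts are all hypothetical.

```python
def collapse(counts, mapping):
    """Combine categories: each old category is renamed via `mapping`
    (categories not listed keep their name) and the counts are added up."""
    collapsed = {}
    for category, n in counts.items():
        new_category = mapping.get(category, category)
        collapsed[new_category] = collapsed.get(new_category, 0) + n
    return collapsed

# Hypothetical frequencies for a race variable in a survey.
counts = {"white": 1200, "black": 210, "Asian": 40, "American Indian": 12}

# Combine all minorities into a single "nonwhite" category.
mapping = {"black": "nonwhite", "Asian": "nonwhite", "American Indian": "nonwhite"}

collapsed = collapse(counts, mapping)
print(collapsed)  # {'white': 1200, 'nonwhite': 262}
```

Collapsing never changes the total number of cases; it only changes how they are grouped, which is why it is a decision about concepts as much as about data.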

Demonstration of Exercise 3 in the Workbook, pages 69 to 79. The key point here is to look at the details of how the variables are measured. It is particularly important to look at whether a variable is a raw frequency, e.g., the number of Playboy subscribers, or a rate, e.g., the number of Playboy subscribers per 100,000 population. We can use Microcase to convert data to rates if they don't come that way in the data set.

We can assess the reliability of a measure that consists of several items by looking at the item-total correlations, or the inter-item correlations, or Cronbach's alpha, which is a similar measure. We can assess construct validity by doing cross-tabulations or correlations, depending on the way the variables are measured, and seeing if the relationships work out as our theory says they should. Looking at relationships between variables is more meaningful than looking at the distribution of a single variable when the variable uses terms that are not precisely specified, e.g., if people say they "strongly favor" something, we don't have a good idea of what that means until we compare it to how many favor other things.

February 15 - Review for First Exam. We will look at some sample multiple choice questions and do some math questions on paper. Bring a calculator.

Feb 18 - first exam - Feb 20 discussion of exam items.
