March 25 -
Survey research is widely used to establish "proportional facts" about large populations - to go beyond saying "some people are Democrats and some are Republicans" to saying how many are Democrats and how many are Republicans.
Most surveys today are done on the telephone, but some are done door-to-door, some by mail and some on the Internet. I am assigning you to participate in one internet survey so as to have the experience of being part of a survey.
Surveys depend on people answering questions, so they are appropriate for topics that people are willing and able to tell us about. In most cases, people have the right not to participate in a survey, and non-response is a problem. Writing good questions and using good interviewing techniques are the best ways of dealing with this.
The distinction between attitudes and behavior is important. We are usually most interested in predicting behavior; we are interested in attitudes (enduring, learned predispositions to respond in a consistent way to a particular stimulus or set of stimuli) because we believe they predict behavior.
Most survey items are "closed" - they ask people to choose between alternatives. People prefer these questions, in most cases, but in some cases we use open-ended questions that ask people to respond in their own words. These are most useful when we don't know the set of likely answers, i.e., in exploratory work. This work may be better done in focus groups than in individual interviews.
Survey questions should be clearly worded, using colloquial language. There should be a good selection of responses, providing for all the common alternative views. For example, don't just ask if people believe in abortion; give a range of choices: whenever a woman wants one, if her life is in danger, if she really can't care for a child, under no circumstances, etc.
There are many survey archives. We can use these for "secondary" analyses, testing hypotheses not already tested by someone else. Or we can use them to examine trends, or as a source of items for our own surveys. The General Social Survey is a nationwide survey repeated every couple of years, often repeating the same questions. Frequencies are available online in HTML format, or you can download the data files and analyze them using SPSS. These can be accessed at the University of Michigan survey research center or at the University of California survey center. Berkeley has extensive codebooks; California sometimes does quicker analyses. We can view their items as examples. Later, we will do a project using this data source.
The Gallup Organization has a good online description of "How polls are conducted." It explains the importance of sampling methods: good sampling enables us to generalize from a small sample to a large population.
March 27
SAMPLING:
Census - enumerate everybody. This is usually not practical: it is too expensive or too hard to do for a large population. If you are studying an organization, however, it may be easy to find everyone.
Sample - generalize to a population from a selection, or sample. We can generalize if the sample is "representative." This can be done through random selection if you have a list to select from.
Simple random sample - everyone has the same chance of appearing in the sample.
Cluster sampling - done for practical reasons, where we don't have a list or find travel costs too high. If we survey households, we usually cluster. If we use the telephone, we have a list of subscribers.
Stratified samples - everyone has a known, but not equal, probability of being in the sample. This is done so we can find out about subgroups in the sample; in effect, each group is sampled separately. Often we stratify by geographic areas because they are known in advance.
Nonrandom sampling - done for convenience or to get variety, but you cannot reliably generalize from it. SLOPS (self-selected listener opinion polls) take whoever calls in or clicks on the Internet. These may generate anecdotes or gossip.
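The three random designs above can be sketched in a few lines of Python. This is a minimal illustration, not a survey tool: the sampling frame of 1,000 households in 20 census tracts and the two "regions" used as strata are hypothetical, invented only to show how each design picks cases and why a stratified sample gives known but unequal chances of selection.

```python
import random

# Hypothetical sampling frame: 1,000 households, 50 in each of 20 census tracts.
frame = [{"id": i, "tract": i % 20} for i in range(1000)]
rng = random.Random(42)  # fixed seed so the same draw can be repeated

# Simple random sample: every household has the same chance (100/1000) of selection.
srs = rng.sample(frame, 100)

# Cluster sample: choose whole tracts at random, then take every household in them.
chosen_tracts = set(rng.sample(range(20), 2))
cluster = [h for h in frame if h["tract"] in chosen_tracts]

# Stratified sample: draw separately within each stratum. The two hypothetical
# regions differ in size but get the same number of interviews, so each
# household's chance of selection is known but not equal.
strata = {"large region": frame[:800], "small region": frame[800:]}
stratified = {name: rng.sample(members, 50) for name, members in strata.items()}

for name, members in strata.items():
    chance = len(stratified[name]) / len(members)
    print(f"{name}: selection probability {chance:.4f}")
# large region: selection probability 0.0625
# small region: selection probability 0.2500
```

The cluster sample needs only a list of tracts, not of households, which is the practical advantage described above.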
We can compute the "margin of error." This means we can compute how much our sample statistic is likely to vary from the population parameter. This is based on probability theory. The larger your sample, the more certain your results. The size of the population doesn't matter, as long as it is much larger than the sample. In practice, the size of the sample depends on how much you want to break things down into subgroups of whatever kind.
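The probability-theory claim can be checked by simulation. The sketch below assumes a hypothetical population in which exactly 50% would answer "yes," draws 2,000 samples of 400 respondents each, and counts how often the sample percentage lands within the margin of error 1/SQRT(N). It should come out at roughly 95%. Each respondent is drawn independently, which mimics sampling from a population much larger than the sample, so the population size never enters the calculation.

```python
import math
import random

def margin_of_error(n: int) -> float:
    """Conservative margin of error for a proportion, 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def coverage_rate(true_p: float, n: int, trials: int, seed: int = 1) -> float:
    """Share of simulated samples whose proportion is within the margin of true_p."""
    rng = random.Random(seed)
    m = margin_of_error(n)
    hits = 0
    for _ in range(trials):
        # Each simulated respondent answers "yes" with probability true_p.
        yes = sum(1 for _ in range(n) if rng.random() < true_p)
        if abs(yes / n - true_p) <= m:
            hits += 1
    return hits / trials

rate = coverage_rate(true_p=0.5, n=400, trials=2000)
print(f"{rate:.1%} of samples fell within the margin of error")
```

Rerunning with a larger `n` shrinks the margin, but the share of samples inside it stays near 95%.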
To compute the margin of error, see the Guide to Computing Margins of Error on the Web site.
For example,
1. In a college class with 85 students, 32 of whom are black, the mean on the midterm was 75. The standard deviation was 6.21. What is the margin of error for this mean?
This is a mean score question, so I use the formula
M = 2 * sd / SQRT(N)
M = 2 * 6.21 / SQRT(85) = 1.35 points, NOT percent.
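The mean-score formula translates directly into code. This is a minimal sketch using the numbers from example 1; the confidence interval line is an extra step not in the original problem, applying the same "statistic plus or minus margin" rule the notes use for percentages.

```python
import math

def margin_of_error_mean(sd: float, n: int) -> float:
    """Approximate 95% margin of error for a mean, 2 * sd / sqrt(n).
    The result is in the units of the scores (points), not percent."""
    return 2 * sd / math.sqrt(n)

m = margin_of_error_mean(sd=6.21, n=85)
print(f"margin of error: {m:.2f} points")  # margin of error: 1.35 points
print(f"95% interval for the mean: {75 - m:.2f} to {75 + m:.2f}")  # 73.65 to 76.35
```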
9. A survey of the tri-county area has 356 respondents, of whom 82 are black and 55 Hispanic. What is the margin of error for statistics about the opinion of the Hispanic residents?
This is a percentage question, but I am not given a statistic (a percentage result). Use Formula one, M = 1/SQRT(N), which assumes the most uncertain case, a 50-50 split. What is N? It is the number of respondents in the group we are describing, the 55 Hispanics, not the full 356.
M = 1/SQRT(55) = .1348.
This formula gives us a proportion, not a percent; as a percent it is 13.5%. Suppose I said 61% of the Hispanic respondents are voting for McGreevy. That is the statistic for the sample. The population parameter might differ from it by as much as 13.5 percentage points. We could say that our "confidence interval" is between 61 - 13.5 and 61 + 13.5, or between 47.5% and 74.5%. Because the interval includes values below 50%, the election among Hispanics is "too close to call."
Suppose we had 400 Hispanics: the margin of error would be 1/SQRT(400) = 1/20, or 5%. For a sample of 1000, M = 1/SQRT(1000), or about 1/31.6, or 3.2%.
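Formula one and the "too close to call" reasoning can be written out as below. This is a sketch using the notes' own figures; the 61% for McGreevy is the hypothetical "suppose" value from example 9, and the 50% threshold check is one simple way to express "too close to call."

```python
import math

def conservative_margin(n: int) -> float:
    """Formula one, M = 1/sqrt(N); returns a proportion (x100 for percent)."""
    return 1 / math.sqrt(n)

# N is the subgroup being described: the 55 Hispanic respondents, not all 356.
m = conservative_margin(55)
stat = 0.61  # the "suppose" figure from the notes: 61% for McGreevy
low, high = stat - m, stat + m
print(f"M = {m:.4f}; interval {low:.1%} to {high:.1%}")
# M = 0.1348; interval 47.5% to 74.5%
if low < 0.5 < high:
    print("too close to call")

for n in (55, 400, 1000):
    print(n, f"{conservative_margin(n):.1%}")
# 55 13.5%
# 400 5.0%
# 1000 3.2%
```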
Suppose we wanted a 5% margin of error. How large a sample do we need? 400. Suppose we want a 5% margin of error for each of five electoral districts. How large a sample do we need? 5 * 400, or 2000.
Representative or random sample: chosen at random either from the total population (simple random sample) or from subgroups of the population (stratified random sample).
In choosing a sample size, all that matters is the amount of error you can tolerate. The population size is not relevant.
A researcher wants to obtain a margin of error of no more than 2% in a survey of a county with a population of 3,000,000. How large a sample is needed?
N = 1/(M*M), where M is the margin of error expressed as a proportion.
M = .02 because it says 2%. N = 1/(.02*.02) = 1/.0004 = 2500. Simple random sample.
Suppose we were going to do this for five counties, and we wanted a 2% margin of error for each. How large a sample would we need? A 2% margin of error requires 2500, but we need it for each county, so we need 5 * 2500, or 12,500. Stratified random sample, consisting of a simple random sample of each of the subgroups.
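The sample-size formula N = 1/(M*M) can be sketched as a small function. As a design choice (an assumption, not part of the notes), it takes the margin in percent and computes 10000/M², which is algebraically the same as 1/(M/100)² but keeps the floating-point arithmetic exact for whole-number percentages. It rounds up because you cannot interview a fraction of a person.

```python
import math

def required_sample_size(margin_pct: float) -> int:
    """Sample size for a margin of error, N = 1/(M*M) with M = margin_pct/100.

    1/(M*M) equals 10000/margin_pct**2; writing it this way avoids possible
    floating-point noise. Round up so the margin is never larger than asked.
    """
    return math.ceil(10_000 / margin_pct ** 2)

per_county = required_sample_size(2)  # one county, 2% margin
total = 5 * per_county                  # five counties, 2% margin in each
print(per_county, total)                # 2500 12500
print(required_sample_size(5), 5 * required_sample_size(5))  # 400 2000
```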
3. 59% of the respondents in a survey of a state with seven million Republican voters voted for Bush, 41% for Gore. There were 625 respondents. What is the margin of error for the percent voting for Bush?
M = 2 * SQRT((p * (1-p))/N), where p is the proportion of respondents giving a certain response: the sample statistic.
In this case, what is p? .59. What is N? 625.
M = 2 * SQRT((.59 * (1-.59))/625) = .0393 as a proportion, or 3.93% expressed as a percentage.
What does that mean? We can be "95% sure" that the population parameter (the true value for the population) is within 3.93 percentage points of the sample statistic. One way to express this is as a "confidence interval."
The lower bound of the confidence interval is the sample statistic minus the margin of error, in this case 59 - 3.93 = 55.07%.
The upper bound of the confidence interval is the sample statistic plus the margin of error, in this case 59 + 3.93 = 62.93%.
We are confident that the true figure, the "population parameter," is between 55.07% and 62.93%.
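The percentage formula and the two bounds of the confidence interval can be computed together. This minimal sketch reuses the Bush figures from example 3 (p = .59, N = 625); the function names are illustrative, not from any library.

```python
import math

def margin_of_error(p: float, n: int) -> float:
    """95% margin of error for a sample proportion p: 2 * sqrt(p(1-p)/N)."""
    return 2 * math.sqrt(p * (1 - p) / n)

def confidence_interval(p: float, n: int) -> tuple[float, float]:
    """Lower and upper bounds: sample statistic minus and plus the margin."""
    m = margin_of_error(p, n)
    return p - m, p + m

m = margin_of_error(0.59, 625)
low, high = confidence_interval(0.59, 625)
print(f"M = {m:.2%}, interval {low:.2%} to {high:.2%}")
# M = 3.93%, interval 55.07% to 62.93%
```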
Suppose 47% vote for Bush, 49% for Gore and 4% for Nader, in a sample of 1200. What is the margin of error for the Nader vote?
p = .04. What is 1-p? .96.
M = 2 * SQRT((.04 * (1-.04))/1200) = .0113, or 1.13%.
What is the margin of error for the Gore vote?
M = 2 * SQRT((.49 * (1-.49))/1200) = .0289, or 2.89%.
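The Nader and Gore margins differ because the margin depends on p. A short derivation, using only the formulas in these notes, shows that p(1-p) is largest at p = .5. That is why Formula one, M = 1/SQRT(N), is the most cautious version of the percentage formula, and why solving it for N gives N = 1/(M*M):

```latex
\begin{aligned}
M &= 2\sqrt{\frac{p(1-p)}{N}} \\
p(1-p) &= 0.25 - (p - 0.5)^2 \le 0.25 \\
M &\le 2\sqrt{\frac{0.25}{N}} = \frac{1}{\sqrt{N}} \\
N &= \frac{1}{M^2} \quad \text{(solving } M = 1/\sqrt{N} \text{ for } N\text{)}
\end{aligned}
```

With N = 1200 the ceiling is 1/SQRT(1200) = .0289. The Gore figure (p = .49) sits right at that ceiling, while the Nader figure (p = .04) is far from .5 and gets the much smaller margin of .0113.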
April 1:
Exercise 4: Selecting Cases, on page 91 of the workbook
First, we can answer the "Before You Begin" questions.
1. The difference between a census and a sample? Census enumerates the whole population. A sample is a portion selected to be representative.
2. Parameter and a statistic? The statistic comes from a sample, the "parameter" is the "true" population value.
3. Confidence level? How certain we can be that the population parameter falls within the margin of error of the sample statistic; it is almost always set at 95%.
Confidence interval? The range within which we are 95% confident the population parameter falls.
Margin of error? The amount by which, with 95% confidence, the sample statistic may differ from the population parameter.
New York Times summary: "In theory, in 19 cases out of 20 [confidence level] the results based on such samples [sample statistic] will differ by no more than three percentage points [margin of error] in either direction from what would have been obtained by seeking out all American adults [population parameter]"
We get the upper bound of the confidence interval by adding the margin of error to the sample statistic.
We get the lower bound of the confidence interval by subtracting the margin of error from the sample statistic.
If the confidence interval is plus or minus 3%, it means that we are 95% sure that the true population parameter is within 3 percentage points of the sample statistic.
I would say the margin of error is 3%. If the sample statistic was, for example, 55% supporting McGreevy, the confidence interval would be 52% to 58%.
4. Simple random sample: throw darts at the directory, write names on slips of paper and put them in a hat, or use a table of random numbers. Systematic sample: choose every 50th or 100th case.
Nonresponse bias - people don't answer.
Selective availability - some people are home more.
Areal bias - in household surveys, some neighborhoods are more convenient.
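The two selection methods in item 4 can be compared in code. This sketch uses a hypothetical directory of 5,000 listings. Starting the systematic sample at a random position within the first interval is a common refinement assumed here; the notes only say "choose every 50th or 100th case."

```python
import random

def systematic_sample(frame: list, interval: int, rng: random.Random) -> list:
    """Every `interval`-th case, starting from a random position in the first interval."""
    start = rng.randrange(interval)
    return frame[start::interval]

# Hypothetical directory of 5,000 listings.
directory = [f"listing-{i}" for i in range(5000)]

every_50th = systematic_sample(directory, 50, random.Random(7))
# "Slips of paper in a hat": a simple random sample of the same size, for comparison.
hat_draw = random.Random(7).sample(directory, 100)

print(len(every_50th), len(hat_draw))  # 100 100
```

A systematic sample behaves like a random one as long as the list has no pattern that repeats every 50 entries.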
Now we can look at some of the examples in the book.
Review for third midterm
Chapter 7 on Survey Research.
Simple fact vs. a proportional fact. Qualitative vs. quantitative.
We want to apply numbers to a population. Sampling allows us to make statements about a population from a much smaller sample.
Behavior or attitudes. Usually we study reported behavior, not observed behavior.
Questionnaire with a list of standard questions; everybody is asked most or all of them. Most are closed-ended; open-ended questions are used when we don't know the likely range of answers.
Questions should be clear, unambiguous, colloquial.
Cross-sectional and trend studies. A cross-sectional study surveys a population at one point in time. In a trend study, you survey the same population at different points in time. In a panel or longitudinal survey, you interview the same people at several points in time. Longitudinal studies tend to cover periods of at least a year.
Aging effects - people change as they get older.
Cohort effects - "generational" effects, comparing people born during a certain period of history, such as the "baby boom" generation.
Chapter 4: sampling
Population and a sample. A sample is selected to represent a population. We want sample members to be typical; this is often guaranteed by random selection.
Often we use stratified sampling in order to guarantee enough members of smaller groups in the population. Logically or mathematically, we are sampling sub-populations. If we want accurate statistics about the smaller groups, we need as many individuals from them as from the larger groups.
Cluster sampling is done for convenience because we don't have a list or can't access a total population conveniently. Generally this is done with geographic areas, "census tracts." This is done to save time and money. Stratification is done to get adequate sample sizes for sub-groups.
We also have non-random samples when necessary. "Snowball" sample, get each person in the sample to recommend others. SLOPS, self-selected - these are for entertainment. You can't really generalize to a population.
A stratified sample can be weighted to give accurate figures about the population.
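Weighting can be sketched as follows. All numbers are hypothetical: two strata of 800,000 and 200,000 people, with 500 interviews in each. Each respondent is weighted by the number of population members they stand for (stratum population divided by stratum sample size), which is one standard way to weight a stratified sample; the notes do not specify a method.

```python
# Hypothetical stratified survey: equal samples from two strata of unequal size.
strata = {
    # name: (population size, respondents sampled, respondents answering "yes")
    "larger group": (800_000, 500, 200),
    "smaller group": (200_000, 500, 350),
}

# Unweighted: pool all respondents as if they were one simple random sample.
total_sampled = sum(n for _, n, _ in strata.values())
unweighted = sum(yes for _, _, yes in strata.values()) / total_sampled

# Weighted: each respondent counts for (population size / sample size) people.
weighted_yes = sum(pop / n * yes for pop, n, yes in strata.values())
population = sum(pop for pop, _, _ in strata.values())
weighted = weighted_yes / population

print(f"unweighted {unweighted:.0%}, weighted {weighted:.0%}")
# unweighted 55%, weighted 46%
```

The unweighted figure overstates the smaller group, which was deliberately oversampled; weighting restores each group to its true share of the population.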
"parameter" - the true value for the population - versus
the "statistic," which is what we got from our sample. The margin of error tells us how much our statistic is likely to be "off," or to vary from the population parameter.