"Science" - the goal of science is to establish empirical
theories that
describe and explain reality. The physical sciences have been very
successful with this goal, the social sciences much less. Aristotle
described three kinds of knowledge: episteme, techne and phronesis.
In this course, I emphasize the techniques - techne. Next semester,
we deal with the ethics in CJ. The textbook talks about science, but
the workbook is techniques.
Artistic fields seek beauty: poetry, painting, music.
Social science
does not tend to have much literary or artistic merit...
Building block is concepts. Used to describe the world, then to explain
it. We have to define our concepts explicitly, and we have to measure
them ("operational definitions"). We try to be explicit about what we
are saying. Concepts like "mother" or "race" are not unambiguous; we
have to state what meaning we are giving to them. In empirical research
the bottom line is measurement: how do we know? Concepts are good
if they are useful for our purposes.
Concepts should have clear boundaries and be parsimonious. Virtropy was
an example of a poor concept; the things that made it up really did not
fit together in any logical or empirical way.
September 10:
Continue with Chapter One -
Theory and a hypothesis. Theories are more abstract, they
are general
statements. Hypotheses are testable. Empirical statements.
Falsifiable. Something that we can test with our senses,
with
observation. Tautology, a statement that is logically true
based on the
wording.
Our statements tend to be probabilistic, not absolute. "Smoking
causes
cancer." We have a degree of confidence in what we say, and
we try to
measure that level of confidence.
Where do we get our theories? From observations or from other
people
or writings by other people.
Induction - from specific observations to generalizations.
Deduction - go from the generalization to the hypotheses we wish
to
observe.
What do we study?
"pure vs applied" research.
"advocacy research" - trying to prove a point.
"evaluation research" done to establish whether a program
works or
not. Much applied research is in this category. Applied
research may
also deal with "needs assessment."
Sept 14. "Broken Windows" shows that there is a continuing debate
about the causes of crime rates; there is not agreement. We need to
look at data to answer these questions, particularly where we can
compare cases with different characteristics, e.g., cities with more or
fewer officers. This is a "natural experiment."
Look at the exploratory exercise in the workbook. Open
the USA data
file: is this an "aggregate" or "survey" data file?
It is aggregate. If the
units have a lot of variation, the "average" figure for a unit,
such as a
state, may be misleading.
The GSS data set is a survey data set, with 2832 people. This is a
sample; the size was a decision made by the researchers. To get the
"margin of error" we can use the formula M = 1/sqrt(N). This gives a
margin of error of 1.9% for this sample.
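The rule-of-thumb formula above can be checked with a short Python sketch. The function name simple_margin_of_error is made up for this illustration; only the formula and N = 2832 come from the notes.

```python
import math

def simple_margin_of_error(n):
    """Rule-of-thumb margin of error, M = 1/sqrt(N), as a proportion."""
    return 1 / math.sqrt(n)

# N = 2832 is the GSS sample size given in the notes.
gss_margin = simple_margin_of_error(2832)
print(f"{gss_margin:.1%}")
```

Multiply the proportion by 100 to report it as a percent.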
September 17:
Examples from the section on The Research with Aggregate Data, pages
47 to 53 in the workbook. The use of a scattergram, a graph
which
plots two continuous variables. We plotted a scattergram of height
and weight on the blackboard. The regression line is the straight
line that best fits the points. The correlation coefficient is a
measure of how closely the points fit a straight line, or
of how well one variable can be predicted by another. If
the correlation
is perfect, the two variables predict each other perfectly.
A perfect
correlation may be +1 or -1, depending on whether the relationship
is
"positive" or "negative" (i.e., inverse). If you square the
r it tells you the
percentage of the variance explained using one variable to explain
or
predict the other. An excel file of the relationship between
height and
weight is available at:
http://crab.rutgers.edu/~goertzel/heightweight.xls. The file
also
includes the regression formulas. I put the chart and the data
in two
separate "sheets". This is material we will be covering after
the first
midterm, so this is just intended to give you an overview.
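As a rough illustration of r and r squared, here is a minimal Python sketch. The heights and weights below are hypothetical values invented for the example; they are NOT the data in the class Excel file, and pearson_r is just a helper name used here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds), not the class data.
heights = [60, 62, 65, 68, 70, 72]
weights = [115, 120, 140, 155, 165, 180]

r = pearson_r(heights, weights)
r_squared = r ** 2  # share of the variance in one variable explained by the other
print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

With made-up data this close to a straight line, r comes out near +1; reversing the order of one list would give a value near -1.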
September 19:
Papers returned. They were graded rather rigorously using
the
following formulas. The ones submitted online have been regraded
to
the same scale.
Grading scale:
printout   5 pts.
1a         9 pts. (3 for description, 2 for each category)
1b         7 pts. (3 for description, 2 for each category)
2a         8 pts. (2 for each variable)
2b         8 pts. (2 for each variable)
2c         4 pts. (2 for each variable)
3a         8 pts. (2 for each line, 2 for how many cases were missing)
3b        10 pts. (2 for each line, 2 for how many cases were missing)
4a         3 pts.
4b         2 pts.
4c         6 pts. (2 for each line)
4d         6 pts. (2 for each line)
5a         6 pts. (1 for each state)
5b         6 pts. (1 for each state)
6a         2 pts.
6b         6 pts. (2 for each line)
7a         2 pts.
7b         2 pts.
This should add up to 100 pts.
Chapter Six on Research Designs.
The logic with which a research project is organized. Our
goal is
usually to say something about the relationship between variables.
Variables are characteristics that take different values, e.g,
height,
weight, number of hunters, party affiliation, etc. The variables
are what
we study.
In most cases there is one variable we want to understand, we call
that
the dependent variable. This is a decision, it is not inherent
in the
variable itself. For example, we could take weight as our
DV. Then we
look for the Independent Variables that "cause" it, or at least
are
correlated with it.
Most rigorous way to explain causal relationships is the Experiment.
In
an experiment you manipulate one Independent Variable. You
control
for all other independent variables or "control variables."
You observe
the Dependent Variable. You take your "subjects" and sort
them into
two identical groups, preferably through random assignment.
There are
ethical limits: either it has to be voluntary with "informed consent"
or you have to prove why it isn't. If you don't know whether something
works for an illness, it is ethical to experiment with it.
"External validity" - do the
findings apply to other circumstances. "Internal validity"
- was the
experiment carried out correctly.
There are practical limits on experimentation. Experiments
are
deductive, you have a theory and you test it.
Survey - ask a sample of people questions. Best used
for descriptive
purposes rather than for testing causal hypotheses. Problem
in a survey
is you don't have a before and an after, unless you do a longitudinal
survey. You don't manipulate the variables, so you don't
know which is
cause and which effect. Statistical analysis is used to "control"
for
extraneous variables. Not as logically rigorous, but it deals
with real
world situations and not artificial ones. Usually we take
people's word
for things.
Structured questionnaires
Unstructured interviews, including group interviews ("focus groups")
Field research - go into a natural setting and observe what happens.
Anthropologists more often. More inductive, you observe whatever
happens and try to figure out why.
Aggregate or comparative research. Take data about geographic
or other
groups, often data gathered by government agencies. Very
widely used
in CJ because the system generates a lot of data. Cross-sectional
comparisons, at one point in time, e.g., comparing states.
Trend analysis or time series analysis, looking at how things change
over time.
Sept 21
Item 2a1 in the homework. Comparing two maps, we can see that they
don't have much relationship to each other, so we checked "neither."
Similar - r would be close to +1
Opposite - r would be close to -1
Neither - r is close to zero.
In this example, r = .079, which is closer to zero than to + or - 1.
Two different measures:
The "strength" of the relationship, "r" is a measure of that.
the further
from zero, the better, the stronger.
The "significance" of the relationship, "p" is a measure of that.
The closer to zero the better, because this is a measure of the
likelihood that the result is due to random error. This is also
indicated with asterisks. Two asterisks ** means p < .01, * means
p < .05. "p" means Probability or "Prob."
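The asterisk convention can be written out as a tiny Python sketch. The function name significance_stars is hypothetical; the cutoffs are the ones given in the notes.

```python
def significance_stars(p):
    """Return the asterisk marker used in the notes for a given p value."""
    if p < 0.01:
        return "**"   # p < .01
    if p < 0.05:
        return "*"    # p < .05
    return ""         # not significant at the .05 level

for p in (0.000, 0.03, 0.2):
    print(p, repr(significance_stars(p)))
```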
"Operationalizing a concept" - that means finding a
way to measure it,
the answer to the question "how do you know" -
Dependent variable, what we are trying to explain, is the rape rate.
Our IV is hunting.
Relationship between the graphs and the summary statistics. Summary
statistics allow us to summarize a lot of data easily, but we lose
a lot of detail that we can get in a graph. With the r, if the
coefficient is negative, the regression line of the scatter plot will
go down from left to right.
Sept 23: ABC News Video on Junk Science, related readings
on the WEB site.
September 25:
Overview of Assignment 2b, primarily the use of crosstabs to compute percentages.
Looking at the frequencies for GOVMED we see that there are 1849 respondents and 896 thought the government should help. To make that a percent we divide the 896 by 1849, getting .4846. Move the decimal point two places to the right and we get 48.46%. Percents always add to 100%; you don't know what they mean if you don't know how they add to 100%. The base of the percent is the 100%; in this case 1849 people are 100% of the respondents. The phrase "of the" tells you the base of the percent.
Row, column and total percentages. A percentage is a ratio between a frequency and a base. In a sentence, the base follows the word "of." For example, if I say, what percent of the voters voted for Bush, the base is the number of voters. This is the denominator in the calculation. The numerator is the number voting for Bush.
For example, assume that:
75 men voted for Bush
95 women voted for Bush
35 men voted for Gore
125 women voted for Gore.
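We could put this into a table. The second number in each cell is the expected frequency:
|       | Men                 | Women                | Total |
| Bush  | 75 (expected 56.65) | 95 (expected 113.3)  | 170   |
| Gore  | 35 (expected 53.35) | 125 (expected 106.7) | 160   |
| Total | 110                 | 220                  | 330   |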
Null hypothesis. Suppose there is no relationship between gender and
voting; what percent of the men would we expect to vote for Bush? The
answer to that is, the same percent as in the total sample. The same
for the women. What percent of the sample voted for Bush? 170/330,
which is 51.5%. As a proportion it is .515. Using this proportion, we
can compute the "expected frequencies"; these are the frequencies we
would "expect" under the null hypothesis that there is no difference
between the genders. To compute those, we take the PROPORTION voting
for Bush and multiply it by the number of men and then the number of
women. .515 * 110 = 56.65 men. This is not a percent, it is a
frequency. .515 * 220 = 113.3 women. For Gore the proportion is
160/330 = .485:
.485 * 110 = 53.35
.485 * 220 = 106.7
We can answer specific questions:
How many men were there in the sample? 110
How many women were there in the sample? 220
How many respondents are there? 330
How many respondents voted for Bush? 170
How many respondents voted for Gore? 160
What percentage of the men voted for Bush? (column percent) 75/110 = .682 or 68.2%.
What percentage of the Bush voters were men? (row percent) 75/170 = 44.1%
What percentage of the voters were men who voted for Bush? (total percent) 75/330 = 22.7%
What percentage of the women voted for Bush? (column percent) 95/220 = 43.2%
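The row, column and total percentages differ only in the base. A minimal Python sketch using the counts from the table; the variable and function names are chosen for this illustration.

```python
# Observed counts from the example: votes[candidate][gender]
votes = {
    "Bush": {"Men": 75, "Women": 95},
    "Gore": {"Men": 35, "Women": 125},
}

def percent(frequency, base):
    """A percentage is the ratio of a frequency to a base, times 100."""
    return 100 * frequency / base

men_total = sum(row["Men"] for row in votes.values())            # column total
bush_total = sum(votes["Bush"].values())                         # row total
grand_total = sum(sum(row.values()) for row in votes.values())   # all respondents

# Same numerator (men who voted for Bush), three different bases.
column_pct = percent(votes["Bush"]["Men"], men_total)    # of the men
row_pct = percent(votes["Bush"]["Men"], bush_total)      # of the Bush voters
total_pct = percent(votes["Bush"]["Men"], grand_total)   # of the voters
print(round(column_pct, 1), round(row_pct, 1), round(total_pct, 1))
```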
September 28 - -
If anyone wants to install the professional version of Microcase, I have some copies to lend out.
Levels of Measurement: nominal, ordinal, interval and ratio. This is covered well in our text and in Levels of Measurement and Units of Analysis. The concept of expected frequencies is explained well in the Chi Square lesson by Amar Patel. We will demonstrate the use of the WEB Chi Square Calculator, review the key points about chi square, and show how these techniques could be used in the case of Alleged Racial Profiling by the San Diego Police.
October 1: Computation of descriptive statistics as described in Trochim. Completion of in-class exercise.
Descriptive statistics are about your sample. Their goal is to summarize the essential characteristics of the sample.
Inferential statistics are about generalizing from your sample to a larger population. They have the form p = or p <
Two ways of presenting quantitative data:
graphics
summary or descriptive statistics.
The most basic is the average. Measure of "central tendency."
Most common is the mean, which is computed by adding all the values up
and dividing by the N. Can be distorted by extreme cases. Mean requires
interval data. Good if the data are approximately "normally" distributed.
Median, which is the case in the middle. Requires only ordinal data.
Mode, most frequent case, nominal data.
A second concept is dispersion. How much variation is there, how spread out things are. The range is an ordinal measure, how far from the lowest to the highest. Interquartile range, the distance from the 25% points to the 75% points. The standard deviation is an interval measure of deviation. The variance is the S.D. squared.
The distribution, putting the cases in order, either in a table or a graph. We want a linear scale, or even categories.
Frequency distributions are used.
For the example done in class, the histogram and frequency distribution
were as follows:
80,000  #        9.1%
60,000  ##      18.2%
40,000  ######  54.5%
20,000           0.0%
10,000           0.0%
 9,000  ##      18.2%
The mean and standard deviation were computed in class using Excel as follows:
x       mean          x - mean        (x - mean) squared
80000   41636.36364   38363.63636     1471768595
60000   41636.36364   18363.63636     337223140.5
60000   41636.36364   18363.63636     337223140.5
40000   41636.36364   -1636.363636    2677685.95
40000   41636.36364   -1636.363636    2677685.95
40000   41636.36364   -1636.363636    2677685.95
40000   41636.36364   -1636.363636    2677685.95
40000   41636.36364   -1636.363636    2677685.95
40000   41636.36364   -1636.363636    2677685.95
9000    41636.36364   -32636.36364    1065132231
9000    41636.36364   -32636.36364    1065132231
Sum of the squares: 4292545455
Variance (sum of the squares divided by N = 11): 390231405
Standard deviation (square root of the variance): 19754.27561
Sum of x: 458000
Mean (458000 / 11): 41636.36364
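The same spreadsheet steps can be sketched in Python. Note that this class example divides the sum of squares by N (11), not N - 1; the variable names are chosen for this illustration.

```python
import math

# The eleven values from the class example.
values = [80000, 60000, 60000] + [40000] * 6 + [9000, 9000]

n = len(values)
mean = sum(values) / n
deviations = [x - mean for x in values]          # the "x - mean" column
sum_of_squares = sum(d ** 2 for d in deviations)
variance = sum_of_squares / n                    # divides by N, as in the spreadsheet
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.0f}, sd = {std_dev:.2f}")
```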
Oct 3:
Reliability and Validity - the quality of measurement.
Reliability - consistency.
Two tests, see if they correlate.
Two raters. Split/half.
"internal consistency" Part
against the whole, we check a number of items against each of the others
and against all of the others. Coefficient alpha - a measure of consistency
for questionnaire items.
We can figure out the reliability and we want reliable measures, but that's not good enough. Validity is whether it is measuring the right thing, what we meant to measure. This is difficult conceptually. "Intelligence": what does it mean? With an IQ test, we have items that may be reliable and consistent, but do they measure what we really mean?
Face Validity - does it look like it is measuring the right thing?
Predictive or criterion validity. Pragmatic. You have to have a criterion, something to measure it against.
Other ways of testing validity are used when we don't have a good criterion.
Convergent validity - Do a number of other measures give you the same result.
Construct validity. Does the measure work as our theory says it should. UFO study established that the measure worked better if treated as a measure of "false memory syndrome" than if used as a measure of "experienced anomalous trauma."
Oct 8.
We spent the hour on expected frequencies and standard deviations.
There was not time to deal with regression, so we put that off until the
second exam. Here are the exercises that were worked in class:
8. Now, try figuring out some expected frequencies. What would
you expect to be the cell frequencies if there was no difference between
Men and Women on the issue, given the marginal frequencies provided?
|          | Men    | Women  | Total |
| Agree    | 9.821  | 15.179 | 25    |
| Disagree | 45.179 | 69.821 | 115   |
| Total    | 55     | 85     | 140   |
Quickie formula: the expected frequency is rt * ct / gt (row total
times column total, divided by grand total).
1. What proportion of the sample agreed? 25/140 = .17857
2. What proportion of the sample disagreed? 115/140 = .82143
3. What proportion of the sample answered? 1.00
4. How many men would we expect to agree? How many men are there? 55.
What is the likelihood that any man would agree, if men and women don't
differ? .17857. So .17857 * 55 = 9.821 - this is NOT a percent. It is
an expected frequency. Or use the quick formula, 55 * 25 / 140.
5. How many men would we expect to disagree? 115 * 55 / 140 = 45.179,
or the proportion in the sample disagreeing, .82143 * 55.
6. How many women would we expect to agree? 25 * 85 / 140 = 15.179
7. How many women would we expect to disagree? 115 * 85 / 140 = 69.821
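The quick formula applies the same way to every cell, so it can be sketched as one small Python function. The name expected_frequencies and the dictionary layout are assumptions made for this illustration.

```python
def expected_frequencies(row_totals, column_totals):
    """Expected cell counts under the null hypothesis: rt * ct / gt."""
    grand_total = sum(row_totals.values())
    return {
        (row, col): rt * ct / grand_total
        for row, rt in row_totals.items()
        for col, ct in column_totals.items()
    }

# Marginal frequencies from exercise 8.
expected = expected_frequencies(
    {"Agree": 25, "Disagree": 115},
    {"Men": 55, "Women": 85},
)
for cell, value in expected.items():
    print(cell, round(value, 3))
```

The expected frequencies always add back up to the marginal totals, which is a quick way to check hand calculations.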
9. The following students achieved the following scores on
the midterm: Joe, 85; Sam, 62; Jane, 87; Samantha, 71; Wendy, 78.
What is the mean score for this group?
Sum(x)/N = 383/5 = 76.6
10. What is the standard deviation of the scores for this group?
x  - mean = (x - mean)    (x - mean) squared
85 - 76.6 =   8.4          70.56
62 - 76.6 = -14.6         213.16
87 - 76.6 =  10.4         108.16
71 - 76.6 =  -5.6          31.36
78 - 76.6 =   1.4           1.96
Sum of squares = 425.2. Divide by N-1 (N is 5, so N-1 is 4): 425.2/4 = 106.3, which is the variance. The standard deviation is the square root of the variance, or 10.3.
This measures the dispersion. If this were a large, normally
distributed sample, about 2/3 of the people would be within one SD of
the mean, i.e., between 66.3 and 86.9. 95% would be within two standard
deviations of the mean, i.e., between (76.6 - 20.6) and (76.6 + 20.6),
or between 56.0 and 97.2.
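Exercises 9 and 10 can be checked with Python's standard statistics module, whose stdev function divides by N - 1 as in the hand calculation. A minimal sketch; the variable names are chosen for this illustration.

```python
import statistics

scores = {"Joe": 85, "Sam": 62, "Jane": 87, "Samantha": 71, "Wendy": 78}
values = list(scores.values())

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, divides by N - 1
sd = statistics.stdev(values)            # square root of the sample variance

one_sd_range = (mean - sd, mean + sd)          # about 2/3 of a normal sample
two_sd_range = (mean - 2 * sd, mean + 2 * sd)  # about 95% of a normal sample
print(round(mean, 1), round(variance, 1), round(sd, 1))
print([round(v, 1) for v in one_sd_range], [round(v, 1) for v in two_sd_range])
```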
11. Plot a frequency distribution for this group:
Create a linear scale and plot the cases. The range of the
scale should fit the distribution.
100
90
XX
80 X
70 X
60 X
50
Oct 15
Regression Analysis, calculating the formula for a straight line that most closely fits the points in a scattergram.
The formula for a line:
dependent variable = intercept + coefficient * independent variable
or
Y = a + b X
a and b are "parameters" which fix the nature of the line. X and Y are variables. Each pair of X and Y values defines a point on the line.
How do we plot equations as lines? On a Cartesian plane. Can also be three or multi dimensional; more than three dimensions are difficult to graph.
Examples.
Take the equation y = x (which could also be written y = 0 + 1 * x)
If X eq 1 Y = 1
If X eq -1 Y = -1
If X eq 0 Y = 0
Graphing this we see that it is a line passing through the 0,0 point at a 45 degree angle, going up from left to right.
If Y = 1 + X, the line will be pushed up one unit.
If Y = -X, the line will go down from the upper left to the lower right.
If Y = -1 - 2X, the line will be lower and go down more sharply. You can see these by plotting them on a graph (which I will not attempt to type into the notes, this will be done on the blackboard.) There is a WEB Site which plots these sample lines.
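One way to see how the intercept a and the slope b change a line is to compute a few points. A minimal Python sketch; the helper name line is made up here, and the last line (intercept -1, slope -2) is an illustrative choice of a lower, steeper falling line.

```python
def line(a, b):
    """Return the function Y = a + b * X for intercept a and slope b."""
    return lambda x: a + b * x

identity = line(0, 1)    # Y = X
shifted = line(1, 1)     # Y = 1 + X, pushed up one unit
falling = line(0, -1)    # Y = -X, goes down from left to right
steeper = line(-1, -2)   # illustrative: lower and falling more sharply

for x in (-1, 0, 1):
    print(x, identity(x), shifted(x), falling(x), steeper(x))
```

Each printed row is one X value with the Y value each line gives for it, which is exactly what gets plotted on the Cartesian plane.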
SAMPLING:
Census - enumerate everybody - not practical, too expensive or hard
to do for a large population, if you are in an organization, however, it
may be easy to find people.
Sample - generalize to a population from a selection or sample -
we can generalize if the sample is "representative" - This can be
done through random selection if you have a list to select from.
Simple random sample, everyone has the same chance of appearing in the
sample. Cluster sampling. Done for practical reasons where
we don't have a list or we find travel costs too high. If we do households,
we usually cluster. If we use the telephone, we have a list of subscribers.
Stratified samples, this means that everyone has a known, but
not equal, probability of being in the sample. This is done so we
can find out about subgroups in the sample. In effect, each group
is sampled. Often we stratify by geographic areas because they are
known in advance.
Nonrandom sampling, done for convenience, to get variety.
But you cannot reliably generalize. SLOPS, whoever calls in or clicks
on the Internet. This may generate anecdotes, gossip.
We can compute the "margin of error". This means we can compute how much our sample statistic is likely to vary from the population parameter. This is based on probability theory. The larger your sample, the more certain your results. The size of the population doesn't matter. The size of the sample in practice depends on how much you want to break things down into sub-groups of whatever kind.
Computing the margin of error, Guide to Computing Margins of Error on the WEB site.
For example,
1. In a college class with 85 students, 32 of whom are black, the mean
on the midterm was 75. The standard deviation was 6.21. What is the margin
of error for this mean? This is a mean score question, so I use the
formula M = 2 * sd / SQRT(N); M = 2 * 6.21 / SQRT(85) = 1.35 points,
NOT percent.
9. A survey of the tri-county area has 356 respondents, of whom 82 are
black and 55 hispanic. What is the margin of error for statistics about
the opinion of the hispanic residents? This is a percentage question,
but I am not given a statistic, a percentage result. Use Formula one,
M = 1/SQRT(N). What is N? It is 55, the number of hispanic respondents,
not 356. M = 1/SQRT(55) = .1348. This formula gives us a proportion, not
a percent; as a percent it is 13.5%. Suppose I said 61% of the hispanic
respondents are voting for McGreevy. That is the statistic for the
sample. The population parameter might vary by as much as 13.5%. We
could say that our "confidence interval" is between 61 - 13.5 and
61 + 13.5, or between 47.5% and 74.5%. This means the election among
Hispanics is "too close to call."
Suppose we had 400 Hispanics; the margin of error would be 5%. For a
sample of 1000, M = 1/SQRT(1000) or 1/31.6, or 3.2%.
Suppose we wanted a 5% margin of error, how large a sample do we need?
400. Suppose we want a 5% margin of error for each of five electoral
districts, how large a sample do we need? 5 * 400, or 2000.
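The two worked examples above can be sketched in Python as follows. The function names are made up for this illustration; the formulas and numbers are the ones in the notes.

```python
import math

def margin_for_mean(sd, n):
    """M = 2 * sd / sqrt(N), in the same units as the variable (points)."""
    return 2 * sd / math.sqrt(n)

def margin_for_subgroup(n):
    """Formula one, M = 1 / sqrt(N), returned as a percent."""
    return 100 / math.sqrt(n)

def confidence_interval(statistic, margin):
    """Lower and upper bounds: statistic minus and plus the margin."""
    return statistic - margin, statistic + margin

midterm_margin = margin_for_mean(6.21, 85)           # class of 85, sd 6.21
hispanic_margin = round(margin_for_subgroup(55), 1)  # N is the subgroup size, 55
low, high = confidence_interval(61, hispanic_margin)

print(round(midterm_margin, 2), hispanic_margin, low, high)
```

The point to notice is which N goes into the formula: the number of cases the statistic is actually based on, not the size of the whole survey.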
Representative or random sample Chosen at random from either the total population (simple random sample) or from subgroups of the population.(stratified random sample)
In choosing a sample size, all that matters is the amount of error you can tolerate. The population size is not relevant.
A researcher wants to obtain a margin of error of no more than 2% in
a survey of a county with a population of 3,000,000. How large a sample
is needed?
N = 1/(M*M). M is the margin of error, expressed as a proportion.
M = .02 because it says 2%. N = 1/(.02*.02), N = 1/.0004, N = 2500.
Simple random sample.
Suppose we were going to do this for five counties, and we wanted a 2% margin of error for each? How large a sample would we need? A 2% margin of error requires 2500, but we need it for each county so we need 5 * 2500 or 12500. Stratified random sample, consisting of a simple random sample of each of the subgroups.
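The sample size formula can be sketched in Python like this. The function name required_sample_size is hypothetical. It takes the margin as a percent and computes (100 / percent) squared, which is the same as 1/(M*M) with M as a proportion but avoids small floating-point rounding problems before rounding up.

```python
import math

def required_sample_size(margin_percent, groups=1):
    """N = 1/(M*M) per group, rounded up to a whole respondent.

    margin_percent is the desired margin of error as a percent (2 for 2%).
    groups is the number of subgroups that each need that margin.
    """
    per_group = math.ceil((100 / margin_percent) ** 2)
    return per_group * groups

one_county = required_sample_size(2)              # simple random sample
five_counties = required_sample_size(2, groups=5) # stratified: one sample per county
print(one_county, five_counties)
```

Note that the population size (3,000,000) never appears in the calculation.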
3. 59% of the respondents in a survey of a state with seven million
Republican voters voted for Bush, 41% for Gore. There were 625
respondents. What is the margin of error for the percent voting for Bush?
M = 2 * SQRT((p * (1-p))/N).
What is p? The proportion of respondents giving a certain response, the
sample statistic. In this case p = .59. What is N? 625.
M = 2 * SQRT((.59 * (1-.59))/625) = .0393 as a proportion, or 3.93%
expressed as a percentage.
What does that mean? We can be "95% sure" that the population parameter
(the true value for the population) is within 3.93% of the sample
statistic. One way to express this is as a "confidence interval".
The lower bound of the confidence interval is the sample statistic minus
the margin of error, in this case 59 - 3.93 = 55.07%.
The upper bound of the confidence interval is the sample statistic plus
the margin of error, in this case 59 + 3.93 = 62.93%.
We are confident that the true figure, the "population parameter", is
between 55.07% and 62.93%.
If 47% vote for Bush, 49% for Gore and 4% for
Nader. A sample of 1200. What is the margin of error for the
Nader vote?
p = .04, What is 1-p? .96
M = 2 * SQRT((.04 * (1-.04))/1200) = .0113 or 1.13%
What is the margin of error for the Gore vote?
M = 2 * SQRT((.49 * (1-.49))/1200) = .0289 or 2.89%.
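Formula two can be sketched in Python as follows. The function name margin_for_proportion is made up for this illustration; the inputs are the ones from the examples above.

```python
import math

def margin_for_proportion(p, n):
    """Formula two: M = 2 * sqrt(p * (1 - p) / N), as a proportion."""
    return 2 * math.sqrt(p * (1 - p) / n)

bush = margin_for_proportion(0.59, 625)
nader = margin_for_proportion(0.04, 1200)
gore = margin_for_proportion(0.49, 1200)

for name, p, m in (("Bush", 0.59, bush), ("Nader", 0.04, nader), ("Gore", 0.49, gore)):
    # Confidence interval: sample statistic minus and plus the margin of error.
    print(f"{name}: margin {m:.4f}, interval {p - m:.4f} to {p + m:.4f}")
```

The margin is smaller for the Nader vote than for the Gore vote in the same sample because p * (1 - p) is largest when p is near .5.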
October 24
Questions on page 91.
1. The difference between a census and a sample? Census enumerates the whole population. A sample is a portion selected to be representative.
2. Parameter and a statistic? The statistic comes from a sample, the "parameter" is the "true" population value.
3. Confidence level? How certain we can be of the margin of error; it is almost always set at 95%.
Confidence interval? The range within which we are 95% confident the population parameter falls.
Margin of error? The amount by which we are 95% confident that the sample statistic may differ from the population parameter.
We get the upper bound of the confidence interval by adding the margin of error to the sample statistic.
We get the lower bound of the confidence interval by subtracting the margin of error from the sample statistic.
If the confidence interval is plus or minus 3%, it means that we are 95% sure that the true population parameter is within 3% of the sample statistic.
I would say, the margin of error is 3%. If the sample statistic was, for example, 55% supporting McGreevy, the confidence interval would be 52% to 58%.
4. Simple random sample: throw darts at the directory, write the names
on slips of paper and put them in a hat, or use a table of random
numbers. Systematic sample: choose every 50th or 100th case.
Nonresponse bias - people don't answer.
Selective availability - some people are home more.
Areal bias - in household surveys, some neighborhoods are more convenient.
------------
Survey Research, Oct 26
Asking questions. Open or closed ended. You want to get their opinion, not a socially appropriate response, so you try to be neutral. People tend to view it as a test, to seek approval of the interviewer. People get satisfaction from the chance to express themselves in a non-judgmental atmosphere.
Refer to Trochim for examples of different kinds of questions: dichotomous, Likert, etc.
Oct 29
There were a number of problems with the Margins of Error homework assignment, so we will go over some of the items.
5. In a survey of 1000 voters, 600 were Democrats, 300 Republicans and 100 Libertarian. 65% of the Republicans favored George Bush in the primary. What is the margin of error for this percentage?
Which formula do we use? The fact that it says 65% of the Republicans
tells us it has to be formula two.
M = 2 * SQRT((p * (1-p))/N). p is the proportion giving a certain
answer, in this case .65.
M = 2 * SQRT((.65 * (.35))/300)
N is 300 because the question says "of the Republicans".
M = .0551 or 5.51%. The answer should be in %.
6. A survey is to be conducted of attitudes among white, black and hispanic respondents in Camden County. The population is 300,000. Of this population, 80% is white, 15% is black and 4% is Hispanic. The researcher wants to achieve a 3% margin of error for the estimates for each of the groups. How large a sample is needed?
Which formula? Formula three, N = 1/(M*M). The only unknown is M, the
margin of error that is required. We convert this to a proportion; a 3%
margin of error becomes .03. N = 1/(.03*.03) = 1111.11, which we round
up to 1112.
Since we have three groups, and we need a 3% margin of error for EACH,
we need 3 times 1112 = 3336.
What kind of a sample is this? A stratified sample, which means in
effect three sub-samples.
We examined a number of graphs, which are linked from the home page. Polar area diagrams invented by Florence Nightingale. Anscombe's quartet demonstrated how the same regression equation may fit a number of different distributions.
Sample question based on a graph
from the BJS:
Which decade had a marked increase
in the homicide rate:
a. 1955-64
b. 1965-74 c. 1975-84 etc.
We examined a
Nov 2:
Taking the example of a survey in which 54.3% said they voted for
Clinton. We know that in the population Clinton got just under 50%,
let's say 49.75%. Could this difference be due to sampling error? We
have to know the sample size, how many were asked the question: N = 870.
We use the formula for cases where we are given a percentage vote,
M = 2 * SQRT((p * (1-p))/N).
p = .543, 1 - p = .457, M = 3.38%. A confidence interval lower bound would be 54.3% - 3.38% = 50.92%. Since 49.75% is below this lower bound, sampling error alone is unlikely to explain the difference.
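The comparison in this example can be sketched in Python. The function names are hypothetical; the check simply asks whether the known population figure lies within the margin of error of the sample statistic.

```python
import math

def margin_for_proportion(p, n):
    """M = 2 * sqrt(p * (1 - p) / N), as a proportion."""
    return 2 * math.sqrt(p * (1 - p) / n)

def within_sampling_error(sample_p, population_p, n):
    """True if the population value falls inside sample_p plus or minus M."""
    return abs(sample_p - population_p) <= margin_for_proportion(sample_p, n)

clinton_margin = margin_for_proportion(0.543, 870)
lower_bound = 0.543 - clinton_margin
explained_by_sampling = within_sampling_error(0.543, 0.4975, 870)

print(f"margin {clinton_margin:.2%}, lower bound {lower_bound:.2%}")
print("within sampling error:", explained_by_sampling)
```

When the check comes back False, something other than sampling error (for example, how respondents report their past vote) has to account for the gap.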
Nov 7. Using wages as the IV and Suicide as the DV, we found that the correlation was positive, .705, and p=.000, so it was significant. The regression equation was Suicide Rate = 10.563 + .182 * Wages.
It is not a linear relationship because we can see significant breaks in the pattern. We can see that there are two clusters of cases. This is not uncommon with time series, because a lot of things change together over time.
We developed an Excel file as an example, using data on
trends in Gonorrhea rates from 1970 to 1975, which was a period of rapid
growth. A linear extrapolation showed that they would continue to a much
higher rate. However, in the real world, 1975 was a turning point and the
rate went down.
-- Nov 9
We did an example of the Trends assignment, giving some possible explanations.
The explanations may vary, but you should have the description of the trends
correct. The results we got were as follows:
Select the variable "10) HomicideRate" and examine
the Time Series Graph.
Q1: Which years had the highest
homicide rates? What happened in those years that might
explain the trend?
There was a peak in 1934, then another long peak in the 1970s and 1980s. The first seems to have been correlated with the prohibition era, the second with the period of social unrest in the 1970s and 80s related to Vietnam, racial conflict, and so on. Homicide seemed to increase during periods of economic affluence.
Select the variable "9) Suicide Rate" and examine the
graph
Q2: What years had the highest suicide
rates? What happened in those years that might explain the trend.
Suicide was highest in 1908-1912, then peaked again in 1930.
It seemed to rise during periods of affluence. Looks like the gold
standard might be involved, when we went off it, it came down. Tight
money policies.
Return to the menu and select both the Homicide Rate and the Suicide Rate. Print this table out and staple it to this report if you answer the questions by hand. If you are typing the questions for submission to WEBCT, copy the table and paste it into your report.
Q3: Do these trends appear to be related? Is the relationship the same or different in different decades? What happened in the 1940s? In the 1980s?
Yes, they appear to be correlated, both reached peaks in 1932, In the 40s they both went down.
Return to the Menu and compute the Scatterplot with the Suicide Rate as the Dependent Variable and the Divorce Rate as the Independent Variable. . Select the Regression Line and the Residuals. Print out the Scatterplot and attach it to this assignment.
Q4: What is the correlation coefficient? -.126. Is it statistically significant? No. Is it positive or negative? Negative.
Q5: Fill in the regression equation: Suicide Rate = 12.616 - .038 * Divorce Rate
Q6: Based on your examination of the Scatterplot, would you say that the relationship between the Suicide Rate and Divorce Rate is linear? That is, would you say that a straight line - the regression line - is a good approximation of the pattern? Would you say that the divorce rate is a causal factor that helps to explain the suicide rate?
No, this is not a linear pattern, something else must be going on.
---
We spent the rest of the class doing an example of the Excel assignments.
November 28 - Causal Analysis
Probabilistic cause, not an absolute cause, not a cause that is sufficient or necessary. "Cigarette smoking causes cancer." What we mean is, smoking cigarettes increases the likelihood of getting cancer. How much?
There are multiple causes for everything. What we want to find out is how much each thing contributes. There are also causal linkages, or indirect causes. A causes B and then B causes C.
Diagramming causal models. We put the dependent variable
at the right. We draw arrows going into it for each causal variable
that affects it directly. Then we can have arrows that go into the
arrows, steps in the causal analysis, as in this sample file:
http://crab.rutgers.edu/~goertzel/homomale.htm
Criteria of Causation - how do we know that something is a cause of something else.
1. Time Order. The cause comes before the effect. Sometimes we sort out the time order theoretically; for example, we assume that education precedes employment. Or we can use a research design that involves gathering data at two points in time. If you don't have measurements at two points in time, this is shaky.
2. Correlation. The two variables vary together. When one is high, the other is high (a positive correlation), OR when one is low the other is high (a negative correlation). This gets at the degree of causation: the higher the correlation, the stronger the causal relationship.
3. Non-spuriousness. We want to know that the correlation is not caused by something else. We can test this with an experimental design, if feasible. Or we can use statistical controls, which are not quite as convincing, but it's all you can do in many cases.
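The second and third criteria can be illustrated with a short calculation. This is a minimal sketch, assuming made-up data in which a third variable z drives both x and y; the variable names and numbers are hypothetical. It uses a first-order partial correlation as one simple form of statistical control, which is not necessarily the method used in the course software.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def partial_r(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: z influences both x and y; x does not influence y.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 4, 3, 6, 5, 8, 7]
y = [2, 3, 2, 3, 6, 7, 6, 7]

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(x, y, z)
print(round(r_xy, 3), round(r_xy_given_z, 3))
```

In this made-up example x and y are strongly correlated, but the correlation nearly vanishes once z is controlled, which is the pattern we would expect from a spurious relationship.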
November 30
Causal aspects of variables. This has to do with our causal model; it is not inherent in the variables.
Dependent Variable - that is what we want to explain. Often these are opinions or behaviors.
Independent Variable - what we use to explain it. Often these are traits or physical characteristics, e.g., sex or race, which are almost always independent.
If you study the effect of race on voting, for example, race would be independent and voting dependent.
Antecedent variables, things that come before the independent variable. This helps us to deal with a causal chain. The antecedent variable causes the IV, which causes the DV.
Intervening Variables, things that come between the IV and the DV, e.g., race determines ideology, which determines the vote.
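A causal diagram like the race, ideology, vote example can also be written down as a simple data structure. The sketch below is only an illustration: the dictionary layout and the function names are assumptions made here, not part of the course materials.

```python
# Hypothetical causal model from the example in the notes:
# race -> ideology -> vote. Each key lists the variables it directly causes.
causal_model = {
    "race": ["ideology"],
    "ideology": ["vote"],
    "vote": [],
}


def causal_paths(model, start, end, path=None):
    """Return every chain of arrows leading from start to end."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for effect in model.get(start, []):
        if effect not in path:  # guard against loops in the diagram
            paths.extend(causal_paths(model, effect, end, path))
    return paths


def intervening_variables(model, independent, dependent):
    """Return the variables that sit between the IV and the DV."""
    found = set()
    for path in causal_paths(model, independent, dependent):
        found.update(path[1:-1])
    return sorted(found)


print(causal_paths(causal_model, "race", "vote"))
print(intervening_variables(causal_model, "race", "vote"))
```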
We demonstrated assignment 5a, and typed the answers in Microsoft Word. If you want to do this assignment on WEBCT it is better to use Word instead of Netscape Composer because it has a drawing tool. You can submit the assignment in Word.
December 3 - we went over the results of our survey, the frequencies
and two stories are
available online. Copies of a paper called "Myths of Murder and
Multiple Regression" were distributed; copies are in the "Papers" folder
on our WEBCT site. An abstract is as follows:
Multiple regression has consistently failed to provide definitive
answers to policy controversies in criminal justice, yet researchers continue
to attempt to use regression techniques for this purpose. This
review of multiple regression analyses of trends in homicide rates suggests
that the method fails because researchers overfit their models to one data
set, then fail to test them with fresh data. The lack of progress
in regression modeling of homicide trends over several decades suggests
that the trends may be chaotic. Studies that disaggregate trends
and combine qualitative with quantitative data have been much more successful.
December 5. We went over the second midterm, working some regression
problems; the answers are online.
We reviewed a WEB
site on path analysis, noting the statement from Everitt
and Dunn (1991): "However convincing, respectable and reasonable a path
diagram... may appear, any causal inferences extracted are rarely more
than a form of statistical fantasy". We looked at some examples of
trend graphs from the paper on Myths of Murder and Multiple Regression that
provide more reliable information about causality.
Dec 7 - an experiment is not just trying something out, it is a specific research design. Two key characteristics: before and after measurement, and manipulation of the Independent Variable. We actually do something to people and observe the consequences. This differs from observing behavior in a natural setting OR asking people questions. This is a very rigorous way to test causal relationships. It is rigorous because we can control for extraneous factors or variables. To be rigorous, an experiment needs a control group, and the control group has to be the same as the experimental group EXCEPT for the one independent variable.
Problems with experiments:
1. You can't really experiment on a lot of variables
because you can't control them. It may be unethical to do so.
2. They are artificial and you don't know if the real
world would be the same.
3. The experiment itself may change things:
testing effects. Placebo effect.
4. Practical problems: people drop
out (experimental mortality), history (things go on in the outside world
that affect the experiment). E.g., field experiment on welfare reform.
Internal validity - whether the experiment was done correctly, so that the result really reflects the independent variable.
External validity - whether the experiment can be generalized
to the outside world.
Evaluation research: prison, hospital, educational
institution. Anyplace where you are doing things to people anyway.
The key is to assign people at random, and there is often resistance to
this.
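Random assignment can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical list of twenty participants; the function name and the seed are choices made here only so the example can be repeated.

```python
import random


def assign_at_random(participants, seed=None):
    """Shuffle the participants and split them into two groups.

    Returns (experimental_group, control_group). Chance alone decides
    who lands in which group.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant labels for illustration only.
participants = [f"person_{i}" for i in range(1, 21)]
experimental_group, control_group = assign_at_random(participants, seed=42)

print(experimental_group)
print(control_group)
```

Because chance decides group membership, the two groups should be alike on average in everything except the independent variable that the experimenter manipulates.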
December 10 -
Content Analysis - "unobtrusive data." Data created by a bureaucratic system, e.g., police records, or often by the media. We study television or newspapers either because the media themselves are our interest, or as a way of getting information, e.g., on crime reported in the news.
Similar to survey research, except that you do coding instead of interviewing. Coding means that you assign numbers to phenomena that you observe. Counting things. Each of your variables is coded from the published information.
Conceptualization.
Measurement. Reliability and Validity.
Manifest Content - what it's about on the surface
Latent Content - things that we infer about the content,
e.g., does the writer sound angry? Indignation, sexy?
This class is 50 920 301
Sampling - which content do you look at?
You can go back in history, and your work can be checked or replicated.
Data analysis is about the same as for survey research,
the only difference is that the unit of analysis is the story or tv show
or whatever rather than a person who was interviewed.
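Coding for content analysis can be sketched as follows. The codebook, the story records, and the category names are all hypothetical, invented here for illustration; a real project would define its own codebook during conceptualization.

```python
from collections import Counter

# Hypothetical codebook: each category of the variable gets a number.
CRIME_TYPE_CODES = {"homicide": 1, "robbery": 2, "burglary": 3, "other": 9}

# Hypothetical sample of news stories. The unit of analysis is the story.
stories = [
    {"story_id": 1, "medium": "newspaper", "crime_type": "homicide"},
    {"story_id": 2, "medium": "television", "crime_type": "robbery"},
    {"story_id": 3, "medium": "newspaper", "crime_type": "homicide"},
    {"story_id": 4, "medium": "television", "crime_type": "arson"},
    {"story_id": 5, "medium": "newspaper", "crime_type": "burglary"},
]


def code_story(story):
    """Assign the numeric code for the story's crime type.

    Anything not listed in the codebook is coded as "other".
    """
    return CRIME_TYPE_CODES.get(story["crime_type"], CRIME_TYPE_CODES["other"])


coded = [code_story(story) for story in stories]
frequencies = Counter(coded)

print(coded)
print(frequencies)
```

Once every story has a code for each variable, the frequencies and cross-tabulations are computed just as they would be for survey respondents.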