Class Notes for Methods and Techniques of Social Research, Spring 2004
See the Schedule and Assignments Page for a daily class schedule. These class notes include some material prepared to be shown on the screen in class and some notes typed in class. They are not intended to cover everything said in lecture.
May 3:
Review for Final. The exam is 9 to 12 officially, but we
should be done by 11. Please come promptly at 9. Bring
a pencil with an eraser and a calculator. The exam will be
similar to the two midterms, just longer. It will have 96
multiple choice questions and three pages of math questions including
percentages, expected frequencies, margins of error, mean scores,
regression, path diagrams and standard deviation. I will put the
formulas for margins of error and the standard deviation on the board
as reminders.
You must complete the Percent Review and Review for Final WEBCT quizzes before the test. These count as assignments, and you can take them as often as you like.
A good way to review is to go over these class notes and the review
glossaries at the end of each chapter in the textbook. Also,
several of the semester's WEBCT quizzes will be reopened May 4 and
5. This is a chance to raise your score, as well as to use the
quizzes for review.
Here are some points covered with multiple choice questions:
April 28:
Content Analysis - "unobtrusive data." Data created by a bureaucratic system, e.g., police records, or often by the media. We study television or newspapers either because the media themselves are our interest, or as a way of getting information, e.g., on crime reported in the news.
Similar to survey research, except that you do coding instead of interviewing. Coding means that you assign numbers to phenomena that you observe. Counting things. Each of your variables is coded from the published information.
Conceptualization.
Measurement. Reliability and Validity.
April 26:
Experimental Research. Experimental Designs. See the graphs in the book or on Trochim's WEB site: Types of Designs.
Essential characteristics:
treatment
is compared to a placebo. These experiments are usually
"double-blind,"
to control for the psychological effects of knowing one is getting
treatment.
This is a way of controlling subject bias and experimenter bias.
the
new. This didn't work very well: there
were
errors in the group assignments, and the women often forgot which group
they were in anyway.
April 23:
The Review Glossary is not adequate as a guide to this chapter. Some points to be covered:
sociologist)
to get her own postage stamp, won fame through field work, primarily
her
book Coming
of Age in Samoa. Later, this book was denounced by
anthropologist
Derek Freeman in his book Margaret
Mead and the Heretic: The Making and Unmaking of an Anthropological
Myth. Anthropologists
have come to Mead's defense, and
have restudied the case, but I would have to agree with your text
that
"had Mead come back from Samoa with an accurate ethnographic report, it
would not have made her famous." Here is the NY Times Review of Freeman's
critique of Mead.
openly
done as a literary form, in other cases such as that of Rigoberta
Menchu,
it is only admitted when
critics discover it.
The
Rigoberta Menchu Controversy by Arturo Arias.
trying
to discern patterns in the family interactions that contributed to the
illness. Myra Bluebond-Langner's book The
Private Worlds of Dying Children has been very influential;
she
has just published a sequel called In
the Shadow of Illness: Parents and Siblings of the Chronically Ill
Child.
Field research offers a richness of description and a possibility of new
insights
that is unparalleled by any other method. Unless it is supplemented
with other methods, it does not provide statistical data, and it is
hard
to replicate.
April 21:
Chapter 7 on Survey
Research. How Polls are
Conducted (Gallup). Questionnaire
Design. Questionnaire
Construction. Interviewing
Guidelines. - Interviewing
Techniques. Questionnaire
from alumni survey. Preliminary
Report on
2003 Alumni Survey.
April 19:
Research Design. How research is organized or structured to
accomplish
different ends.
| Purpose of Study | Preferred Design | Advantages/Disadvantages |
| Exploration - To get some new ideas,
or at least ideas that are new to you. |
1. Literature Review - library research 2. Secondary Analysis - Using data that have already been collected by a country, a government office, or a company. Criminal justice systems generate a lot of data for their own purposes. You are limited to the questions someone else designed and asked. 3. Field Observation - Go into the natural setting and observe what is going on. You may talk to people and ask questions as well, but the really unique aspect is observation. 4. Focus Groups - Group interviews lasting about an hour and a half. 5. Case Studies - based on documents, interviews or sometimes observations |
1. Get insights of others. Avoid
reinventing the wheel./ Tends to repeat the past, not generate new
ideas. 2. Access a tremendous amount of information quickly and cheaply./ Limited to the questions asked by others. 3. Get new insights in natural setting/ Difficult and time consuming, small sample. Access difficult. 4. Detailed, inductive subtle understanding of patterns./ Difficult to generalize. |
| Description - To get accurate and
relatively precise information, especially about large groups or |
1. Secondary Analysis - Data banks of surveys
are available, many other kinds of data also. 2. Surveys - Questionnaires or interviews. Often on the telephone. 3. Content analysis - Looking at media as a source of data: tv shows, letters to the editor, newspaper articles. Written documents. You can go back in time. |
1. Excellent data, especially for trends
over time/ Limited to questions asked by others. 2. Ask your own questions, choose your own sample/ Time consuming or expensive. Limited to topics people can answer accurately. 3. Unobtrusive, allows study of media./ Limited to topics that involve published media. |
| Explanation. To answer
questions about cause and effect. |
1. Experiment - In an experiment we manipulate
the independent variable. The independent variable is the
"cause"
. Then we measure the dependent variable or "effect" both before
and after on experimental and control group. 2. Multivariate Statistical Analysis of Survey Data |
1. Best method of proving causal
relationships./ Hard to maintain rigor of design (internal validity)
and to generalize beyond the limits of the experiment (external
validity). Serious ethical and practical limitations. 2. Can use survey and secondary data and address a wide range of important topics/ Data sets must include good measures of all relevant variables and a wide range of data. Not valid unless the models can be shown to predict trends in fresh data. Most useful for making predictions to be evaluated with fresh data. |
April 12: SAMPLING is used when we are
interested in studying a population that is too large for us to study
each individual. The first step is to define the
population
we wish to make statements about, e.g. adults in New Jersey, probable
voters, people convicted of felonies, graduates of our
department. We might want to study the entire population of the
USA. If we try to collect data from everyone, this is a
census. The Census Bureau does this once every decade, and misses
a lot of people. Everyone else does sampling: we select a
cross-section to represent the population. If you
try to study the whole population, you often fail to do a good job.
Gallup:
How Polls are Conducted.
Size of the sample. How big of a sample do I
need?
Size
of the sample does not depend on the size of the population.
How do we select the sample size? Decide on the
margin of error you will tolerate. The margin of error is approximately equal to
one divided by the square root of the sample size. With a sample of
400, the square root is 20, and 1/20 = .05 or 5%. Suppose you interviewed
400 people: 300 were white, 50 were black and 50 were others. For the
blacks, with a sample of 50, we would have a 14% margin of error. For the
whites, with a sample of 300, we would have a 5.8% margin of error.
Take 300: the square root of 300 is 17.32, and
1/17.32 = .0577; .0577 * 100 = 5.8%.
m = 1/sqrt(n). Solve for n: m^2 = 1/n, so n * m^2 = 1 and n = 1/m^2. If we need a margin of error of 3%, or .03, then n = 1/.03^2 = 1/.0009, or about 1,111.
If you have a sample size and need to know the margin of error, use m = 1/sqrt(n)
If you are given a margin of error and asked how large a sample you need, use n = 1/m^2
In these formulas,
n = the size of the sample (not the population), and m =
the margin of error expressed as a proportion, not as a percent.
Thus, if the question says "we need a margin of error of 5%," then m =
.05.
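The two formulas can be checked with a short script. This is only a sketch of the rule of thumb used in these notes (m = 1/sqrt(n) at the 95% confidence level); the function names are made up for illustration.

```python
import math

def margin_of_error(n):
    """Approximate 95% margin of error, as a proportion, for a sample of size n."""
    return 1 / math.sqrt(n)

def required_sample_size(m):
    """Sample size needed for a margin of error m (a proportion, e.g. 0.05 for 5%)."""
    return 1 / m ** 2

# Examples from the notes: samples of 400, 300 and 50.
for n in (400, 300, 50):
    print(n, f"{margin_of_error(n) * 100:.1f}%")

# A 3% margin of error needs roughly 1,111 respondents.
print(round(required_sample_size(0.03)))
```

Remember to convert a percent to a proportion before using the second formula: 5% goes in as 0.05, not 5.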
If our sample is stratified, this means we really have several
sub-samples, and we need the same size sample for each of them,
regardless of its size in the population. For example, if we want to sample white,
black and Hispanic respondents and make statements about each group, we
need the same size sample for each group, regardless of its size in the
population. Thus, if we need a margin of error of 5% for each of
the three groups,
then the answer is 3 * (1/m^2) = 3 * 400 = 1,200.
Terms:
Margin of Error: How much a sample statistic is likely to vary
from the population parameter. We say that we are 95% sure that
the sample is not off by more than the margin of error. How this
is presented in
NY Times. "19 out of 20" is another way of saying 95%.
Confidence level: we always use a 95% confidence level.
Confidence interval: the range within which we think a
statistic would fall, e.g., if the margin of error is 3% and the sample
statistic is 67%, the confidence interval is from 64% to 70%. We
are 95% sure that the true figure is within this limit.
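As a small illustration, the confidence interval is just the sample statistic plus or minus the margin of error. The function name below is hypothetical, and both values are given as proportions.

```python
def confidence_interval(statistic, margin):
    """Return the (low, high) range around a sample statistic."""
    return statistic - margin, statistic + margin

# Example from the notes: a sample statistic of 67% with a 3% margin of error.
low, high = confidence_interval(0.67, 0.03)
print(f"{low:.0%} to {high:.0%}")
```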
March 31: We will discuss path analysis and interpreting
regression models, following the textbook and the discussions in:
A Brief
Intro to Path Analysis. A longer
introduction with more examples. A more technical Intro to
Path Analysis.
For the exam, you should know how to set up the regression equations
to fit a path diagram. The rules are as follows:
alienation from government = status deficiency
alienation from society = status deficiency
March 29: Multiple Regression. For predicting a dependent
variable with one or more independent variables, we need both an
"unstandardized regression coefficient" and an "intercept." This
is what we did in the Excel-Regression assignment. Excel referred
to them as the "Coefficients," one of which is the X-variable
coefficient, the other the intercept. The X-variable coefficient is an unstandardized
regression coefficient. In that example, we used only one
independent variable. However, we could use more than one.
See the "Regression
Analysis" in class exercise.
Each of the unstandardized regression coefficients is on a different
scale because it is designed to be multiplied by the independent
variable with which it is associated. If we want to compare them,
we standardize them so they usually fall between minus one and plus one,
like correlation coefficients (they are standardized by multiplying
them by the ratio of the standard deviations of the IV and the DV, but
we don't have to worry about this since the software does it for
us). These are called "standardized regression coefficients" or
Beta Weights. They are used in Path Diagrams because they are
comparable: the larger they are (in absolute value), the stronger the
relationship between variables.
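To see what the software is doing, here is a minimal sketch with one independent variable. The education and income figures are invented for illustration, and the function names are hypothetical. The slope is multiplied by the ratio of the standard deviation of the IV to that of the DV.

```python
import statistics

def simple_regression(x, y):
    """Least-squares slope and intercept for one independent variable."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def standardized_coefficient(slope, x, y):
    """Beta weight: the slope times (std. dev. of IV / std. dev. of DV)."""
    return slope * statistics.stdev(x) / statistics.stdev(y)

# Hypothetical data: years of education (IV) and income in $1000s (DV).
education = [10, 12, 12, 14, 16, 16, 18, 20]
income = [22, 30, 28, 35, 45, 41, 52, 60]

slope, intercept = simple_regression(education, income)
beta = standardized_coefficient(slope, education, income)
print(f"unstandardized b = {slope:.3f}, intercept = {intercept:.3f}, beta = {beta:.3f}")
```

With a single independent variable, the beta weight equals the correlation coefficient r. With several independent variables the software computes one beta weight per variable.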
March 26: We will continue with testing causal relationships
through cross-tabulation.
Today we will look at testing causal hypotheses. On page 93 in the text, we have the example of the relationship between Height and Liking Basketball. This is an IV and a DV. An obvious TEST VARIABLE is Gender. This would be an antecedent variable: gender influences both your height and your liking for basketball. We could draw this as a path diagram (on board).
When we introduce the control, we split the table into two parts, e.g.,
| | Males: Tall | Males: Short | Females: Tall | Females: Short | Total: Tall | Total: Short |
| Likes BB | 85% | 85% | 25% | 25% | 65% | 45% |
| Does Not | 15% | 15% | 75% | 75% | 35% | 55% |
| Total | 100% | 100% | 100% | 100% | 100% | 100% |
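The percentages in the table above can be reproduced from counts. The counts below are hypothetical, chosen only so that they give the same percentages. They show how a height difference in the total sample (65% vs. 45%) disappears within each gender.

```python
# Hypothetical counts chosen to reproduce the percentages in the table.
counts = {
    ("Male", "Tall"): {"likes": 170, "does_not": 30},
    ("Male", "Short"): {"likes": 85, "does_not": 15},
    ("Female", "Tall"): {"likes": 25, "does_not": 75},
    ("Female", "Short"): {"likes": 50, "does_not": 150},
}

def percent_likes(height, gender=None):
    """Percent who like basketball among people of a given height.

    If gender is given, we control for it by looking only at that group.
    """
    cells = [cell for (g, h), cell in counts.items()
             if h == height and (gender is None or g == gender)]
    likes = sum(cell["likes"] for cell in cells)
    total = sum(cell["likes"] + cell["does_not"] for cell in cells)
    return 100 * likes / total

for gender in ("Male", "Female", None):
    label = gender or "Total"
    print(label, percent_likes("Tall", gender), percent_likes("Short", gender))
```

In this made-up sample most tall people are male and most short people are female, which is what produces the difference in the Total columns.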
In the real world, things are never this
sharp.
Let's look at some real data, using FEAR
WALK, PLACE SIZE and R.INCOME from the GSS data set:
In the total sample, the low income
respondents are more likely to feel there are areas near them where
they should fear walking. However, this effect disappears for
some of the respondents when we control for the size of the town in
which they live.
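To make it a finished Table (Low, Med and Hi are the respondent's income):
| | Small Town or Rural: Low | Small Town or Rural: Med | Small Town or Rural: Hi | Small City: Low | Small City: Med | Small City: Hi | City/Suburb: Low | City/Suburb: Med | City/Suburb: Hi | Total: Low | Total: Med | Total: Hi |
| Fear Walk | 30% | 27% | 24% | 48% | 42% | 20% | 56% | 41% | 43% | 51% | 39% | 41% |
| No Fear | 70% | 73% | 76% | 52% | 58% | 80% | 44% | 59% | 57% | 49% | 61% | 59% |
| Place Size | p | N |
| Small Town or Rural | .710 | 251 |
| Small City | .043 | 133 |
| City/Suburb | .000 | 1253 |
| Total | .000 | 1637 |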
To build a more complete causal model of Fear of Walking at Night, we should introduce more variables. Some of them may be in our data set, others not.
What variables should we look at?
| Variables | Hypotheses |
| Gender | Females more fearful than males. |
| Age | Elderly more fearful, also children. Might be curvilinear. |
| Crime Rate | People in high crime communities |
| Street Lighting | |
| Freq of Patrols | |
| Graffiti, Broken Windows, Trash, other indicators of an "out of control" neighborhood | |
| Bicycles | |
| Number of Pedestrians | |
| Physical Shape | |
| Training in Self Defense | |
We can examine some of these variables with our
data. We may find it useful to use regression rather than
cross-tabulation.
March 24: we will go through pages 114-122 in the workbook.
March 22:
The Art and Science of Cause and Effect. (powerpoint)
Probabilistic cause, not an absolute cause, not a
cause
that is sufficient or necessary. "Cigarette smoking causes
cancer." What we mean is, smoking cigarettes
increases
the likelihood of getting cancer. How much?
There are multiple causes for everything. What
we
want to find out is how much each thing contributes. There are
also
causal linkages, or indirect causes. A causes B
and then B causes C.
Diagramming causal models. We put the dependent
variable
at the right. We draw arrows going into it for each causal
variable that affects it directly. Then we can
have arrows that go into the arrows, steps into the causal analysis, as
in
this sample file:
http://crab.rutgers.edu/~goertzel/homomale.htm
Criteria of Causation - how do we know that something is a cause of something else.
1. Time Order. The cause comes before
the
effect. Sometimes we sort out the time order theoretically: we
assume
that
education precedes employment. Or we can use a
research design that involves gathering data at two points in
time.
If
you don't have measurements at two points in time, this
is shaky.
2. Correlation. The two variables vary
together.
When one is high, the other is high, OR when one is low the other is
high. This gets at the degree of causation: the
higher the correlation, the stronger the causal relationship.
3. Non-spuriousness. We want to know
that
the correlation is not caused by something else. We can test this
with an
experimental design, if feasible. Or we can use
statistical controls, which are not quite as convincing, but it's all you
can do
in many cases.
We test for non-spuriousness by introducing controls.
Causal Models: representations of the complex causal relationships between variables. Variables have different causal roles, but this is determined by our causal model; it is not inherent in the variables. One person's cause can be another's effect.
Dependent Variable - that is what we want to explain. Often these are opinions or behaviors
Independent Variable - what we use to explain
it.
Often these are traits or physical characteristics, e.g., sex or race,
which are almost always independent.
If you study the effect of race on voting, for example, race would be independent and voting dependent.
Antecedent variables: things that come before the
independent
variable. This helps us to deal with a causal chain.
The antecedent variable causes the IV, which causes the DV.
If the antecedent variable "explains" the
relationship,
we have an "explanation"; we say the original relationship is "spurious".
Intervening variables: things that come between the IV and the DV,
e.g.,
race determines ideology, which determines the vote.
This is an "interpretation"; it tells WHY the causal
relationship exists.
Path
Models: a way of graphically expressing complex causal models.
Example: Determinants of Adult Homosexuality in White Males.
Example: The Seattle
Social Development Project.
March 12: we will meet
in the BSB 108 computer lab for help with Microcase Professional
and Excel. This class is optional, attendance will not be
taken. You should be able to complete the Excel Regression
assignment during this class.
March 8 and 10: More on
trends and regression modelling, including multiple regression.
March 5: Linear
regression as a tool for data analysis. Online regression
applet. We will learn to do regression in Microsoft
Excel. This is on the "tools"/"data analysis" menu. You
may need to install this from the CD-rom if you have Excel on your home
computer. Regression
by Eye applet.
March 3: We
discussed time series analysis using the Historical Trends module in
the professional Microcase software. Details are on the Microcase
Trends assignment page.
March 1: Comparative
Research Using Aggregate Units, Chapter 8 in the text. This
research method uses data about social or geographic units.
Consistent
criminal justice statistics are important for evaluating CJ
policies. Thorsten
Sellin, a professor at Penn, was instrumental in getting consistent
CJ statistics established. We can find examples on the Bureau of Justice
Statistics
WEB site.
Comparative methods are particularly useful for studying change because
we can get data about trends over time. Look, for example, at
some Trend
Graphs taken from the "Historical Trends" module in the
Professional Microcase. This is available in the computer center
on the networked Windows computers (click on Statistics and Microcase
on the Windows menu, then open "Microcase Curriculum Plan 2003-2004" and
load the TrendSmp data set). Our next exercise, after the Quiz on
Workbook 8, will involve using this data set.
Some concepts:
Rate: A statistic that reduces numbers to a common
base. The base is often, but not necessarily, the total
population in an area. If we are looking at voting participation,
we might compute rates using the base of the number of adults 18 or
over. If we are trying to predict an election, we might use a
base of registered voters.
A crude birth rate is the number of births per 1,000
population. Fertility rate is the number of births per female
during her lifetime.
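A rate is simple arithmetic: divide the number of events by the base and multiply by the common unit. The sketch below uses invented numbers for a hypothetical town, and the function name is made up for illustration.

```python
def rate(events, base, per=1000):
    """Number of events per `per` units of the base population."""
    return events / base * per

# Hypothetical town: 1,300 births in a population of 100,000.
print(rate(1300, 100_000))              # crude birth rate per 1,000 population

# Hypothetical turnout: 42,000 votes with two different bases.
print(rate(42_000, 75_000, per=100))    # percent of adults 18 or over
print(rate(42_000, 60_000, per=100))    # percent of registered voters
```

The same number of votes gives a different rate depending on the base, which is why the base always has to be stated.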
Time Series analysis: uses time periods as the unit of
analysis, looks at how things change over time often in one
case. A lagged time series takes into account the time it
takes for one variable to influence another, thus incarcerations in one
year might be related to crimes in the next year.
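To set up a lagged time series, each year's value of the presumed cause is paired with the value of the effect one or more years later. The yearly figures below are invented purely to show the mechanics, and the function name is hypothetical.

```python
# Hypothetical yearly figures, invented only to show the mechanics.
incarcerations = {2000: 120, 2001: 135, 2002: 150, 2003: 160}
crimes = {2000: 900, 2001: 880, 2002: 850, 2003: 820}

def lagged_pairs(cause, effect, lag=1):
    """Pair the cause in year t with the effect in year t + lag."""
    return [(year, cause[year], effect[year + lag])
            for year in sorted(cause)
            if year + lag in effect]

for year, incarcerated, later_crimes in lagged_pairs(incarcerations, crimes):
    print(year, incarcerated, later_crimes)
```

The last year drops out because there is no following year to pair it with, so a lag of one year always costs one case.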
Cross-sectional analysis compares a number of cases at
one point in time.
Reliability: are statistics computed the same way in
different geographic units or different time periods? This causes
all sorts of problems - it is better to improve statistics, but doing
so causes us to lose comparability.
Validity: do the statistics measure what we want them to
measure? Crimes reported to the police are not a valid measure of
the amount of actual crime, especially for crimes that are often not
reported.
Case oriented vs. variable oriented. The case oriented
approach is more qualitative, although quantitative trend data can be
used. The variable oriented approach assumes that the same
variables are causally related in the same way in a large number of
cases, e.g., "capital punishment" and "homicide rates" in a number of
states or countries.
Outliers: especially in variable-oriented research, it is
important to look for exceptional cases that are very different from
the norm. These tend to cause a disproportionate impact on our
results.
February 27: Quality of Measures -
Reliability - you get the
same thing
over and over. Consistency.
inter-rater
- two different raters get the same answer.
test-retest, if you take it twice the answers are the
same.
internal consistency - are the items on a test
consistent.
Cronbach's alpha is a statistic that measures inter-item reliability.
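For those curious about how the statistic is computed, here is a rough sketch using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The item scores are invented for illustration, and statistical software normally does this for us.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` is a list with one list of scores per item; respondents are in
    the same order in every list.
    """
    k = len(items)
    sum_item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    return k / (k - 1) * (1 - sum_item_variances / statistics.variance(totals))

# Hypothetical answers of five respondents to three questionnaire items.
item_scores = [
    [4, 2, 5, 3, 1],
    [5, 2, 4, 3, 2],
    [4, 1, 5, 2, 1],
]
print(round(cronbach_alpha(item_scores), 3))
```

When every item gives exactly the same scores, alpha is 1. Values fall as the items agree less with one another.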
Validity - is it "really"
measuring
what it is supposed to measure?
Face Validity - does it look right?
Predictive or criterion validity - does it predict what we want to
predict,
some "true" measure. SAT test predicts college or law or medical
school grades.
Convergent
validity - do several measures give the same result.
Construct
validity - does the measure perform as our theory says it
should.
We use this when we have no criterion.
This is the most difficult, it is used when things are inherently
difficult to measure.
An example: a study of UFO Abduction Status.
February 25: Measurement Chapter 3 in both books
Variables are characteristics or aspects that take different values among the units of analysis being studied.
In a questionnaire, often each question is a variable, but if it has a lot of choices, they may each be a variable
Are you Democrat, Republican, Independent or what? One variable with three values.
Which of the following foods did you
eat last week
(check all that apply)
1. spaghetti
2. soup
3. artichoke hearts
4. hamburger
5. chicken
This would be a series of
variables:
Spaghetti - yes or no
Soup - yes or no
Hamburger - yes or no
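A sketch of how a "check all that apply" question might be coded: each choice becomes its own yes/no variable, with 1 for checked and 0 for not checked. The function name and the sample respondent are made up for illustration.

```python
FOODS = ["spaghetti", "soup", "artichoke hearts", "hamburger", "chicken"]

def code_checklist(checked):
    """Turn one respondent's checked boxes into a series of 0/1 variables."""
    return {food: int(food in checked) for food in FOODS}

# Hypothetical respondent who checked soup and chicken.
respondent = code_checklist({"soup", "chicken"})
for food, value in respondent.items():
    print(food, "yes" if value else "no")
```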
Some variables are natural dichotomies, such as Gender (male or female) or Age (Child, Adult) or Opinion on an Issue (agree, disagree). Or at least we choose to think of them that way. We might think differently: an opinion could be (strongly agree, agree, undecided, disagree, strongly disagree). Or it could be, rate your opinion on a scale from 1 to 10. These
Levels of Measurement. What is our measurement really saying about the relationship between the values?
Dichotomous Measurement - Two and only two categories. Can be a natural dichotomy or a "dummy variable" - we take a complex variable and divide it into a series of dichotomous variables.
Nominal Measurement. Categories
that could be
put
in any order.
Catholic,
Protestant,
Jewish, Moslem, LDS, Buddhist, Episcopalian, Baptist
variable one, category of religion, variable two denomination.
Illnesses: adjustment disorder, borderline
personality
disorder, paranoid schizophrenic
Crimes: burglary, assault,
Each individual should go into one
and
only one category on a variable, one value on a variable.
For example: What is your favorite food, we have a long list, but
each person is allowed only one.
Sorting
people
into categories must be reliable and accurate or valid.
Ordinal Measurement. Here we have categories in a logical order: very short, short, medium, tall, very tall. Often we take continuous variables and make them ordinal. Income: under $20,000; $20,000 to $40,000; $40,000 to $60,000; $60,000 plus.
Interval Measurement: TEMPERATURE IN FAHRENHEIT OR CENTIGRADE, 0 degrees is not the absence of heat. How about the day that the "temperature doubled" in New York City?
Ratio Measurement:
Income in
dollars:
a continuous numerical value PLUS a meaningful zero point. Height
in inches.
Scaling is when we use a number of measures,
such as
test scores or questionnaire items, to measure a more general
concept. We can do this by adding them up (in which case your
text would call it an "index", although many people still use the term
scale), or they may be ordered from lowest to highest (in which case
it is a true scale as the term is used in your book). Your test
is an example. I just add up the points, to measure the general
variable "knowledge of research methods as covered in the first part of
the course." Another approach would be to rank the items from
easy to hard and see which you could do. This is tricky, because
some people can do the hard ones and not the easy ones. When we
make an index or scale, we get measures that can be treated as
interval, even if they are not strictly interval. Scaling methods
can be more precise, but these are not used much in sociology or
CJ. For example, we could scale the seriousness
of crimes. There are various methods of
measuring this. Paired comparisons means asking a sample of
people to compare crimes two at a time and say which is more serious.
February 11:
Today we will begin with Amar
Patel's Chi-Square lesson. This covers the concept of
expected frequencies and observed frequencies, and introduces the
concept of "fairness", the difference statistic and the chi-square
statistic. These are applied to problems where the expected
frequencies are given by a null hypothesis of "fairness".
We can apply this to any distribution where we have a theoretical
reason to expect a certain result. E.g., with two dice, each with
six sides. What results are possible and what likelihood do we
have?
| Total | Expected (per 36 rolls) | Observed |
| 2 | 1 | |
| 3 | 2 | |
| 4 | 3 | |
| 5 | 4 | |
| 6 | 5 | |
| 7 | 6 | |
| 8 | 5 | |
| 9 | 4 | |
| 10 | 3 | |
| 11 | 2 | |
| 12 | 1 | |
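The expected column can be derived by listing all 36 equally likely combinations of two fair dice and counting how many give each total. This is a small sketch; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Count how many of the 36 (die 1, die 2) combinations give each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

def expected_frequency(total, n_rolls):
    """Expected number of times `total` comes up in n_rolls of two fair dice."""
    return n_rolls * ways[total] / 36

for total in range(2, 13):
    print(total, ways[total], f"{ways[total] / 36:.3f}")
```

For a different number of rolls, `expected_frequency(7, 72)` gives 12.0, since a total of seven is expected on one sixth of the rolls.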
We will then apply the same statistic to crosstabulations where the
expected frequencies are determined by the marginal frequencies.
Last class we worked with observed frequencies, row percent,
column
percent, and total percent. Today we will compute expected
frequencies for each cell in a
cross-tabulation
table, and show how the difference statistic and chisquare statistic
are computed.
We will use a simple 2 by 2 distribution as follows. The
variables are gender and opinion on an issue, each of which has two
values:
25 men agreed
17 men disagreed
65 women agreed
30 women disagreed
| Observed Frequencies or Obtained Frequencies | Men | Women | total |
| Agree | 25 | 65 | 90 |
| Disagree | 17 | 30 | 47 |
| Total | 42 | 95 | 137 |
We can compute expected frequencies, based on the null hypothesis that men and women do not differ in their opinions. We can compute these knowing only the marginal or total frequencies. The easy way to compute them is to multiply the row total for each cell by the column total for that cell, then divide by the grand total. Another way would be to convert the row totals to proportions, then multiply them by the column totals. Expected Frequencies = rt * ct / gt
| Expected Frequencies | men | women | total |
| agree | 90*42/137=27.59 | 90*95/137=62.41 | 90 |
| disagree | 47*42/137=14.41 | 47*95/137=32.59 | 47 |
| total | 42 | 95 | 137 |
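The rt * ct / gt rule can be written as a short function that works for a table of any size. This is a sketch; the function name is made up for illustration.

```python
# Observed frequencies: rows are agree / disagree, columns are men / women.
observed = [
    [25, 65],
    [17, 30],
]

def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

for row in expected_frequencies(observed):
    print([round(cell, 2) for cell in row])
```

The expected frequencies keep the same row and column totals as the observed table.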
What would we get if we used the expected frequencies to make
a column percentage table? The percentages would be the same in
each column (except for rounding error). That is the point of
expected frequencies: they are the frequencies we would get if all
the columns were the same in percentage terms.
| Percents Computed from Expected Frequencies | Men | Women | Total |
| Agree | 65.7% | 65.7% | 65.7% |
| Disagree | 34.3% | 34.3% | 34.3% |
| Total | 100% | 100% | 100% |
We can use the expected frequencies to compute the "difference
statistic" as described by Patel. This tells us how much each
cell is off from what was expected. As you can see, each cell is
off by 2.59, in either the positive or the negative
direction. This is a rough measure of how much our
observations differ from the expected, plus or minus 2.59, but it is
not widely used. The sum of the differences is zero because the
negatives cancel out the positives.
The statistic that is used is the chi-square statistic. This
is
designed to give more weight to bigger differences and to make all
differences positive so they can be added up to a number that can be
used for probability testing. We have probability distributions
for chi-square, which enables us to tell the likelihood that the
difference could have appeared by chance. Chisquare is
computed by squaring the differences between the observed (Fo) and
expected (Fe) for each cell, then dividing them by the expected for
that cell, then adding them up.
To get the chi square, we add up the computations
for each cell = .2431+.1075+.4655+.2058 = 1.0219 (or 1.023 if the differences are not rounded to 2.59 before squaring).
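The whole computation can be sketched in a few lines: for each cell, take (Fo - Fe) squared divided by Fe, then add up the cells. The code repeats the observed frequencies for men and women from the 2 by 2 example so that it runs on its own; the function names are hypothetical, and no Yates correction is applied.

```python
# Observed frequencies: rows are agree / disagree, columns are men / women.
observed = [
    [25, 65],
    [17, 30],
]

def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))

print(round(chi_square(observed), 3))
```

Because the differences are not rounded before squaring, the result agrees with the figure from the Web calculator rather than with a hand calculation based on 2.59.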
Programs such as Microcase compute this for us. We can also
get the chi square by typing the observed frequencies into the WEB
chi-square calculator (using the version without the "Yates
correction"). The result is 1.023. The computer then tells
us that the result is not "statistically significant" by the chi-square
test. In the days before computers, we looked these up in a table
in the back of a statistics book.
To see these tables, open the EXCEL
2 by 2 chi-square calculator I have prepared. It has all the
tables: observed frequencies, row percents, column percent, total
percent, difference statistic, chi square. In this spreadsheet,
if we change
the numbers in observed frequencies table, the other numbers will
change accordingly.
February 9: Survey data are largely nominal or categorical, which means that there are a few distinct answers rather than a continuous range of values. Continuous variables vary on a scale with a large number of values. This includes things such as height and weight if measured in inches or pounds, or rates of all kinds, e.g., crime rate, divorce rate, birth rate. Votes can be continuous, e.g., if you say 56% voted for Kerry, 26% for Dean, etc. However, if you ask how an individual voted, there is a distinct set of categories: Kerry, Dean, Kucinich, etc. A continuous variable can be collapsed into categories. A categorical variable can also be converted to continuous when you are talking about a large population, e.g., the categorical votes of a number of individuals can be converted to percents voting for each. So to work with survey data we need to understand "per cent" and the different ways of computing and using them. "Cent" means 100. Per cent is a ratio, with the denominator being 100. It is a rate. We have other rates, such as per 1000 or per 100000 or even per million or per billion.
The problem with percents is knowing the base, and how it adds to 100. What are the other components of the total?
men 55 agreed
women 33 agreed
men 27 disagreed
women 42 disagreed
The first thing we do is put these into a contingency or
cross-tabulation table. We usually put the Independent or
(causal) variable in the column and the dependent variable in the row
. It is best not to have too many categories on either variable,
unless you have a very large number of cases. This is the
smallest possible table, a 2 by 2 table.
| Observed Frequencies | men | women | Total |
| Agree | 55 | 33 | 88 |
| Disagree | 27 | 42 | 69 |
| Total | 82 | 75 | 157 |
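There are three ways to do the percents: row percents, column percents and total percents.
In the row percent, the base is the total number in the row. In the column percent, the base is the total number in the column. In the total percent, the base is the grand total.
1. What percent of the men agreed?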
2. What percent of the women disagreed?
3. What percent of those who agreed were men?
4. What percent of those who disagreed were women?
5. What percent of the respondents agreed?
6. What percent of the respondents were women?
Here is the kind of table we would put in a report. It gives the
column percents because the column variable is the Independent
Variable. For most purposes, the percents are based on the
Independent Variable:
| Column Percents | Men | Women | Total |
| Agree | 67.1% | 44.0% | 56.1% |
| Disagree | 32.9% | 56.0% | 43.9% |
| Total | 100% | 100% | 100% |
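The six questions above can be answered from the observed frequencies by choosing the right base each time. The following is a sketch with made-up function names; each function divides a cell or a marginal total by its base.

```python
# Observed frequencies: rows are opinions, columns are gender.
observed = {
    "Agree": {"Men": 55, "Women": 33},
    "Disagree": {"Men": 27, "Women": 42},
}

def row_total(row):
    return sum(observed[row].values())

def column_total(col):
    return sum(cells[col] for cells in observed.values())

def grand_total():
    return sum(row_total(row) for row in observed)

def column_percent(row, col):
    """Base is the column total, e.g. all men."""
    return 100 * observed[row][col] / column_total(col)

def row_percent(row, col):
    """Base is the row total, e.g. everyone who agreed."""
    return 100 * observed[row][col] / row_total(row)

print("1.", round(column_percent("Agree", "Men"), 1))       # men who agreed
print("2.", round(column_percent("Disagree", "Women"), 1))  # women who disagreed
print("3.", round(row_percent("Agree", "Men"), 1))          # agreers who were men
print("4.", round(row_percent("Disagree", "Women"), 1))     # disagreers who were women
print("5.", round(100 * row_total("Agree") / grand_total(), 1))     # respondents who agreed
print("6.", round(100 * column_total("Women") / grand_total(), 1))  # respondents who were women
```

Questions 1 and 2 use column percents, 3 and 4 use row percents, and 5 and 6 use the marginal totals as a percent of the grand total.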
February 4:
Discussion of designing research
projects. How do we decide what to study? Supplementary
reading
in Trochim on the
structure of research. You may prefer his "hourglass"
metaphor
to the circular one on page 14 of our textbook.
February 2: Today, we will look at the use of scatterplots.
This is a two (or three) dimensional plot
of the
relationship between continuous variables. Height and weight are
an
example, as we can see in the plot of
heights and
weights from a previous class. There are also summary statistics
and a
regression equation.
[Figure: scatterplot of heights and weights from a previous class]
The Line Equation gives us a formula for plotting the straight line that best fits the points. The r= is a measure of how closely the points fit a straight line. If it has asterisks, the relationship is “statistically significant,” which means that it is strong enough that it probably is not just due to random chance. The Prob = gives us the probability that the relationship occurred by chance; if it is less than .05, often given as p < .05, we say it is “significant”. It may not be meaningful, just something other than random chance. In this case, we might say that “height” is the independent variable or cause because it is more reasonable to say that one’s height determines one’s weight than the other way around. To predict someone’s weight from their height, we use the line equation. We multiply the height (in inches) by 6.596 and subtract 291.542. This is what the person would weigh if they were of average weight for people of their height in our class.
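As a quick check of the arithmetic, the line equation can be written as a one-line function. The slope and intercept are the ones quoted above; the function name is made up, and the weight is in whatever units the class data used (presumably pounds).

```python
SLOPE = 6.596         # change in predicted weight per inch of height
INTERCEPT = -291.542  # value of the line at a height of zero

def predicted_weight(height_inches):
    """Predicted weight from the class line equation."""
    return SLOPE * height_inches + INTERCEPT

for height in (62, 66, 70, 74):
    print(height, round(predicted_weight(height), 1))
```

The equation only makes sense for heights in the range found in the class; at very small heights it predicts a negative weight.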
We can use an online 2D scatterplot program to generate examples. There is also a program for 3-d scatterplots. When you have more than three variables, you can still do the mathematics, but diagrams don’t make much sense (sometimes color-coding is used for a variable).
Sometimes variables are not related in a “linear” fashion, which
means that linear regression doesn’t make much sense. An example is
“Anscombe’s Quartet”. These are four scatterplots that look very
different but are all fitted by the same linear regression equation.
[Figure: the four scatterplots of Anscombe’s Quartet]
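To check the claim about Anscombe’s Quartet, here is a sketch in Python that fits a least-squares line to each of the four data sets. The numbers are the standard published values of the quartet (Anscombe, 1973), not data from our class, and the function name fit_line is just a name chosen for this example. It also computes r for each data set.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares slope and intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Standard published values of Anscombe's Quartet.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept, r) in results.items():
    print(f"{name}: y = {slope:.2f}x + {intercept:.2f}, r = {r:.2f}")
```

All four data sets come out with nearly the same line and nearly the same r, even though only the first one looks like an ordinary linear scatter. The summary statistics alone cannot tell the four apart, which is why it helps to look at the plot before trusting the regression.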
Using linear regression when the data do not fit a regression line
can lead to problems in research. Here, for example, is a scatterplot
of executions and homicide rates in US States:
[Figure: scatterplot of executions and homicide rates in US States]
January 30 - How does social science differ from other ways of
thinking: poetry, philosophy, theology, physical science? How would
we divide up fields of study? Physical Science, Social Science,
Humanities? Science, Art and Morality? Or, in Greek, Episteme,
Techne, Phronesis: Three approaches to knowledge. At Rutgers Camden
we divide knowledge up differently: Rutgers Camden requirements. How
does social science differ from the other categories? We begin with
concepts, as we discussed in the last class, but so do other fields,
especially philosophy and even mathematics if we recognize that
numbers are concepts. The small integers are particularly important,
especially Zero and One (or nothing and something). Religion may also
start with concepts. The Bible says, "In the beginning was the Word,
and the Word was with God, and the Word was God." What does that
mean? Ask a theologian. Religious concepts are good if they provoke
spiritual reflection, as in reciting a Mantra in Buddhism. Literary
concepts are good if they are beautiful, which social science concepts
seldom are. W.H. Auden's poem Under Which Lyre is an aesthetic attack
on social science and other applied sciences.
In Social Science, a concept is good if it helps us to understand
empirical reality. A good concept leads to useful generalizations or
theories. Theories are general statements about relationships between
concepts that reflect how people think and behave. A good concept can
also be operationalized, which means finding indicators to measure it.
A very common way of operationalizing a concept is to write a survey
question. Other concepts may be operationalized by observation, by
physical measurement or by counting things. In criminal justice,
concepts are often operationalized by having police officers fill out
reports on incidents. We can find a good list of sociological concepts
by going to survey research archives, where concepts are translated
into survey questions. Check the General Social Survey and the
Eagleton poll. Criminal justice concepts can be found on the Bureau of
Justice Statistics WEB site.
There are also bad concepts. For an example of one I think is bad,
click on virtropy. What's wrong with this concept?
Recently there has been some controversy over "race" as a
concept. Some people say races do not "really" exist.
Biologically, that is true if by "exist" you mean that people fall
into distinct categories. Physical differences exist with regard to
skin color and other traits, but they are distributed continuously,
not in distinct categories. Sociologically, racial differences exist
and are important. The people who say races do not "exist" are usually
in favor of using racial categories for affirmative action programs,
or even for reparations, so they concede that the categories have
sociological meaning. That meaning differs from society to society,
and may change over time. The
growth of the Hispanic population in the
Other concepts we can consider are: poverty, power, crime, murder,
race, IQ, liberalism/conservatism, homelessness. Or we could look at
Personality Types as defined by Carl Jung and measured by the
Myers-Briggs Type Indicator, developed by Isabel Briggs Myers and
Katharine Briggs.
January 28: We will look at the material on Rutgers Policy, Informed
Consent and Behavioral Research in the Human Subjects Certification
Course. You can find the same material in WEBCT.
Categories of Exemption include most survey or interviewing research unless it involves confidential information or people can be identified in the data set. Research using previously existing data is also exempt. These actually are most of the things that we do in the social sciences. Most "behavioral" research is exempt from review. However, you still have to fill out a form and state why your research is exempt.
We will also look at some cases that have raised ethical issues:
The raging controversy over the book Darkness in El Dorado, about research on the Yanomamo in Venezuela, is the latest ethical controversy, and it also raises important methodological questions. The allegation is that researchers gave the tribe measles, but some people argue that any contact with isolated tribes violates their rights. Many of the book's allegations, however, have been contested by the National Academy of Sciences.
Another controversial book is Laud Humphreys' Tearoom Trade,
which raises ethical issues. He studied gay sex in a
men's room
in a park in
Concepts and theories. Concepts are words, or the meaning behind the
word. Mother or madre, same concept.
"surrogate mother", "birth mother", "adoptive mother": What is the
difference? "gender" (social), "sex" (biological). Race?
A theory is when you make statements about how concepts fit together,
the relationship between them. Two kinds of relationships:
logical or tautological - true by definition - falls into philosophy, metaphysics
empirical or testable - true by observation - this is what we are interested in.
Statements based on authority or tradition or religion fall into a
religious or political category.
January 26: We went through parts 1, 2 and 3 on the Table of Contents
of the "Course Content" of the Human Subjects Certification Program
WEBCT course. You can find the same material in WEBCT.