Statistics Overview

There are two kinds of statistics: descriptive and inferential. Descriptive statistics describe sample data; inferential statistics generalize from the sample to a population. If a statistic comes with a margin of error or a p value, it is inferential; the other statistics we have covered are descriptive. There are several kinds of descriptive statistics because they measure different things and because they suit data measured at different levels. The chart below summarizes them:

For each statistic, the chart gives its purpose, the level of measurement it requires, and how to compute it.

Mean
Purpose: Describes central tendency.
Level of measurement: interval
How to compute: Add up the cases and divide by N.
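
Median
Purpose: Describes central tendency.
Level of measurement: ordinal
How to compute: Put the cases in order and select the one in the middle. With an even number of cases, take the average of the two middle cases.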

Mode
Purpose: Describes central tendency (the most frequent case).
Level of measurement: nominal
How to compute: Choose the value (category) that occurs most often.
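
The three measures of central tendency can be sketched in Python as follows. This is a minimal illustration, not the course's official method: the ages are hypothetical, and the helper names (mean_by_hand, median_by_hand, mode_by_hand) are made up for this example. Python's built-in statistics module can be used to check the results.

```python
from collections import Counter
import statistics


def mean_by_hand(cases):
    # Add up the cases and divide by N.
    return sum(cases) / len(cases)


def median_by_hand(cases):
    # Put the cases in order and select the one in the middle;
    # with an even number of cases, average the two middle ones.
    ordered = sorted(cases)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode_by_hand(cases):
    # Choose the most frequent value (on a tie, the one seen first).
    return Counter(cases).most_common(1)[0][0]


# Hypothetical ages of eight respondents.
ages = [22, 25, 25, 31, 34, 40, 45, 58]

print(mean_by_hand(ages))    # 35.0
print(median_by_hand(ages))  # 32.5 (average of 31 and 34)
print(mode_by_hand(ages))    # 25
```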

Range
Purpose: Describes dispersion.
Level of measurement: ordinal
How to compute: Put the data in order and measure the distance from the lowest to the highest case.

Standard Deviation
Purpose: Describes dispersion.
Level of measurement: interval
How to compute: For each case, compute the difference between that case and the mean, square the differences, add them up, divide by n - 1, and take the square root. See the detailed description in the Descriptive Statistics handout.
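
The standard deviation steps translate directly into code. This is a minimal sketch with hypothetical scores, and sd_by_hand is an illustrative name. Dividing by n - 1 gives the sample standard deviation, which is what Python's statistics.stdev computes (statistics.pstdev divides by N instead).

```python
import math
import statistics


def sd_by_hand(cases):
    n = len(cases)
    mean = sum(cases) / n
    # Difference between each case and the mean, squared.
    squared_diffs = [(x - mean) ** 2 for x in cases]
    # Add them up, divide by n - 1, and take the square root.
    return math.sqrt(sum(squared_diffs) / (n - 1))


# Hypothetical test scores: the mean is 5 and the squared differences sum to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(round(sd_by_hand(scores), 3))        # 2.138
print(round(statistics.stdev(scores), 3))  # 2.138
```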

Interquartile Range
Purpose: Describes dispersion.
Level of measurement: ordinal
How to compute: Put the data in order and measure the distance from the 25th percentile to the 75th percentile.
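
The range and the interquartile range both start by putting the data in order. The sketch below uses hypothetical waiting times and made-up function names. For the quartiles it relies on Python's statistics.quantiles with its default method. Textbooks and programs such as Excel use slightly different percentile rules, so on small samples their interquartile range can differ a little.

```python
import statistics


def data_range(cases):
    # Put the data in order; the range is the distance from lowest to highest.
    ordered = sorted(cases)
    return ordered[-1] - ordered[0]


def interquartile_range(cases):
    # statistics.quantiles sorts the data and returns the three quartile
    # cut points (25th, 50th, 75th percentiles).
    q1, _median, q3 = statistics.quantiles(cases, n=4)
    return q3 - q1


waits = [3, 7, 8, 5, 12, 14, 21]  # hypothetical data

print(data_range(waits))           # 18
print(interquartile_range(waits))  # 9.0
```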
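
Margin of Error for a Percent
Purpose: Inferential. Tells how much a sample percent is likely to differ from the true population value.
Level of measurement: nominal; typically used for questionnaire items, e.g., 55% for Candidate A, 40% for Candidate B, etc.
How to compute: m = 1/sqrt(n), where n is the size of the sample and m is expressed as a proportion (e.g., .05 means plus or minus 5 percentage points). This is a rough 95% margin of error that assumes the worst case, a 50/50 split. It is described at more length in the class notes.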
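
The 1/sqrt(n) rule is a shortcut. The usual 95% margin of error for a sample proportion p is 1.96 * sqrt(p(1 - p)/n). At the worst case, p = .5, and with 1.96 rounded to 2, this becomes 2 * .5/sqrt(n) = 1/sqrt(n). The sketch below compares the two for a hypothetical poll of 400 respondents; the function names are illustrative.

```python
import math


def margin_of_error_rule(n):
    # Shortcut from the chart: m = 1/sqrt(n), as a proportion.
    return 1 / math.sqrt(n)


def margin_of_error_proportion(p, n, z=1.96):
    # Standard formula for a sample proportion p at about 95% confidence.
    return z * math.sqrt(p * (1 - p) / n)


n = 400  # hypothetical sample size
print(round(margin_of_error_rule(n), 4))              # 0.05
print(round(margin_of_error_proportion(0.55, n), 4))  # 0.0488 (Candidate A at 55%)
```

The shortcut slightly overstates the margin of error, which errs on the safe side.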

Margin of Error for a Mean Score
Purpose: Inferential. Tells how much a mean score from a sample is likely to differ from the true population mean.
Level of measurement: interval, because that is needed to compute the mean. Used for data such as income, test scores, and age.
How to compute: M = 2 * sd / sqrt(N), where sd is the standard deviation, N is the sample size, and sqrt means take the square root. The factor of 2 gives roughly 95% confidence. This is described at more length in the class notes.
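
Here is a sketch of the mean-score formula, using hypothetical test scores and an illustrative function name. It uses the sample standard deviation (statistics.stdev, which divides by n - 1). For very small samples, a t-distribution multiplier somewhat larger than 2 is more accurate.

```python
import math
import statistics


def margin_of_error_mean(cases):
    # M = 2 * sd / sqrt(N); the 2 approximates 1.96 for roughly 95% confidence.
    sd = statistics.stdev(cases)
    return 2 * sd / math.sqrt(len(cases))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical test scores
m = margin_of_error_mean(scores)
print(f"mean = {statistics.mean(scores)}, margin of error = +/- {m:.2f}")
```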
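
Sample Size
Purpose: Tells the size of the sample needed to obtain a given margin of error.
Level of measurement: nominal
How to compute: n = 1/m^2 (one divided by m squared), where m is the desired margin of error expressed as a proportion, not as a percent (e.g., .05, not 5%). For example, m = .05 gives n = 1/.0025 = 400. This is described at more length in the class notes.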
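
The sample-size formula is the margin-of-error shortcut m = 1/sqrt(n) solved for n. This is a minimal sketch with an illustrative function name. It rounds up, because a fractional respondent is impossible and rounding down would leave a slightly larger margin than requested.

```python
import math


def sample_size(m):
    # n = 1/m^2, with m as a proportion (.05, not 5).
    # round() strips floating-point noise (e.g. 399.99999999999994)
    # before rounding up to a whole number of respondents.
    return math.ceil(round(1 / m ** 2, 6))


print(sample_size(0.05))  # 400
print(sample_size(0.03))  # 1112
```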

Observed Frequency
Purpose: The number of cases observed in a particular category.
Level of measurement: nominal. It can be univariate, e.g., 35 men and 55 women, or bivariate, e.g., 15 tall men, 25 short men, 10 tall women, etc.
How to compute: Tabulate the data, either by hand or with a computer. Microcase and other statistical packages will do this; in Excel, create a "pivot table." Usually you will be given the observed frequencies.
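
Tabulating observed frequencies is just counting. The sketch below applies Python's collections.Counter to a small set of hypothetical respondents; the data and variable names are illustrative. Counting one variable gives univariate frequencies, and counting pairs gives the bivariate cell frequencies of a cross-tabulation.

```python
from collections import Counter

# Hypothetical respondents, one (sex, height) pair per case.
respondents = [
    ("Man", "Tall"), ("Man", "Short"), ("Woman", "Tall"),
    ("Woman", "Short"), ("Man", "Tall"), ("Woman", "Short"),
]

# Univariate: count the cases in each category of one variable.
by_sex = Counter(sex for sex, _height in respondents)

# Bivariate: count the cases in each combination of categories (each cell).
by_cell = Counter(respondents)

print(by_sex)
print(by_cell[("Man", "Tall")])  # 2
```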

Row Percent
Purpose: Describes the frequency in a cell as a percent of the row total.
Level of measurement: two nominal variables in a cross-tabulation
How to compute: (observed frequency / row total) * 100. All the row percents in a row add to 100%.

Column Percent
Purpose: Describes the frequency in a cell as a percent of the column total.
Level of measurement: two nominal variables in a cross-tabulation
How to compute: (observed frequency / column total) * 100. All the column percents in a column add to 100%.

Total Percent
Purpose: Describes the frequency in a cell as a percent of the grand total.
Level of measurement: two nominal variables in a cross-tabulation
How to compute: (observed frequency / grand total) * 100. All the total percents in the entire table add to 100%.
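
Row, column, and total percents differ only in the denominator. Here is a sketch using a hypothetical 2 x 2 table (sex by height) and illustrative function names:

```python
# Hypothetical cross-tabulation: rows = sex, columns = height.
observed = [
    [15, 25],  # Men:   Tall, Short
    [10, 30],  # Women: Tall, Short
]


def row_totals(table):
    return [sum(row) for row in table]


def column_totals(table):
    return [sum(col) for col in zip(*table)]


def row_percents(table):
    # Each cell as a percent of its row total; each row adds to 100.
    return [[100 * cell / total for cell in row]
            for row, total in zip(table, row_totals(table))]


def column_percents(table):
    # Each cell as a percent of its column total; each column adds to 100.
    col_tot = column_totals(table)
    return [[100 * cell / col_tot[j] for j, cell in enumerate(row)]
            for row in table]


def total_percents(table):
    # Each cell as a percent of the grand total; the whole table adds to 100.
    grand = sum(row_totals(table))
    return [[100 * cell / grand for cell in row] for row in table]


print(row_percents(observed))    # [[37.5, 62.5], [25.0, 75.0]]
print(total_percents(observed))  # [[18.75, 31.25], [12.5, 37.5]]
```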

Expected Frequency
Purpose: Used to compute chi-square; tells what frequency we would expect in each cell if there were no relationship between the two variables.
Level of measurement: two nominal variables in a cross-tabulation table
How to compute: For each cell, take the row total for the row it is in, multiply it by the column total for the column it is in, then divide by the grand total. These are frequencies, not percents, so do not give them a percent sign. Added together, they sum to the same grand total as the observed frequencies.
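
Here is a sketch of the expected-frequency rule, using a hypothetical 2 x 2 table; expected_frequencies is an illustrative name.

```python
def expected_frequencies(observed):
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each cell: (row total * column total) / grand total.
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


# Hypothetical cross-tabulation: rows = Men, Women; columns = Tall, Short.
observed = [[15, 25], [10, 30]]
expected = expected_frequencies(observed)
print(expected)  # [[12.5, 27.5], [12.5, 27.5]]
```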
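
Chi-Square
Purpose: Inferential. Tells whether the relationship between two variables in a cross-tabulation is "statistically significant."
Level of measurement: a cross-tabulation of two nominal variables (it can also be used to compare one variable to theoretical values)
How to compute: Use a web chi-square calculator or get it from Microcase. (To compute it by hand, subtract each expected frequency from the corresponding observed frequency, square the difference, divide by the expected frequency, and add up the results for all cells.) To judge significance, compare the result with the critical value for (rows - 1) * (columns - 1) degrees of freedom, or use the p value the software reports.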
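
The hand calculation can be sketched as follows, with a hypothetical table and illustrative function names. The value 3.841 is the standard table critical value for 1 degree of freedom at the .05 level, so the comparison below only applies to a 2 x 2 table. For larger tables, look up the critical value for the computed degrees of freedom, or use the p value your software reports.

```python
def chi_square(observed):
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / grand_total  # expected frequency
            total += (o - e) ** 2 / e
    return total


def degrees_of_freedom(observed):
    return (len(observed) - 1) * (len(observed[0]) - 1)


observed = [[15, 25], [10, 30]]  # hypothetical 2 x 2 table
chi2 = chi_square(observed)
df = degrees_of_freedom(observed)
print(round(chi2, 3), df)  # 1.455 1

CRITICAL_05_DF1 = 3.841  # table value for df = 1, alpha = .05
print("significant at .05" if chi2 > CRITICAL_05_DF1 else "not significant at .05")
```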

ANOVA (Analysis of Variance)
Purpose: Inferential. Usually used to compare the mean scores of two or more groups (e.g., experimental and control groups) on a variable measured on an interval scale.
Level of measurement: one nominal variable (often groups of respondents) and one interval variable
How to compute: Microcase or other software will compute this for you. It can be used with the GSS data set in student Microcase, e.g., to compare groups such as religions or racial groups on continuous variables.
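
To show what the software is doing, here is a sketch of the one-way ANOVA F statistic, which is the variance between group means divided by the variance within groups. The groups and scores are hypothetical, and one_way_anova_f is an illustrative name. The p value comes from the F distribution with the returned degrees of freedom. Microcase, Excel, or SciPy's scipy.stats.f_oneway will report it.

```python
def one_way_anova_f(groups):
    """Return the F statistic and its (between, within) degrees of freedom."""
    all_scores = [score for group in groups for score in group]
    n = len(all_scores)
    k = len(groups)
    grand_mean = sum(all_scores) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - m) ** 2
                    for g, m in zip(groups, group_means) for score in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Hypothetical scores for three groups.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat, dfs = one_way_anova_f(groups)
print(f_stat, dfs)  # 12.0 (2, 6)
```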

Correlation Coefficient
Purpose: Describes the strength and direction of the relationship between two interval variables.
Level of measurement: interval, but it can also be used with ordinal variables if they are a reasonable approximation to interval.
How to compute: Microcase or Excel will calculate it for you. It ranges from -1 through 0 to +1. Squaring it gives the proportion of variance explained by the regression equation (multiply by 100 for a percentage).
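
Here is a sketch of the Pearson correlation coefficient, using hypothetical paired data and an illustrative function name. It divides the sum of cross-products of deviations from the means by the square root of the product of the sums of squared deviations. Python 3.10 and later also provide statistics.correlation.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))       # 0.775
print(round(r ** 2, 3))  # 0.6, i.e. 60% of the variance explained
```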
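
Cramer's V
Purpose: Describes the strength of the association between the two variables in a cross-tabulation, i.e., how well one variable can explain the other.
Level of measurement: two nominal variables in a cross-tabulation table
How to compute: Microcase computes it. It is derived from chi-square: divide chi-square by N times (k - 1), where k is the smaller of the number of rows and the number of columns, then take the square root. It ranges from 0 (no association) to 1.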
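
Here is a sketch of Cramer's V computed from chi-square, using hypothetical tables and an illustrative function name. For a 2 x 2 table, k - 1 = 1, so V is simply the square root of chi-square divided by N.

```python
import math


def cramers_v(observed):
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Chi-square from observed and expected frequencies.
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / n
            chi2 += (o - e) ** 2 / e
    k = min(len(observed), len(observed[0]))  # smaller of rows, columns
    return math.sqrt(chi2 / (n * (k - 1)))


print(round(cramers_v([[15, 25], [10, 30]]), 3))  # 0.135
print(cramers_v([[10, 0], [0, 10]]))               # 1.0 (perfect association)
```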
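
Multiple R^2 (R squared)
Purpose: Describes how well all the independent variables in a multiple regression equation together explain the variance in the dependent variable.
Level of measurement: interval or dichotomous
How to compute: Microcase or Excel will compute it for you. It ranges from 0 to 1 and is read as the proportion of variance explained.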
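
To show what R^2 measures, the sketch below fits a two-predictor regression and computes 1 minus (residual sum of squares / total sum of squares). The data and function names are hypothetical. The fit uses the textbook normal equations solved by Gaussian elimination. That choice is an assumption made for illustration: Microcase and Excel may use different numerical methods, though they should give the same R^2 for well-behaved data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple_regression(predictors, y):
    """Least-squares [a, b1, b2, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(xs) for xs in predictors]  # leading 1 = intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def multiple_r_squared(predictors, y):
    """1 - (residual sum of squares / total sum of squares)."""
    coefs = fit_multiple_regression(predictors, y)
    predicted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))
                 for xs in predictors]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical data: two independent variables per case.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 9, 18, 19, 26]

print(round(multiple_r_squared(predictors, y), 3))
```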

Regression Equation
Purpose: Describes the line that best fits the relationship between two variables. If the relationship is close to linear, the equation can be used to predict the dependent variable from the independent variable.
Level of measurement: interval (or dichotomous)
How to compute: Normally, you will get the equation from Microcase or Excel, or it will be given to you. It will be of the form Y = a + bX, where X is the independent variable, Y is the dependent variable, b is the regression coefficient (slope), and a is the intercept. a and b are numbers (parameters) that define the equation for a particular set of data. To make a prediction, multiply the value of X by b and add a (if a is negative, this means subtracting).
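
Here is a sketch of how a and b are obtained by least squares and then used for a prediction. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b


def predict(a, b, x_value):
    # Multiply X by b and add a.
    return a + b * x_value


x = [1, 2, 3, 4, 5]  # hypothetical independent variable
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable
a, b = fit_line(x, y)
print(f"Y = {a:.2f} + {b:.2f}X")   # Y = 2.20 + 0.60X
print(round(predict(a, b, 6), 2))  # 5.8
```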

Beta Coefficient
Purpose: Describes how good a predictor each of the independent variables in a multiple regression equation is.
Level of measurement: interval (or dichotomous)
How to compute: These are standardized regression coefficients. Microcase or Excel will compute them for you. They are used to describe the strength of each arrow on a path diagram. Note: for a bivariate regression, the beta coefficient is the same as the correlation coefficient.
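
A beta coefficient is the slope you get after converting both variables to standard scores (z-scores). The sketch below does exactly that, using hypothetical data and illustrative names. In this bivariate case, the result matches the correlation coefficient, as the note above says.

```python
import statistics


def z_scores(values):
    # Standard score: (value - mean) / standard deviation.
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def slope(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
            / sum((xi - mean_x) ** 2 for xi in x))


def beta_coefficient(x, y):
    # Regress standardized Y on standardized X.
    return slope(z_scores(x), z_scores(y))


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(round(beta_coefficient(x, y), 3))  # 0.775, the same as Pearson's r here
```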

Path Diagram
Purpose: Shows how a set of regression equations can be used to describe a pattern of causal relationships.
Level of measurement: interval or dichotomous
How to compute: Draw a diagram with the dependent variable on the right and the independent variables on the left. Insert the antecedent and intervening variables. Draw arrows going from left to right to represent each hypothesized link. Compute a regression equation for each variable that has an arrow going into it; each arrow pointing into a variable represents an independent variable to be included in that variable's equation. This is explained in Introduction to Path Analysis.