
Statistics Overview

There are two kinds of statistics: descriptive and inferential. Descriptive statistics describe sample data; inferential statistics generalize from the sample to a population. When a statistic gives you a margin of error or a p value, it is inferential. Otherwise, the statistics we have covered are descriptive. There are several kinds of descriptive statistics because they measure different things and because they are used for data measured at different levels. The chart below explains this:

Statistic	Purpose	Level of Measurement	How to Compute
Mean	Describes central tendency	interval	Add up the cases and divide by N.
Median	Describes central tendency	ordinal	Put the cases in order and select the one in the middle.
Mode	Describes central tendency (most frequent case)	nominal	Just choose the most frequent case
Range	Describes dispersion	ordinal	Put data in order and measure the distance from the lowest to the highest
Standard Deviation	Describes dispersion	interval	For each case, compute the distance between that case and the mean, square the differences, add them up, divide by n-1, and take the square root. See the detailed description on the Descriptive Statistics handout.
Inter-quartile range	Describes dispersion	ordinal	Put data in order and measure the distance from the 25th percentile to the 75th percentile.
Margin of Error for a percent	Inferential: tells how much a sample percent is likely to differ from the true population value	nominal, typically used for questionnaire items, e.g., 55% for Candidate A, 40% for B, etc.	m = 1/sqrt(n), where n is the size of the sample. This is described at more length in the class notes.
Margin of Error for a Mean score	Inferential: tells how much a mean score from a sample is likely to differ from the true population mean	interval because that is needed to get the mean. Used for data such as income, test scores, age, etc.	M = 2 * sd / sqrt(N), where sd is the standard deviation, N is the sample size, and sqrt means take the square root. This is described at more length in the class notes.
Sample Size	the size of a sample needed to obtain a given margin of error	nominal	n = 1/m², where m is the desired margin of error expressed as a proportion (not as a percent), e.g., .05, not 5%. This is described at more length in the class notes.
Observed Frequency	The number of cases observed in a particular category	Nominal data. It can be univariate, e.g., 35 men and 55 women, or bivariate, e.g., 15 Tall Men, 25 Short Men, 10 Tall Women, etc.	Tabulate the data, either by hand or with a computer. Microcase or other statistical packages will do this. To do it in Excel, create a "pivot table." Usually you will be given the observed frequencies.
Row Percent	Describes the frequency in a cell as a percent of the row total	two nominal variables in a cross tabulation	observed frequency/row total - all the row percents in a row will add to 100%
Column Percent	Describes the frequency in a cell as a percent of the column total	two nominal variables in a cross tabulation	observed frequency/column total - all the column percents in a column will add to 100%
Total Percent	Describes the frequency in a cell as a percent of the grand total	two nominal variables in a cross tabulation	observed frequency/grand total - all the total percents in the entire table will add to 100%
Expected Frequency	Used to compute chi-square; tells what frequency we would expect if there were no relationship between two variables	two nominal variables in a cross tabulation table	For each cell, you take the row total for the row it is in, multiply it by the column total for the column it is in, then divide by the grand total. These are frequencies, not percents, so do not give them a percent sign. If you add them all up, they add to the same number as the observed frequencies
chi-square	Inferential: tells if the relationship between two variables in a cross-tabulation is "statistically significant"	A cross-tabulation of two nominal variables (or it can be used to compare one variable to theoretical values)	Use the Web chi-square calculator or get it from Microcase. (To get it by hand, subtract each observed frequency from the corresponding expected frequency, square the difference, divide by the expected frequency, then add up the results for all cells.)
ANOVA or Analysis of Variance	Inferential: Usually used to compare scores of two or more groups (e.g., experimental and control groups) on a variable measured on an interval scale.	One nominal variable (often groups of respondents) and one interval variable	Microcase or other software will compute this for you. It can be used with the GSS data set in student Microcase, e.g., to compare groups such as religions or racial groups on continuous variables.
correlation coefficient	Describes the strength of the relationship between two interval variables	interval, but can also be used with ordinal variables if they are a reasonable approximation to interval.	Microcase or Excel will calculate it for you. It ranges from -1 through 0 to +1. Squaring it gives the proportion of variance that the regression equation explains.
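Cramer's V	Describes how well one variable in a cross-tabulation can explain the other.	two nominal variables in a cross-tabulation table	Microcase computes it. It is derived from the chi-square: divide chi-square by N times one less than the smaller of the number of rows and the number of columns, then take the square root.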
Multiple R²	Describes how well all the variables in a multiple regression equation explain the variance in the dependent variable	interval or dichotomous	Microcase or Excel will compute it for you.
regression equation	Describes the line that best fits the relationship between two variables. Can be used to predict the dependent variable with the independent variable, if the relationship is close to linear.	interval (or dichotomous)	Normally, you will get the formula from Microcase or Excel or it will be given to you. It will be of the form Y = a + b X, where X is the independent variable, Y is the dependent variable, b is the regression coefficient and a is the intercept. a and b will be numbers (parameters) that define the equation for a particular case. To make a prediction, multiply the value for X by b and add a (or subtract if it is negative).
beta coefficient	Describes how good a predictor each of the independent variables in a multiple regression equation is.	interval (or dichotomous)	These are standardized regression coefficients. Microcase or Excel will compute them for you. They are used to describe the strength of each arrow on a path diagram. Note: for bivariate regressions the correlation coefficient is the same as the beta coefficient.
path diagram	Describes how a number of regression equations can be used to describe a pattern of causal relationships.	interval or dichotomous	Draw a diagram with the dependent variable on the right and the independent variables on the left. Insert the antecedent and intervening variables. Draw arrows going from left to right to represent each hypothesized link. Compute a regression equation for each variable that has an arrow going into it. Each arrow represents an independent variable to be included. This is explained in Introduction to Path Analysis.
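
The hand computations in the descriptive rows of the chart (mean, median, mode, range, standard deviation, inter-quartile range) can be sketched in Python roughly as follows. The list of scores is made up for illustration. The quartile rule is also an assumption: textbooks and software packages use slightly different rules for percentiles, so the inter-quartile range may differ a little from a hand calculation.

```python
import math
import statistics

# Hypothetical sample of test scores (illustrative data, not from the handout).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n                   # add up the cases and divide by N
median = statistics.median(scores)       # middle case once the data are in order
mode = statistics.mode(scores)           # most frequent case
data_range = max(scores) - min(scores)   # distance from lowest to highest

# Standard deviation by hand: distance of each case from the mean,
# squared, summed, divided by n - 1, then square-rooted.
sum_of_squares = sum((x - mean) ** 2 for x in scores)
sd = math.sqrt(sum_of_squares / (n - 1))

# Quartiles: statistics.quantiles with n=4 returns the 25th, 50th and
# 75th percentiles (using Python's default "exclusive" method).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(mean, median, mode, data_range, round(sd, 3), iqr)
```

The hand-computed standard deviation should agree with statistics.stdev(scores), which also divides by n - 1.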
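
The three margin-of-error formulas in the chart are simple enough to write as small functions. This is a minimal sketch; the function names are invented for illustration, and the formulas are the chart's approximations (they correspond roughly to a 95% confidence level), not exact confidence intervals.

```python
import math

def margin_of_error_percent(n):
    """Approximate margin of error for a percent, as a proportion: m = 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def margin_of_error_mean(sd, n):
    """Approximate margin of error for a mean score: M = 2 * sd / sqrt(N)."""
    return 2 * sd / math.sqrt(n)

def sample_size_for_margin(m):
    """Sample size for a desired margin m given as a proportion: n = 1/m^2.

    The result is rounded to remove floating-point noise and then rounded
    up to a whole number of cases.
    """
    return math.ceil(round(1 / m ** 2, 6))

# Hypothetical examples: a poll of 400 people, and a test with sd = 10
# given to 100 people.
print(margin_of_error_percent(400))    # as a proportion; multiply by 100 for percent
print(margin_of_error_mean(10, 100))
print(sample_size_for_margin(0.05))    # use .05, not 5
```

Note that the sample-size function expects the margin as a proportion, as the chart says; passing 5 instead of .05 would give a meaningless answer.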
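
The cross-tabulation rows of the chart (row percents, expected frequencies, chi-square, Cramer's V) can be worked through on a small table. The observed frequencies below are hypothetical. The chi-square here is the plain by-hand version described in the chart; statistical packages may apply a continuity correction to 2 x 2 tables, so their value can differ. The Cramer's V line assumes the standard formula, the square root of chi-square divided by N times (the smaller table dimension minus 1).

```python
import math

# Hypothetical observed frequencies (rows: men, women; columns: tall, short).
observed = [
    [15, 25],
    [10, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Row percents: each cell as a percent of its row total.
row_percents = [
    [100 * cell / total for cell in row]
    for row, total in zip(observed, row_totals)
]

# Expected frequency: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square: (observed - expected) squared, divided by expected,
# added up over every cell in the table.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Cramer's V from chi-square (standard formula, see the note above).
smaller_dimension = min(len(observed), len(observed[0]))
cramers_v = math.sqrt(chi_square / (grand_total * (smaller_dimension - 1)))

print(expected)
print(round(chi_square, 3), round(cramers_v, 3))
```

This sketch stops at the chi-square value itself. Deciding whether it is statistically significant still requires a chi-square table, a calculator, or software, using the degrees of freedom for the table.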
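
For the regression rows, the sketch below fits Y = a + b X to a few hypothetical data points, makes a prediction, and checks the note that the beta coefficient equals the correlation coefficient in a bivariate regression. It assumes the usual least-squares fit; the data and variable names are invented for illustration, and in practice Microcase or Excel would supply a and b.

```python
import math

xs = [1, 2, 3, 4, 5]   # independent variable X (hypothetical values)
ys = [2, 4, 5, 4, 5]   # dependent variable Y (hypothetical values)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b = sxy / sxx              # regression coefficient (slope)
a = mean_y - b * mean_x    # intercept
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
r_squared = r ** 2               # proportion of variance explained

# Beta is b rescaled by the spread of X relative to the spread of Y;
# with a single independent variable it equals r.
beta = b * math.sqrt(sxx / syy)

def predict(x):
    """Prediction from the equation Y = a + b X."""
    return a + b * x

print(round(a, 2), round(b, 2), round(r, 3), round(r_squared, 2))
print(round(predict(6), 2))
```

With more than one independent variable the betas no longer equal the simple correlations, which is why the chart leaves multiple regression to the software.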