Regression Analysis

 
Name ..................................

 This exercise uses some data on weight, height, age and sex in a small group (N=22) to illustrate the use of regression analysis.

Weight is our dependent variable.  It is measured on an interval scale in pounds.
Height is an independent variable.  It is measured on an interval scale in inches.
Age is an independent variable.  It is measured on an interval scale in years.
Sex is an independent variable.  It is measured as a dummy variable where 0 = male  and 1 = female.

We can begin with a bivariate analysis, with weight as the dependent variable and height as the independent variable.  We can see this graphically by using the Scatterplot procedure. 
This procedure also gives us the regression equation, Y = a + b X, where Y is Weight, X is Height, a is the intercept and b is the (unstandardized) regression coefficient.   To predict someone's weight we take their height, multiply it by 6.596, and subtract 291.542.  In a bivariate regression, the correlation coefficient "r" is equal to the standardized "beta", which here is .796.
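
The prediction rule just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not something MicroCase produces; the function name predict_weight and the 66-inch example height are assumptions chosen for the sketch.

```python
# Coefficients copied from the bivariate regression described above.
INTERCEPT = -291.542   # a, in pounds
SLOPE = 6.596          # b, in pounds per inch of height


def predict_weight(height_in):
    """Return the predicted weight in pounds for a height in inches."""
    return INTERCEPT + SLOPE * height_in


# 66 inches is an arbitrary illustrative height, not a case from the data set.
print(round(predict_weight(66), 3))  # prints 143.794
```

The same function can be called with any other height to check a hand calculation.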

Question:  what weight would we predict for a person who was 72 inches tall?

                What would we predict for a person who was 60 inches tall?

                 Is this relationship statistically significant?

               Fill in the blanks:    Weight =                       +                         *   Height.
 

If we use the Regression procedure, the diagram looks like this.  Multiple R-Squared tells us how much of the variance in weight we have explained: 63.4%.  In this bivariate regression, the BETA is the same as the correlation coefficient (r).


Clicking on the ANOVA button gives us the following information, with the key facts highlighted.  Note that the Y-intercept and the unstandardized regression coefficient (Unstand. b) are the numbers that went into the "Line Equation" on the diagram.

Analysis of Variance
Dependent Variable: weight
N: 22          Missing: 0
Multiple R-Square = 0.634    Y-Intercept = -291.542
Standard error of the estimate = 21.052
LISTWISE deletion (1-tailed test)     Significance Levels: **=.01, *=.05
Source        Sum of Squares   DF   Mean Square   F        Prob.
REGRESSION    15356.106         1   15356.106     34.649   0.000
RESIDUAL       8863.713        20     443.186
TOTAL         24219.818        21

          Unstand.b   Stand.Beta   Std.Err.b   t
Height     6.596       0.796        1.121      5.886 **
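
The summary figures in this output can be recovered from the sums of squares and degrees of freedom. The sketch below assumes the standard ANOVA formulas (R-Square = regression SS / total SS, F = regression mean square / residual mean square, standard error of the estimate = square root of the residual mean square). The variable names are illustrative, and small differences in the last digit come from rounding in the printed output.

```python
# Figures copied from the ANOVA output above (bivariate model).
ss_regression = 15356.106
ss_residual = 8863.713
df_regression = 1
df_residual = 20

ss_total = ss_regression + ss_residual          # total sum of squares
r_squared = ss_regression / ss_total            # share of variance explained
ms_regression = ss_regression / df_regression   # regression mean square
ms_residual = ss_residual / df_residual         # residual mean square
f_ratio = ms_regression / ms_residual           # F statistic
std_error_estimate = ms_residual ** 0.5         # standard error of the estimate
r = r_squared ** 0.5                            # equals BETA in a bivariate model

print(round(r_squared, 3), round(f_ratio, 3),
      round(std_error_estimate, 3), round(r, 3))
```

The four printed values should agree, to rounding, with the Multiple R-Square, F, standard error of the estimate and Stand.Beta reported above.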

It is important to understand the difference between the "unstandardized beta", also called the unstandardized regression coefficient (b), and the standardized beta, sometimes just called BETA.  The unstandardized coefficients are used for actually making a prediction, using the variables as they were measured.  Each one is expressed in units of the dependent variable per unit of the independent variable: here, pounds of weight per inch of height.  The standardized betas are used to compare the strength of different independent variables measured in different ways.  They usually fall between -1 and +1 and are read much like correlation coefficients.  See page 105 in our text on this.
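
The two kinds of coefficient are linked by the standard deviations of the variables: BETA = b * (SD of X / SD of Y). The sketch below uses that identity with the bivariate output for height. The standard deviation of weight is computed from the total sum of squares; the standard deviation of height is not printed anywhere in the output, so the value here is only the one implied by the reported b and BETA, and should be treated as an assumption.

```python
# Figures copied from the bivariate output (weight regressed on height).
ss_total = 24219.818   # total sum of squares for weight
n = 22                 # number of cases
b = 6.596              # unstandardized coefficient, pounds per inch
beta = 0.796           # standardized coefficient

# Sample standard deviation of weight, from the total sum of squares.
sd_weight = (ss_total / (n - 1)) ** 0.5

# Rearranging beta = b * sd_height / sd_weight gives the implied SD of height.
# This is inferred from the output, not reported by it.
implied_sd_height = beta * sd_weight / b

# One standard deviation of height, converted to pounds with b,
# is the same thing as beta standard deviations of weight.
pounds_per_sd_of_height = b * implied_sd_height

print(round(sd_weight, 2), round(implied_sd_height, 2),
      round(pounds_per_sd_of_height, 2))
```

Read this way, b says that each inch of height goes with about 6.6 pounds, while BETA says that a one standard deviation difference in height (about 4.1 inches here) goes with about 0.8 of a standard deviation in weight.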

If we introduce a second independent variable, we can no longer use Scatterplot since it does not do three-dimensional graphs.  We can, however, use the Regression procedure.  Using sex and height as independent variables, we get the following result.  Multiple R-Squared tells us we have explained 71% of the variance.

This diagram gives us BETA coefficients (or standardized regression coefficients).  These measure how well each independent variable predicts the dependent variable when the other independent variable is held constant.  As you can see, the BETA coefficients are smaller than the bivariate correlation coefficients (r).  This is because height and sex are correlated.  Nevertheless, each is significantly related to weight when the other is controlled.  Height is slightly stronger as a predictor.

If we click on the ANOVA button on the Regression procedure, we can get the unstandardized regression coefficients and the intercept.  It looks better on the MicroCase screen than it does when I copy it here, but the information is here.  I have bolded the key information.

Analysis of Variance
Dependent Variable: weight
N: 22          Missing: 0
Multiple R-Square = 0.713    Y-Intercept = -101.738
Standard error of the estimate = 19.126
LISTWISE deletion (1-tailed test)     Significance Levels: **=.01, *=.05
Source        Sum of Squares   DF   Mean Square   F        Prob.
REGRESSION    17269.658         2    8634.829     23.605   0.000
RESIDUAL       6950.160        19     365.798
TOTAL         24219.818        21

          Unstand.b   Stand.Beta   Std.Err.b   t
Height      4.035       0.487       1.513       2.666 *
sex       -28.819      -0.418      12.600      -2.287 *

To estimate someone's weight with this regression analysis:   Start with the intercept, -101.738.  Then take their height in inches, multiply it by 4.035, and add the result to the intercept.  Then multiply their dummy score on the sex variable by -28.819 and add that to the result (in effect a subtraction, since the coefficient is negative).  If they are male, the dummy code is 0, so this term is 0.  If they are female, the dummy code is 1, so this term is -28.819.

This tells us that, on average, people are about 4 pounds heavier for each additional inch of height, and women are 28.8 pounds lighter than men of the same height.
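
The prediction steps described above can be sketched in Python as follows. The function name predict_weight and the 65-inch example are assumptions made for illustration; only the three coefficients come from the output.

```python
# Coefficients copied from the two-predictor output (height and sex).
INTERCEPT = -101.738   # pounds
B_HEIGHT = 4.035       # pounds per inch of height
B_SEX = -28.819        # pounds added when the dummy is 1 (female)


def predict_weight(height_in, sex):
    """Predicted weight in pounds; sex is the dummy code (0 = male, 1 = female)."""
    return INTERCEPT + B_HEIGHT * height_in + B_SEX * sex


# 65 inches is an arbitrary illustrative height, not a case from the data set.
man = predict_weight(65, 0)
woman = predict_weight(65, 1)
print(round(man, 3), round(woman, 3), round(man - woman, 3))
# prints 160.537 131.718 28.819
```

At any given height the two predictions differ by exactly the sex coefficient, which is what "holding height constant" means here.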

Questions:

What is the predicted weight for a woman who is 70 inches tall?

What is the predicted weight for a man who is 72 inches tall?

What is the predicted weight for a woman who is 60 inches tall?

What is the predicted weight for a man who is 68 inches tall?

Which variable is the best predictor of the dependent variable?

What percentage of the variance in the dependent variable is explained in this analysis?

On average, how much more do men weigh than women?

On average, how much more do people weigh for each additional inch of height?

Fill in the blanks in the formula  weight =                +  (       * Height)  + (             * (dummy score on the sex variable)).
 

If we add age as a variable, we get the following:
 

Clicking on the ANOVA button gives us the following information:

Analysis of Variance
Dependent Variable: weight
N: 22          Missing: 0
Multiple R-Square = 0.749     Y-Intercept = -74.697
Standard error of the estimate = 18.372
LISTWISE deletion (1-tailed test)     Significance Levels: **=.01, *=.05
Source        Sum of Squares   DF   Mean Square   F        Prob.
REGRESSION    18144.329         3    6048.110     17.919   0.000
RESIDUAL       6075.489        18     337.527
TOTAL         24219.818        21

          Unstand.b   Stand.Beta   Std.Err.b   t
Height      3.197       0.386       1.544       2.071
gender    -34.006      -0.493      12.525      -2.715 *
age         1.398       0.202       0.869       1.610
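
With three independent variables the prediction works the same way, with one more term. The sketch below also recomputes each t value as the unstandardized coefficient divided by its standard error, which is the usual definition. The function name, the dictionary layout and the example person (65 inches, female, 30 years old) are assumptions for illustration only.

```python
# Figures copied from the three-predictor output above.
INTERCEPT = -74.697
COEFFICIENTS = {"height": 3.197, "gender": -34.006, "age": 1.398}
STD_ERRORS = {"height": 1.544, "gender": 12.525, "age": 0.869}


def predict_weight(height_in, gender, age_years):
    """Predicted weight in pounds; gender is the dummy (0 = male, 1 = female)."""
    return (INTERCEPT
            + COEFFICIENTS["height"] * height_in
            + COEFFICIENTS["gender"] * gender
            + COEFFICIENTS["age"] * age_years)


# t ratio for each variable: coefficient divided by its standard error.
t_ratios = {name: COEFFICIENTS[name] / STD_ERRORS[name] for name in COEFFICIENTS}

# An arbitrary illustrative person, not a case from the data set.
print(round(predict_weight(65, 1, 30), 3))  # prints 141.042
for name, t in t_ratios.items():
    print(name, round(t, 3))
```

The recomputed t values match the printed column to within rounding of the coefficients and standard errors.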

Answer the following questions:

What are the independent variables in this last analysis?
 

What is the dependent variable in this analysis?
 

Which variable is the best predictor of the dependent variable?
 

What percentage of the variance in the dependent variable is explained in this analysis?
 

Fill in the blanks in this formula:   Weight =               +   (            * Height) + (          * Gender Dummy)  + (             * Age).
 

What would the predicted weight be for a male who is 72 inches tall and 23 years old?

What would the predicted weight be for a female who is 70 inches tall and 20 years old?

What would the predicted weight be for a man who is 80 inches tall?

On average, how much heavier are men than women?

On average, how much weight do people gain for each year of age?

On average, how much weight do people gain for each inch of height?