STA429/1007 F 2004 Handout 4: The Math Data
Regression Example One
/********************** mathreg1.sas **********************/
title2 'Illustrate multiple regression';
options linesize=79 pagesize=2000 noovp formdlim='_';
libname math 'mathlib';    /* Location of permanent SAS datasets */
libname library 'mathlib'; /* SAS will search for permanently stored
                              formats ONLY in a place called "library." */

proc reg data=math.explore simple corr;
     model grade = gpa hscalc totscore;
     hschool: test gpa=hscalc=0;

/* Calculate predicted Y and the proportion of remaining variation
   explained with proc iml. Could use a calculator instead. */

proc iml;
     title3 'Predicted Y and explained remaining variation with proc iml';
     /* Predict Final Mark for a student with
           HSGPA = 80
           HSCALC = 75
           Diagnostic test score = 15 */
     Yhat = -73.12627 + 1.20851*80 + 0.34197*75 + 1.03078*15;
     print "gpa=80 hscalc=75 totscore=15";
     print "Yhat = " yhat;
     print " ";
     F1 = 4.08**2;          /* F = t-squared, using the t for totscore */
     a1 = 1*F1/(287+1*F1);  /* a = sF/(n-p+sF); n-p = 287 is error df */
     print "Controlling for gpa and hscalc, diagnostic test explains";
     print a1;
     print "... of the remaining variation in grade.";
     print "";
     F2 = 65.47;            /* F from the test labelled hschool, s = 2 */
     a2 = 2*F2/(287+2*F2);
     print "Controlling for diagnostic test, gpa and hscalc explain";
     print a2;
     print "... of the remaining variation in grade.";
     print "";

/* Many refinements of the regression analysis are possible. Here is one. */
/* First get rid of labels so correlation matrix will look nice */

data blank;
     set math.explore;
     label grade=' ' gpa=' ' hscalc=' ' precalc=' ' calc=' ';

proc reg data=blank simple corr;
     title3 'Calc and precalc separately';
     model grade = gpa hscalc precalc calc;
     hschool: test gpa=hscalc=0;
     dtest:   test precalc=calc=0;
     compare: test precalc=calc;

proc standard data=blank mean=0 std=1 out=withz;
     var gpa hscalc totscore; /* Standardize these vars. Note that precalc
                                 and calc stay in their original units. */

proc reg data=withz;
     title3 'Use standardized IVs';
     model grade = gpa hscalc precalc calc;
     gpacalc: test gpa=hscalc;

% cat mathreg1.lst
_______________________________________________________________________________
The SAS System 1
Illustrate multiple regression
22:27 Wednesday, September 29, 2004
The REG Procedure
Descriptive Statistics
                                     Uncorrected                Standard
Variable           Sum        Mean            SS    Variance   Deviation
Intercept    291.00000     1.00000     291.00000           0           0
gpa              23551    80.93299       1916508    35.91325     5.99277
hscalc           22840    78.48797       1829830   128.15417    11.32052
totscore    2561.00000     8.80069         26239    12.76014     3.57213
grade            17633    60.59450       1165607   334.97983    18.30245
Descriptive Statistics
Variable    Label
Intercept   Intercept
gpa         High School GPA
hscalc      HS Calculus
totscore    Total # right on diagnostic test
grade       Final mark (if any)
Correlation
Variable    Label                                    gpa    hscalc
gpa         High School GPA                       1.0000    0.6313
hscalc      HS Calculus                           0.6313    1.0000
totscore    Total # right on diagnostic test      0.3358    0.4350
grade       Final mark (if any)                   0.5968    0.5489
Correlation
Variable    Label                               totscore     grade
gpa         High School GPA                       0.3358    0.5968
hscalc      HS Calculus                           0.4350    0.5489
totscore    Total # right on diagnostic test      1.0000    0.4261
grade       Final mark (if any)                   0.4261    1.0000
_______________________________________________________________________________
The SAS System 2
Illustrate multiple regression
22:27 Wednesday, September 29, 2004
The REG Procedure
Model: MODEL1
Dependent Variable: grade Final mark (if any)
Analysis of Variance
                             Sum of         Mean
Source               DF     Squares       Square    F Value    Pr > F
Model                 3       42546        14182      74.55    <.0001
Error               287       54598    190.23637
Corrected Total     290       97144

Root MSE          13.79262    R-Square    0.4380
Dependent Mean    60.59450    Adj R-Sq    0.4321
Coeff Var         22.76216
Parameter Estimates
                                                   Parameter      Standard
Variable    Label                           DF      Estimate         Error
Intercept   Intercept                        1     -73.12627      11.17387
gpa         High School GPA                  1       1.20851       0.17495
hscalc      HS Calculus                      1       0.34197       0.09688
totscore    Total # right on diagnostic      1       1.03078       0.25278
            test
Parameter Estimates
Variable    Label                           DF    t Value   Pr > |t|
Intercept   Intercept                        1      -6.54     <.0001
gpa         High School GPA                  1       6.91     <.0001
hscalc      HS Calculus                      1       3.53     0.0005
totscore    Total # right on diagnostic      1       4.08     <.0001
            test
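
The fit statistics on this page follow from the sums of squares in the Analysis of Variance table. As a cross-check outside SAS, here is a minimal Python sketch; the function name anova_summary and its argument names are illustrative and do not come from the handout. Because the printed sums of squares are rounded, the recomputed values agree with the listing only to about the number of digits shown.

```python
import math

def anova_summary(ss_model, ss_error, df_model, df_error, mean_y):
    """Recompute the fit statistics printed under an ANOVA table."""
    ss_total = ss_model + ss_error      # corrected total sum of squares
    df_total = df_model + df_error      # n - 1
    mse = ss_error / df_error           # error mean square
    root_mse = math.sqrt(mse)
    return {
        "F": (ss_model / df_model) / mse,
        "root_mse": root_mse,
        "r_square": ss_model / ss_total,
        "adj_r_square": 1 - mse / (ss_total / df_total),
        "coeff_var": 100 * root_mse / mean_y,
    }

# Values copied from the listing for model grade = gpa hscalc totscore
fit = anova_summary(42546, 54598, 3, 287, 60.59450)
for name, value in fit.items():
    print(f"{name:>12s} = {value:.4f}")
```
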
_______________________________________________________________________________
The SAS System 3
Illustrate multiple regression
22:27 Wednesday, September 29, 2004
The REG Procedure
Model: MODEL1
Test hschool Results for Dependent Variable grade
                             Mean
Source           DF        Square    F Value    Pr > F
Numerator         2         12455      65.47    <.0001
Denominator     287     190.23637
_______________________________________________________________________________
The SAS System 4
Illustrate multiple regression
Predicted Y and explained remaining variation with proc iml
22:27 Wednesday, September 29, 2004
gpa=80 hscalc=75 totscore=15
YHAT
Yhat = 64.66398
Controlling for gpa and hscalc, diagnostic test explains
A1
0.0548217
... of the remaining variation in grade.
Controlling for diagnostic test, gpa and hscalc explain
A2
0.3132986
... of the remaining variation in grade.
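
The proc iml step is plain arithmetic, so it can be repeated anywhere. The sketch below redoes it in Python using the estimates, t and F values from the listing; the function names predict and remaining_variation_explained are illustrative. The second function implements the expression in the program, a = sF/(n-p+sF), where s is the numerator degrees of freedom of the test and n-p is the error degrees of freedom.

```python
def predict(intercept, slopes, x):
    """Predicted Y: intercept plus the sum of slope times x value."""
    return intercept + sum(b * xi for b, xi in zip(slopes, x))

def remaining_variation_explained(f_stat, s, df_error):
    """Proportion of remaining variation explained: sF / (df_error + sF)."""
    return s * f_stat / (df_error + s * f_stat)

# gpa=80, hscalc=75, totscore=15, with estimates from the first model
yhat = predict(-73.12627, [1.20851, 0.34197, 1.03078], [80, 75, 15])

# totscore controlling for gpa and hscalc: F = t squared, s = 1
a1 = remaining_variation_explained(4.08 ** 2, 1, 287)

# gpa and hscalc controlling for totscore: F from test hschool, s = 2
a2 = remaining_variation_explained(65.47, 2, 287)

print(f"Yhat = {yhat:.5f}")
print(f"a1   = {a1:.7f}")
print(f"a2   = {a2:.7f}")
```
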
_______________________________________________________________________________
The SAS System 5
Illustrate multiple regression
Calc and precalc separately
22:27 Wednesday, September 29, 2004
The REG Procedure
Descriptive Statistics
                                     Uncorrected                Standard
Variable           Sum        Mean            SS    Variance   Deviation
Intercept    291.00000     1.00000     291.00000           0           0
gpa              23551    80.93299       1916508    35.91325     5.99277
hscalc           22840    78.48797       1829830   128.15417    11.32052
precalc     1409.00000     4.84192    7603.00000     2.69217     1.64078
calc        1152.00000     3.95876    6406.00000     6.36381     2.52266
grade            17633    60.59450       1165607   334.97983    18.30245
Correlation
Variable         gpa    hscalc   precalc      calc     grade
gpa           1.0000    0.6313    0.2824    0.2919    0.5968
hscalc        0.6313    1.0000    0.3185    0.4088    0.5489
precalc       0.2824    0.3185    1.0000    0.4475    0.3668
calc          0.2919    0.4088    0.4475    1.0000    0.3648
grade         0.5968    0.5489    0.3668    0.3648    1.0000
_______________________________________________________________________________
The SAS System 6
Illustrate multiple regression
Calc and precalc separately
22:27 Wednesday, September 29, 2004
The REG Procedure
Model: MODEL1
Dependent Variable: grade
Analysis of Variance
                             Sum of         Mean
Source               DF     Squares       Square    F Value    Pr > F
Model                 4       42792        10698      56.29    <.0001
Error               286       54352    190.04077
Corrected Total     290       97144

Root MSE          13.78553    R-Square    0.4405
Dependent Mean    60.59450    Adj R-Sq    0.4327
Coeff Var         22.75046
Parameter Estimates
                     Parameter      Standard
Variable      DF      Estimate         Error    t Value   Pr > |t|
Intercept      1     -74.09971      11.20082      -6.62     <.0001
gpa            1       1.19490       0.17527       6.82     <.0001
hscalc         1       0.34928       0.09704       3.60     0.0004
precalc        1       1.60250       0.56228       2.85     0.0047
calc           1       0.71082       0.37797       1.88     0.0610
_______________________________________________________________________________
The SAS System 7
Illustrate multiple regression
Calc and precalc separately
22:27 Wednesday, September 29, 2004
The REG Procedure
Model: MODEL1
Test hschool Results for Dependent Variable grade
                             Mean
Source           DF        Square    F Value    Pr > F
Numerator         2         12416      65.34    <.0001
Denominator     286     190.04077
_______________________________________________________________________________
The SAS System 8
Illustrate multiple regression
Calc and precalc separately
22:27 Wednesday, September 29, 2004
The REG Procedure
Model: MODEL1
Test dtest Results for Dependent Variable grade
                             Mean
Source           DF        Square    F Value    Pr > F
Numerator         2    1704.71633       8.97    0.0002
Denominator     286     190.04077
_______________________________________________________________________________
The SAS System 9
Illustrate multiple regression
Calc and precalc separately
22:27 Wednesday, September 29, 2004
The REG Procedure
Model: MODEL1
Test compare Results for Dependent Variable grade
                             Mean
Source           DF        Square    F Value    Pr > F
Numerator         1     246.17805       1.30    0.2560
Denominator     286     190.04077
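
The test labelled compare can also be read as a full-versus-reduced model comparison. This rests on an assumption that the listing supports but does not state: totscore = precalc + calc (the column sums agree, 1409 + 1152 = 2561). If so, forcing the precalc and calc coefficients to be equal gives back the first model, and the numerator of the F statistic is the difference between the two Model sums of squares. The sketch below, with the illustrative function name extra_ss_f, repeats that arithmetic in Python. The printed sums of squares are rounded to whole numbers, so the result matches the listed F only approximately.

```python
def extra_ss_f(ss_model_full, ss_model_reduced, df_extra, mse_full):
    """F statistic from the extra sum of squares of a full vs. reduced model."""
    extra_ss = ss_model_full - ss_model_reduced
    return (extra_ss / df_extra) / mse_full

# Full model: grade = gpa hscalc precalc calc (Model SS 42792, MSE 190.04077)
# Reduced model: grade = gpa hscalc totscore  (Model SS 42546)
f_compare = extra_ss_f(42792, 42546, 1, 190.04077)
print(f"Extra SS = {42792 - 42546}, F = {f_compare:.4f}")
```
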
_______________________________________________________________________________
The SAS System 10
Illustrate multiple regression
Use standardized IVs
22:27 Wednesday, September 29, 2004
The REG Procedure
Model: MODEL1
Dependent Variable: grade
Analysis of Variance
                             Sum of         Mean
Source               DF     Squares       Square    F Value    Pr > F
Model                 4       42792        10698      56.29    <.0001
Error               286       54352    190.04077
Corrected Total     290       97144

Root MSE          13.78553    R-Square    0.4405
Dependent Mean    60.59450    Adj R-Sq    0.4327
Coeff Var         22.75046
Parameter Estimates
                     Parameter      Standard
Variable      DF      Estimate         Error    t Value   Pr > |t|
Intercept      1      46.97005       2.62614      17.89     <.0001
gpa            1       7.21537       1.05835       6.82     <.0001
hscalc         1       4.25842       1.18314       3.60     0.0004
precalc        1       1.60250       0.56228       2.85     0.0047
calc           1       0.71082       0.37797       1.88     0.0610
_______________________________________________________________________________
The SAS System 11
Illustrate multiple regression
Use standardized IVs
22:27 Wednesday, September 29, 2004
The REG Procedure
Model: MODEL1
Test gpacalc Results for Dependent Variable grade
                             Mean
Source           DF        Square    F Value    Pr > F
Numerator         1     419.85843       2.21    0.1383
Denominator     286     190.04077
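
Comparing the two sets of parameter estimates shows what standardizing does. The estimate and standard error for gpa and for hscalc are both multiplied by the same constant, so the t statistics and p-values do not change, while precalc and calc, which were not in the var statement, are untouched. The Python sketch below checks this from the printed numbers; the dictionary layout is only for illustration. The implied multiplier for gpa is about 6.04, slightly larger than the standard deviation of 5.99277 reported by proc reg. One possible explanation, offered here as an assumption rather than something the listing shows, is that proc standard uses every case in the data set while proc reg drops cases with a missing grade.

```python
# (estimate, standard error) copied from the two Parameter Estimates tables
raw = {"gpa": (1.19490, 0.17527), "hscalc": (0.34928, 0.09704)}
std = {"gpa": (7.21537, 1.05835), "hscalc": (4.25842, 1.18314)}

results = {}
for name in raw:
    b_raw, se_raw = raw[name]
    b_std, se_std = std[name]
    results[name] = {
        "t_raw": b_raw / se_raw,     # t statistic before standardizing
        "t_std": b_std / se_std,     # t statistic after standardizing
        "scale_b": b_std / b_raw,    # multiplier applied to the estimate
        "scale_se": se_std / se_raw, # multiplier applied to the std. error
    }
    r = results[name]
    print(f"{name}: t = {r['t_raw']:.2f} vs {r['t_std']:.2f}, "
          f"scale = {r['scale_b']:.4f} vs {r['scale_se']:.4f}")
```
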