Logistic regression: polynomial fits

Figure 10.12 in SM plots the data, here’s a try at reproducing the right figure, along with linear, quadratic and cubic regression fits:

toxo.glm1 <- glm(cbind(r,m-r) ~ rain, family = binomial, data = toxo)
toxo.glm2 <- glm(cbind(r,m-r) ~ rain + I(rain^2), family = binomial, data = toxo)
toxo.glm3 <- glm(cbind(r,m-r) ~ rain + I(rain^2) + I(rain^3), family = binomial, data = toxo)
with(toxo, plot(rain, r/m, pch=16, xlab = "Rainfall (mm)", ylab = "proportion positive"))
with(toxo, lines(rain,toxo.glm1$fitted.values))
with(toxo, lines(rain,toxo.glm2$fitted.values, lty=2, col="red"))
with(toxo, lines(rain,toxo.glm3$fitted.values, lty=3, col="blue"))

The linear and quadratic fits are nearly identical:

summary(toxo.glm3)
## 
## Call:
## glm(formula = cbind(r, m - r) ~ rain + I(rain^2) + I(rain^3), 
##     family = binomial, data = toxo)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.762  -1.217  -0.508   0.354   2.620  
## 
## Coefficients:
##                    Estimate      Std. Error z value Pr(>|z|)    
## (Intercept) -290.1742537061   87.2177635645   -3.33  0.00088 ***
## rain           0.4499754025    0.1346997655    3.34  0.00084 ***
## I(rain^2)     -0.0002311282    0.0000690301   -3.35  0.00081 ***
## I(rain^3)      0.0000000393    0.0000000117    3.35  0.00081 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 74.212  on 33  degrees of freedom
## Residual deviance: 62.635  on 30  degrees of freedom
## AIC: 161.3
## 
## Number of Fisher Scoring iterations: 3

These coefficients are hard to read – one of Cox & Donelly’s rules of thumb is that estimates should be on a comfortable scale.

succ <- toxo$r
failure <- toxo$m - toxo$r
rainfall <- toxo$rain/100
summary(glm(cbind(succ,failure) ~ rainfall + I(rainfall^2) + I(rainfall^3), family = binomial))
## 
## Call:
## glm(formula = cbind(succ, failure) ~ rainfall + I(rainfall^2) + 
##     I(rainfall^3), family = binomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.762  -1.217  -0.508   0.354   2.620  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -290.1743    87.2178   -3.33  0.00088 ***
## rainfall        44.9975    13.4700    3.34  0.00084 ***
## I(rainfall^2)   -2.3113     0.6903   -3.35  0.00081 ***
## I(rainfall^3)    0.0393     0.0117    3.35  0.00081 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 74.212  on 33  degrees of freedom
## Residual deviance: 62.635  on 30  degrees of freedom
## AIC: 161.3
## 
## Number of Fisher Scoring iterations: 3

Very strange that all the \(z\)-values are the same! I think this might be because the correlation between rainfall and rainfall\(^2\) and rainfall\(^3\) is really high:

cor(rainfall,rainfall^2); cor(rainfall,rainfall^3)
## [1] 0.9987
## [1] 0.9951

This often happens with observational data. The function poly will construct orthogonal polynomials so the estimates become uncorrelated. You can often do almost as well by centering the \(x\) variable first.

toxo$rainc <- toxo$rain/100 - mean(toxo$rain/100)
cor(toxo$rainc,toxo$rainc^2); cor(toxo$rainc,toxo$rainc^3)
## [1] 0.4186
## [1] 0.872
toxo.glm4 <- glm(cbind(r,m-r) ~ rainc + I(rainc^2) + I(rainc^3), family = quasibinomial, data = toxo)

toxo.glm5 <- glm(cbind(r,m-r) ~ poly(rain/100, 3), family = quasibinomial, data = toxo)
summary(toxo.glm4)
## 
## Call:
## glm(formula = cbind(r, m - r) ~ rainc + I(rainc^2) + I(rainc^3), 
##     family = quasibinomial, data = toxo)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.762  -1.217  -0.508   0.354   2.620  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.0994     0.1420    0.70    0.489  
## rainc        -0.2552     0.1230   -2.08    0.047 *
## I(rainc^2)   -0.0606     0.0413   -1.47    0.152  
## I(rainc^3)    0.0393     0.0163    2.41    0.023 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 1.94)
## 
##     Null deviance: 74.212  on 33  degrees of freedom
## Residual deviance: 62.635  on 30  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 3
summary(toxo.glm5)
## 
## Call:
## glm(formula = cbind(r, m - r) ~ poly(rain/100, 3), family = quasibinomial, 
##     data = toxo)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.762  -1.217  -0.508   0.354   2.620  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)          0.0243     0.1072    0.23    0.822  
## poly(rain/100, 3)1  -0.0861     0.6390   -0.13    0.894  
## poly(rain/100, 3)2  -0.1927     0.6511   -0.30    0.769  
## poly(rain/100, 3)3   1.3787     0.5732    2.41    0.023 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 1.94)
## 
##     Null deviance: 74.212  on 33  degrees of freedom
## Residual deviance: 62.635  on 30  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 3
plot(toxo$rain, toxo.glm5$fitted.values, pch="*", col="blue")
points(toxo$rain, toxo$r/toxo$m, col="grey")

There is some evidence of over-dispersion, let’s try:

toxo.quasi <- glm(cbind(succ,failure) ~ rainfall + I(rainfall^2) + I(rainfall^3), family = quasibinomial)
summary(toxo.quasi)
## 
## Call:
## glm(formula = cbind(succ, failure) ~ rainfall + I(rainfall^2) + 
##     I(rainfall^3), family = quasibinomial)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.762  -1.217  -0.508   0.354   2.620  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   -290.1743   121.4942   -2.39    0.023 *
## rainfall        44.9975    18.7637    2.40    0.023 *
## I(rainfall^2)   -2.3113     0.9616   -2.40    0.023 *
## I(rainfall^3)    0.0393     0.0163    2.41    0.023 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 1.94)
## 
##     Null deviance: 74.212  on 33  degrees of freedom
## Residual deviance: 62.635  on 30  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 3

The point estimates are the same, but the significance has reduced.

Logistic regression: smooth fits

Davison goes on to fit local linear and local quadratic regression, using a version of locpoly for generalized linear models. It is explained on in Example 10.32. I used the gam package in mgcv. Using this is an art in itself, and the documentation isn’t terribly helpful. Possibly better at this point to turn to An Introduction to Statistical Learning and follow their examples. The library gam also has a gam function, and it will do local polynomial fits using gam.lo..

toxo.gam <- gam(cbind(r, m-r) ~ s(rain), family = binomial, data = toxo)
summary(toxo.gam)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## cbind(r, m - r) ~ s(rain)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.0901     0.0857   -1.05     0.29
## 
## Approximate significance of smooth terms:
##          edf Ref.df Chi.sq p-value   
## s(rain) 6.51   7.57   22.9  0.0027 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.221   Deviance explained = 36.7%
## UBRE = 0.82365  Scale est. = 1         n = 34
plot(toxo.gam)
points(toxo$rain, toxo$r/toxo$m, col="gray")

This looks wrong, and it is; the reason is in the fine print in the help file. The smooth function s(rain,6.51) is plotted on the scale of the linear predictor, i.e. on the logit scale, but I plotted the data on the probability scale. Here’s a better plot:

plot(toxo.gam)
with(toxo, points(rain, logit(r/m), col="gray"))

Alternatively, you can plot on the probability scale using the inverse transform:

plot(toxo.gam, seWithMean=TRUE, trans= ilogit, main = "GAM fit, automatic smoothing")
with(toxo, points(rain, r/m, col="gray"))

Here’s a try of a smoother fit, with 3 degrees of freedom

toxo.gam2 <- gam(cbind(r, m-r) ~ s(rain,k=4), family = binomial, data = toxo) # k-1 is the approximate degrees of freedom used
summary(toxo.gam2)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## cbind(r, m - r) ~ s(rain, k = 4)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.0254     0.0767    0.33     0.74
## 
## Approximate significance of smooth terms:
##          edf Ref.df Chi.sq p-value  
## s(rain) 2.86   2.99   9.21   0.027 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0717   Deviance explained = 14.1%
## UBRE = 1.1018  Scale est. = 1         n = 34
plot(toxo.gam2, seWithMean=TRUE, trans=ilogit, main="GAM fit, 3 df")
with(toxo, points(rain, r/m, col="gray"))
points(toxo$rain, toxo.glm5$fitted.values, pch="*", col="blue" )

The stars are the fitted values from a 3rd degree polynomial, not very different from this cubic spline fit, just a bit more dramatic at the endpoints.