GLM Examples

I’m sure there are nicer ways to set things up, but I’m using a default version, except I added scipen, to force the output not to use exponential notation, which I hate.

Example 1: binomial data (Exercise 2.1 in FELM)

Faraway asks first to fit a model with all interactions, so here it is:

data(esoph)
head(esoph)

##   agegp     alcgp    tobgp ncases ncontrols
## 1 25-34 0-39g/day 0-9g/day      0        40
## 2 25-34 0-39g/day    10-19      0        10
## 3 25-34 0-39g/day    20-29      0         6
## 4 25-34 0-39g/day      30+      0         5
## 5 25-34     40-79 0-9g/day      0        27
## 6 25-34     40-79    10-19      0         7

help(esoph)
xtabs(ncases ~ agegp+tobgp, data = esoph)

##        tobgp
## agegp   0-9g/day 10-19 20-29 30+
##   25-34        0     1     0   0
##   35-44        2     4     3   0
##   45-54       14    13     8  11
##   55-64       25    23    12  16
##   65-74       31    12    10   2
##   75+          6     5     0   2

There could be more elegant ways to tabulate the data, but this is at least feasible to look at. In the help file for xtabs this dataset is used as an example. And I learned about “flat tables”.

ftable(xtabs(cbind(ncases,ncontrols)~ . , data=esoph))

##                           ncases ncontrols
## agegp alcgp     tobgp                     
## 25-34 0-39g/day 0-9g/day       0        40
##                 10-19          0        10
##                 20-29          0         6
##                 30+            0         5
##       40-79     0-9g/day       0        27
##                 10-19          0         7
##                 20-29          0         4
##                 30+            0         7
##       80-119    0-9g/day       0         2
##                 10-19          0         1
##                 20-29          0         0
##                 30+            0         2
##       120+      0-9g/day       0         1
##                 10-19          1         1
##                 20-29          0         1
##                 30+            0         2
## 35-44 0-39g/day 0-9g/day       0        60
##                 10-19          1        14
##                 20-29          0         7
##                 30+            0         8
##       40-79     0-9g/day       0        35
##                 10-19          3        23
##                 20-29          1        14
##                 30+            0         8
##       80-119    0-9g/day       0        11
##                 10-19          0         6
##                 20-29          0         2
##                 30+            0         1
##       120+      0-9g/day       2         3
##                 10-19          0         3
##                 20-29          2         4
##                 30+            0         0
## 45-54 0-39g/day 0-9g/day       1        46
##                 10-19          0        18
##                 20-29          0        10
##                 30+            0         4
##       40-79     0-9g/day       6        38
##                 10-19          4        21
##                 20-29          5        15
##                 30+            5         7
##       80-119    0-9g/day       3        16
##                 10-19          6        14
##                 20-29          1         5
##                 30+            2         4
##       120+      0-9g/day       4         4
##                 10-19          3         4
##                 20-29          2         3
##                 30+            4         4
## 55-64 0-39g/day 0-9g/day       2        49
##                 10-19          3        22
##                 20-29          3        12
##                 30+            4         6
##       40-79     0-9g/day       9        40
##                 10-19          6        21
##                 20-29          4        17
##                 30+            3         6
##       80-119    0-9g/day       9        18
##                 10-19          8        15
##                 20-29          3         6
##                 30+            4         4
##       120+      0-9g/day       5        10
##                 10-19          6         7
##                 20-29          2         3
##                 30+            5         6
## 65-74 0-39g/day 0-9g/day       5        48
##                 10-19          4        14
##                 20-29          2         7
##                 30+            0         2
##       40-79     0-9g/day      17        34
##                 10-19          3        10
##                 20-29          5         9
##                 30+            0         0
##       80-119    0-9g/day       6        13
##                 10-19          4        12
##                 20-29          2         3
##                 30+            1         1
##       120+      0-9g/day       3         4
##                 10-19          1         2
##                 20-29          1         1
##                 30+            1         1
## 75+   0-39g/day 0-9g/day       1        18
##                 10-19          2         6
##                 20-29          0         0
##                 30+            1         3
##       40-79     0-9g/day       2         5
##                 10-19          1         3
##                 20-29          0         3
##                 30+            1         1
##       80-119    0-9g/day       1         1
##                 10-19          1         1
##                 20-29          0         0
##                 30+            0         0
##       120+      0-9g/day       2         2
##                 10-19          1         1
##                 20-29          0         0
##                 30+            0         0

ftable(xtabs(cbind(ncases,ncontrols) ~ agegp, data = esoph))

##        ncases ncontrols
## agegp                  
## 25-34       1       116
## 35-44       9       199
## 45-54      46       213
## 55-64      76       242
## 65-74      55       161
## 75+        13        44

ftable(xtabs(cbind(ncases,ncontrols) ~ agegp + tobgp, data = esoph))

##                 ncases ncontrols
## agegp tobgp                     
## 25-34 0-9g/day       0        70
##       10-19          1        19
##       20-29          0        11
##       30+            0        16
## 35-44 0-9g/day       2       109
##       10-19          4        46
##       20-29          3        27
##       30+            0        17
## 45-54 0-9g/day      14       104
##       10-19         13        57
##       20-29          8        33
##       30+           11        19
## 55-64 0-9g/day      25       117
##       10-19         23        65
##       20-29         12        38
##       30+           16        22
## 65-74 0-9g/day      31        99
##       10-19         12        38
##       20-29         10        20
##       30+            2         4
## 75+   0-9g/day       6        26
##       10-19          5        11
##       20-29          0         3
##       30+            2         4

And now I’ll try the full model as requeseted in (a):

farmod1 <- glm(cbind(ncases, ncontrols) ~ agegp * tobgp * alcgp, data = esoph, family = binomial)

summary(farmod1)

## 
## Call:
## glm(formula = cbind(ncases, ncontrols) ~ agegp * tobgp * alcgp, 
##     family = binomial, data = esoph)
## 
## Deviance Residuals: 
##  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [26]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [76]  0  0  0  0  0  0  0  0  0  0  0  0  0
## 
## Coefficients: (8 not defined because of singularities)
##                            Estimate  Std. Error z value Pr(>|z|)
## (Intercept)                 -16.472  448746.504       0        1
## agegp.L                     -15.569 1618575.986       0        1
## agegp.Q                     -40.252 1484227.866       0        1
## agegp.C                     -14.684 1008976.267       0        1
## agegp^4                     -12.303  547448.216       0        1
## agegp^5                       0.127  195587.489       0        1
## tobgp.L                     -19.546  994718.397       0        1
## tobgp.Q                     -13.090  508437.111       0        1
## tobgp.C                      -3.141  272907.496       0        1
## alcgp.L                     -23.834  910490.633       0        1
## alcgp.Q                     -13.667  537896.363       0        1
## alcgp.C                      -7.679  446814.589       0        1
## agegp.L:tobgp.L             -93.887 3715371.184       0        1
## agegp.Q:tobgp.L             -77.269 3347188.027       0        1
## agegp.C:tobgp.L             -33.534 2229132.399       0        1
## agegp^4:tobgp.L             -41.093 1340669.776       0        1
## agegp^5:tobgp.L               5.377  401101.346       0        1
## agegp.L:tobgp.Q             -58.813 2176812.828       0        1
## agegp.Q:tobgp.Q             -53.162 1889690.079       0        1
## agegp.C:tobgp.Q             -20.832 1093745.212       0        1
## agegp^4:tobgp.Q             -22.200  913350.652       0        1
## agegp^5:tobgp.Q              -1.313  162562.540       0        1
## agegp.L:tobgp.C             -25.432 1209684.626       0        1
## agegp.Q:tobgp.C             -14.233 1051280.542       0        1
## agegp.C:tobgp.C              -7.456  533120.604       0        1
## agegp^4:tobgp.C              -9.394  538600.635       0        1
## agegp^5:tobgp.C               2.459   40537.146       0        1
## agegp.L:alcgp.L            -139.374 3510599.248       0        1
## agegp.Q:alcgp.L            -121.145 3133477.378       0        1
## agegp.C:alcgp.L             -59.010 2001861.710       0        1
## agegp^4:alcgp.L             -56.014 1313419.645       0        1
## agegp^5:alcgp.L             -12.241  337434.120       0        1
## agegp.L:alcgp.Q             -73.145 2344368.743       0        1
## agegp.Q:alcgp.Q             -52.208 2045544.146       0        1
## agegp.C:alcgp.Q             -26.446 1141314.446       0        1
## agegp^4:alcgp.Q             -37.422  969777.570       0        1
## agegp^5:alcgp.Q               9.039  149111.766       0        1
## agegp.L:alcgp.C             -52.932 1799046.068       0        1
## agegp.Q:alcgp.C             -43.309 1599967.759       0        1
## agegp.C:alcgp.C             -15.277  924318.738       0        1
## agegp^4:alcgp.C             -23.491  669602.012       0        1
## agegp^5:alcgp.C              -1.180  109910.517       0        1
## tobgp.L:alcgp.L             -55.524 2111355.720       0        1
## tobgp.Q:alcgp.L             -10.529 1190467.469       0        1
## tobgp.C:alcgp.L              13.875  451398.285       0        1
## tobgp.L:alcgp.Q             -41.199 1270833.457       0        1
## tobgp.Q:alcgp.Q             -31.930  777774.355       0        1
## tobgp.C:alcgp.Q             -17.179  399500.665       0        1
## tobgp.L:alcgp.C             -22.339  995093.905       0        1
## tobgp.Q:alcgp.C             -12.308  415519.343       0        1
## tobgp.C:alcgp.C               0.424   26466.259       0        1
## agegp.L:tobgp.L:alcgp.L    -284.038 8518464.949       0        1
## agegp.Q:tobgp.L:alcgp.L    -255.110 7548330.307       0        1
## agegp.C:tobgp.L:alcgp.L    -119.080 4618168.407       0        1
## agegp^4:tobgp.L:alcgp.L    -135.525 3302207.688       0        1
## agegp^5:tobgp.L:alcgp.L     -11.605  821017.435       0        1
## agegp.L:tobgp.Q:alcgp.L    -103.297 5251283.868       0        1
## agegp.Q:tobgp.Q:alcgp.L     -86.885 4573489.954       0        1
## agegp.C:tobgp.Q:alcgp.L     -13.858 2546319.838       0        1
## agegp^4:tobgp.Q:alcgp.L     -83.605 2211972.272       0        1
## agegp^5:tobgp.Q:alcgp.L      19.694  451884.902       0        1
## agegp.L:tobgp.C:alcgp.L      28.810 2178231.735       0        1
## agegp.Q:tobgp.C:alcgp.L      37.313 1871275.690       0        1
## agegp.C:tobgp.C:alcgp.L      18.220  904268.476       0        1
## agegp^4:tobgp.C:alcgp.L      11.687  994993.359       0        1
## agegp^5:tobgp.C:alcgp.L          NA          NA      NA       NA
## agegp.L:tobgp.L:alcgp.Q    -193.300 5799765.283       0        1
## agegp.Q:tobgp.L:alcgp.Q    -161.854 5002186.199       0        1
## agegp.C:tobgp.L:alcgp.Q     -77.169 2660725.551       0        1
## agegp^4:tobgp.L:alcgp.Q     -88.452 2509144.512       0        1
## agegp^5:tobgp.L:alcgp.Q       7.564  263437.728       0        1
## agegp.L:tobgp.Q:alcgp.Q    -163.091 3876596.317       0        1
## agegp.Q:tobgp.Q:alcgp.Q    -144.727 3275262.932       0        1
## agegp.C:tobgp.Q:alcgp.Q     -53.939 1605508.369       0        1
## agegp^4:tobgp.Q:alcgp.Q     -74.727 1773263.500       0        1
## agegp^5:tobgp.Q:alcgp.Q          NA          NA      NA       NA
## agegp.L:tobgp.C:alcgp.Q     -86.204 1832255.441       0        1
## agegp.Q:tobgp.C:alcgp.Q     -64.657 1511051.950       0        1
## agegp.C:tobgp.C:alcgp.Q     -39.782  891255.821       0        1
## agegp^4:tobgp.C:alcgp.Q     -33.337  803627.917       0        1
## agegp^5:tobgp.C:alcgp.Q          NA          NA      NA       NA
## agegp.L:tobgp.L:alcgp.C    -108.648 4072433.764       0        1
## agegp.Q:tobgp.L:alcgp.C     -93.993 3572958.455       0        1
## agegp.C:tobgp.L:alcgp.C     -44.130 2055374.262       0        1
## agegp^4:tobgp.L:alcgp.C     -44.701 1524151.088       0        1
## agegp^5:tobgp.L:alcgp.C      -3.678  228037.948       0        1
## agegp.L:tobgp.Q:alcgp.C     -44.180 1984146.123       0        1
## agegp.Q:tobgp.Q:alcgp.C     -42.372 1645734.736       0        1
## agegp.C:tobgp.Q:alcgp.C     -22.024  802652.993       0        1
## agegp^4:tobgp.Q:alcgp.C     -19.893  817172.909       0        1
## agegp^5:tobgp.Q:alcgp.C          NA          NA      NA       NA
## agegp.L:tobgp.C:alcgp.C      -2.805  221432.610       0        1
## agegp.Q:tobgp.C:alcgp.C          NA          NA      NA       NA
## agegp.C:tobgp.C:alcgp.C          NA          NA      NA       NA
## agegp^4:tobgp.C:alcgp.C          NA          NA      NA       NA
## agegp^5:tobgp.C:alcgp.C          NA          NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 227.24055853245858  on 87  degrees of freedom
## Residual deviance:   0.00000000030116  on  0  degrees of freedom
## AIC: 323.5
## 
## Number of Fisher Scoring iterations: 25

So we probably didn’t really want this. We’ve fit a parameter to every observation, not helpful. What is with all the .L, .C etc?

is.ordered(esoph$agegp)

## [1] TRUE

Ordered factors use polynomial contrasts to fit the various levels. This is a kind of compromise between treating the values as continuous and treating them as completely unordered. In the help file for this dataset you’ll see models with covariates unclass(tobgp) and unclass(alcgp).

Now Faraway suggests using backward interaction to simplify the model “as far as is reasonable”.

farmod2 <- step(farmod1, direction = "backward")

## Start:  AIC=323.5
## cbind(ncases, ncontrols) ~ agegp * tobgp * alcgp

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##                     Df Deviance AIC
## - agegp:tobgp:alcgp 37     16.1 266
## <none>                      0.0 323

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## 
## Step:  AIC=265.6
## cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp + agegp:tobgp + 
##     agegp:alcgp + tobgp:alcgp

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

##               Df Deviance AIC
## - agegp:tobgp 15     27.1 247
## - agegp:alcgp 15     34.4 254
## - tobgp:alcgp  9     23.8 255
## <none>               16.1 266
## 
## Step:  AIC=246.6
## cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp + agegp:alcgp + 
##     tobgp:alcgp
## 
##               Df Deviance AIC
## - tobgp:alcgp  9     33.8 235
## - agegp:alcgp 15     47.5 237
## <none>               27.1 247
## 
## Step:  AIC=235.3
## cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp + agegp:alcgp
## 
##               Df Deviance AIC
## - agegp:alcgp 15     54.0 226
## <none>               33.8 235
## - tobgp        3     44.2 240
## 
## Step:  AIC=225.4
## cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp
## 
##         Df Deviance AIC
## <none>         54.0 226
## - tobgp  3     64.6 230
## - alcgp  3    120.0 286
## - agegp  5    131.5 293

summary(farmod2)

## 
## Call:
## glm(formula = cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp, 
##     family = binomial, data = esoph)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.689  -0.562  -0.217   0.231   2.064  
## 
## Coefficients:
##             Estimate Std. Error z value          Pr(>|z|)    
## (Intercept)  -1.7800     0.1980   -8.99           < 2e-16 ***
## agegp.L       3.0053     0.6522    4.61 0.000004058789115 ***
## agegp.Q      -1.3379     0.5911   -2.26            0.0236 *  
## agegp.C       0.1531     0.4485    0.34            0.7329    
## agegp^4       0.0641     0.3088    0.21            0.8356    
## agegp^5      -0.1936     0.1954   -0.99            0.3216    
## tobgp.L       0.5945     0.1942    3.06            0.0022 ** 
## tobgp.Q       0.0654     0.1881    0.35            0.7282    
## tobgp.C       0.1568     0.1866    0.84            0.4007    
## alcgp.L       1.4919     0.1994    7.48 0.000000000000072 ***
## alcgp.Q      -0.2266     0.1795   -1.26            0.2068    
## alcgp.C       0.2546     0.1591    1.60            0.1094    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 227.241  on 87  degrees of freedom
## Residual deviance:  53.973  on 76  degrees of freedom
## AIC: 225.5
## 
## Number of Fisher Scoring iterations: 6

This eliminated all the interactions, but still used orthogonal polynomials for the variables. And up to 5th degree for age! Doesn’t qutie make sense. We could either treat the variables as continuous, or as unordered factors.

farmod3 <- glm(cbind(ncases, ncontrols) ~ unclass(agegp) + unclass(alcgp) + unclass(tobgp), family = binomial, data = esoph)
summary(farmod3)

## 
## Call:
## glm(formula = cbind(ncases, ncontrols) ~ unclass(agegp) + unclass(alcgp) + 
##     unclass(tobgp), family = binomial, data = esoph)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.959  -0.856  -0.436   0.307   1.935  
## 
## Coefficients:
##                Estimate Std. Error z value         Pr(>|z|)    
## (Intercept)     -5.5959     0.4154  -13.47          < 2e-16 ***
## unclass(agegp)   0.5287     0.0719    7.36 0.00000000000019 ***
## unclass(alcgp)   0.6938     0.0834    8.32          < 2e-16 ***
## unclass(tobgp)   0.2745     0.0807    3.40          0.00068 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 227.241  on 87  degrees of freedom
## Residual deviance:  73.959  on 84  degrees of freedom
## AIC: 229.4
## 
## Number of Fisher Scoring iterations: 4

This seems a bit too simple, treating each covariate as continuous and just fitting a linear model. Myabe we could try linear in age, but with tobacco and alcohol as factors. Below are several versions, they can be compared using anova, or their residual deviances can be compared.

esoph2 <- mutate(esoph,age = factor(agegp,ordered=FALSE), tob = factor(tobgp, ordered = F), alc = factor(alcgp, ordered = F) )
# new dataset has unordered factors 
farmod4 <- glm(cbind(ncases, ncontrols) ~ unclass(agegp) + tob*alc, data = esoph2, family = binomial)
farmod5 <- glm(cbind(ncases, ncontrols) ~ unclass(agegp) + tob + alc, data = esoph2, family = binomial)
farmod6 <- glm(cbind(ncases, ncontrols) ~ agegp + tob + alc, data = esoph2, family = binomial)

Example 2 Poisson

This example in Chapter 3 of FELM is used to introduce Poisson regression. Faraway first considers ordinary linear regression, and notes that the residuals don’t seem to have constant variance. Back in the day we’d take logs or square root of the response and then fit linear regression again, but a more modern approach is to fit a Poisson glm.

The residual deviance in a Poisson model is a test of fit of the model, as it is with a Binomial. The second fit below allows an inflation factor on the variance, while still using the simplicity of the Poisson for fitting.

data(gala)
head(gala)

##              Species Endemics  Area Elevation Nearest Scruz Adjacent
## Baltra            58       23 25.09       346     0.6   0.6     1.84
## Bartolome         31       21  1.24       109     0.6  26.3   572.33
## Caldwell           3        3  0.21       114     2.8  58.7     0.78
## Champion          25        9  0.10        46     1.9  47.4     0.18
## Coamano            2        1  0.05        77     1.9   1.9   903.82
## Daphne.Major      18       11  0.34       119     8.0   8.0     1.84

help(gala)

gal.glm <- glm(Species ~ . - Endemics,
             family = poisson,
             data = gala)

summary(gal.glm)

## 
## Call:
## glm(formula = Species ~ . - Endemics, family = poisson, data = gala)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -8.275  -4.497  -0.944   1.917  10.185  
## 
## Coefficients:
##               Estimate Std. Error z value  Pr(>|z|)    
## (Intercept)  3.1548079  0.0517495   60.96   < 2e-16 ***
## Area        -0.0005799  0.0000263  -22.07   < 2e-16 ***
## Elevation    0.0035406  0.0000874   40.51   < 2e-16 ***
## Nearest      0.0088256  0.0018213    4.85 0.0000013 ***
## Scruz       -0.0057094  0.0006256   -9.13   < 2e-16 ***
## Adjacent    -0.0006630  0.0000293  -22.61   < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 3510.73  on 29  degrees of freedom
## Residual deviance:  716.85  on 24  degrees of freedom
## AIC: 889.7
## 
## Number of Fisher Scoring iterations: 5

gal.quasi <- glm(Species ~ . - Endemics,
                 family = quasipoisson,
                 data = gala)
summary(gal.quasi)

## 
## Call:
## glm(formula = Species ~ . - Endemics, family = quasipoisson, 
##     data = gala)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -8.275  -4.497  -0.944   1.917  10.185  
## 
## Coefficients:
##              Estimate Std. Error t value     Pr(>|t|)    
## (Intercept)  3.154808   0.291590   10.82 0.0000000001 ***
## Area        -0.000580   0.000148   -3.92      0.00065 ***
## Elevation    0.003541   0.000493    7.19 0.0000001983 ***
## Nearest      0.008826   0.010262    0.86      0.39829    
## Scruz       -0.005709   0.003525   -1.62      0.11838    
## Adjacent    -0.000663   0.000165   -4.01      0.00051 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 31.75)
## 
##     Null deviance: 3510.73  on 29  degrees of freedom
## Residual deviance:  716.85  on 24  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

plot(gal.quasi$linear.predictors, gal.quasi$residuals)

The canonical link for the Poisson is the log link. Identity links are sometimes used as well; see the cloth data analysis in SM Example 10.28.

data(cloth)
head(cloth)

##      x  y
## 1 1.22  1
## 2 1.70  4
## 3 2.71  5
## 4 3.71 14
## 5 3.72  7
## 6 3.75  9

with(cloth, plot(x,y)) # Figure 10.11

clothmod1 <- glm(y ~ x - 1, family = poisson(link = identity), data = cloth)
summary(clothmod1)

## 
## Call:
## glm(formula = y ~ x - 1, family = poisson(link = identity), data = cloth)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.817  -1.127  -0.237   0.627   3.438  
## 
## Coefficients:
##   Estimate Std. Error z value Pr(>|z|)    
## x   1.5102     0.0896    16.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance:    Inf  on 32  degrees of freedom
## Residual deviance: 64.537  on 31  degrees of freedom
## AIC: 189.8
## 
## Number of Fisher Scoring iterations: 3

clothmod2 <- glm(y ~ x - 1, family = quasipoisson(link=identity), data = cloth )
summary(clothmod2)

## 
## Call:
## glm(formula = y ~ x - 1, family = quasipoisson(link = identity), 
##     data = cloth)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.817  -1.127  -0.237   0.627   3.438  
## 
## Coefficients:
##   Estimate Std. Error t value        Pr(>|t|)    
## x    1.510      0.133    11.4 0.0000000000014 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 2.194)
## 
##     Null deviance:    Inf  on 32  degrees of freedom
## Residual deviance: 64.537  on 31  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 3

plot(clothmod2)

In SM (p. 511) the suggestion is made that a route to an overdispersed Poisson would be to model \(Y\) as Poisson with mean \(\mu\epsilon\), where \(\epsilon\) is a latent random variable with expected value 1. If a gamma distribution with shape \(\nu\) and mean 1 is postulated, then the marginal distribution of \(Y\) is negative binomial. The negative binomial model can be fit by maximum likelihood using MASS::glm.nb; see FELM 3.3.

Example 3 Gamma

I will use the chimp data set analysed in SM, Example 10.16.

data(chimps)
head(chimps)

##   chimp word   y
## 1     1    1 178
## 2     1    2  60
## 3     1    3 177
## 4     1    4  36
## 5     1    5 225
## 6     1    6 345

The response is time to learn each of 10 words, in minutes, and there are four chimpanzees. See p.485 of SM where the data are set out in a randomized block format. Davison’s first analysis is a simple lm(y ~ chimp + word, data = chimps), and then he applies Tukey’s 1 df test for non-additivity (see Example 8.24) to confirm that the additive model is not a good fit.

model1 <- glm(y ~ chimp + word, family=Gamma(link = "log"), data = chimps)
options(digits=6)
anova(model1)

## Analysis of Deviance Table
## 
## Model: Gamma, link: log
## 
## Response: y
## 
## Terms added sequentially (first to last)
## 
## 
##       Df Deviance Resid. Df Resid. Dev
## NULL                     39      60.38
## chimp  3     6.95        36      53.43
## word   9    38.46        27      14.97

This reproduces the first half of Table 10.6. The other side is the same thing with the factors introduced in the opposite order.

model2 <- glm(y ~ word + chimp, family = Gamma(link = "log"), data = chimps)
summary(model2)

## 
## Call:
## glm(formula = y ~ word + chimp, family = Gamma(link = "log"), 
##     data = chimps)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6766  -0.4510  -0.0488   0.3074   1.2847  
## 
## Coefficients:
##             Estimate Std. Error t value          Pr(>|t|)    
## (Intercept)    5.458      0.375   14.54 0.000000000000027 ***
## word2         -1.771      0.466   -3.80           0.00074 ***
## word3         -0.365      0.466   -0.78           0.44007    
## word4         -1.821      0.466   -3.91           0.00056 ***
## word5         -1.102      0.466   -2.37           0.02541 *  
## word6          0.279      0.466    0.60           0.55461    
## word7         -1.725      0.466   -3.70           0.00096 ***
## word8         -2.555      0.466   -5.49 0.000008278115332 ***
## word9          0.811      0.466    1.74           0.09301 .  
## word10        -0.514      0.466   -1.10           0.27972    
## chimp2        -0.973      0.295   -3.31           0.00269 ** 
## chimp3        -0.808      0.295   -2.74           0.01068 *  
## chimp4        -0.113      0.295   -0.38           0.70479    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.433666)
## 
##     Null deviance: 60.378  on 39  degrees of freedom
## Residual deviance: 14.972  on 27  degrees of freedom
## AIC: 418.4
## 
## Number of Fisher Scoring iterations: 12

anova(model2)

## Analysis of Deviance Table
## 
## Model: Gamma, link: log
## 
## Response: y
## 
## Terms added sequentially (first to last)
## 
## 
##       Df Deviance Resid. Df Resid. Dev
## NULL                     39      60.38
## word   9    39.19        30      21.19
## chimp  3     6.22        27      14.97

It makes a bit of difference, but not substantial. To test for a chimp effect we need an estimate of the dispersion parameter, which is obtained from summary as 0.434 (Davison reports .432), needs checking. Perhaps summary used deviance instead of Pearson \(X^2\)?)

The chimp test statistic is \((6.22/3)/0.434 = 4.77\) which gives a \(p\)-value of \(0.008\) when referred to an \(F_{3, 27}\) distribution. The word test statistic is \((39.19/9)/0.434\) about \(10^{-6}\).

test1 <- (6.22/3)/0.434
pf(test1, 3, 27, lower.tail = F)

## [1] 0.00849023

test2 <- (38.46/9)/0.434
pf(test2, 9, 27, lower.tail = F)

## [1] 0.00000166957

Davison suggests two other approaches to this data: an ordinary normal theory model fit to \(\log(y)\), and a gamma model with the canonical link \(1/\mu\). I’ll let you explore those on your own.

GLM Examples

Nov 5

04/11/2020

Example 1: binomial data (Exercise 2.1 in FELM)

Example 2 Poisson

Example 3 Gamma