library(faraway)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
This was a designed experiment: 125 fruitflies were divided into 5 groups of 25 each, and each group was assigned a different treatment. (It is conventional to refer to ``treatments’’ in DoE, even if it’s not a drug.) In this case the treatments were expsoure to female fruitflies. One group was solitary, another with a single virgin female, another with eight virgin females, another with 1 pregnant female and another wtih 8 pregnant females. Pregnant females will not mate.
The response variable was longevity
, measured in days. A continuous covariate thorax
was also recorded: it is the length of the thorax in millimeters, and known to be associated with longevity. The goal of the study was to see if sexual activity has a cost to male fruitflies, in terms of longevity.1
First, a look at the data:
head(fruitfly);dim(fruitfly)
## thorax longevity activity
## 1 0.68 37 many
## 2 0.68 49 many
## 3 0.72 46 many
## 4 0.72 63 many
## 5 0.76 39 many
## 6 0.76 46 many
## [1] 124 3
is.factor(fruitfly$activity)
## [1] TRUE
ggplot(aes(x=thorax, y=longevity), data = fruitfly) + geom_point() + facet_wrap(~ activity)
It is clear that there is a linear relationship between thorax length and longevity, as expected.
Two questions of interest are whether there is a difference in longevity between activity groups, and whether the linear relationship between thorax and longevity is the same in each group. The latter is perhaps not that important in this example, but if it is the same, the resulting model will be simpler.
lmod <- lm(longevity ~ thorax * activity, data = fruitfly)
summary(lmod)
##
## Call:
## lm(formula = longevity ~ thorax * activity, data = fruitfly)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.9509 -6.7296 -0.9103 6.1854 30.3071
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -50.2420 21.8012 -2.305 0.023 *
## thorax 136.1268 25.9517 5.245 7.27e-07 ***
## activityone 6.5172 33.8708 0.192 0.848
## activitylow -7.7501 33.9690 -0.228 0.820
## activitymany -1.1394 32.5298 -0.035 0.972
## activityhigh -11.0380 31.2866 -0.353 0.725
## thorax:activityone -4.6771 40.6518 -0.115 0.909
## thorax:activitylow 0.8743 40.4253 0.022 0.983
## thorax:activitymany 6.5478 39.3600 0.166 0.868
## thorax:activityhigh -11.1268 38.1200 -0.292 0.771
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.71 on 114 degrees of freedom
## Multiple R-squared: 0.6534, Adjusted R-squared: 0.626
## F-statistic: 23.88 on 9 and 114 DF, p-value: < 2.2e-16
(I’ve just done what you’re not supposed to do, shown all the output in its ugliness.) The notation a*b
on the right side is shorthand for a + b + a:b
, where a:b
is the interaction term.
How does lm
get the output? Try below, and then look at model.matrix(lmod2)
or model.matrix(lmod)
. (They are long, you might want just the first 10 rows or so.)
With factor variables, an analysis of variance table is helpful, because you can see the amount of variation explained by the factor by an \(F\)-test.
anova(lmod)
## Analysis of Variance Table
##
## Response: longevity
## Df Sum Sq Mean Sq F value Pr(>F)
## thorax 1 15003.3 15003.3 130.733 < 2.2e-16 ***
## activity 4 9634.6 2408.6 20.988 5.503e-13 ***
## thorax:activity 4 24.3 6.1 0.053 0.9947
## Residuals 114 13083.0 114.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lmod2 <- lm(longevity ~ thorax + activity, data = fruitfly)
anova(lmod2)
## Analysis of Variance Table
##
## Response: longevity
## Df Sum Sq Mean Sq F value Pr(>F)
## thorax 1 15003.3 15003.3 135.069 < 2.2e-16 ***
## activity 4 9634.6 2408.6 21.684 1.974e-13 ***
## Residuals 118 13107.3 111.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(longevity ~ activity + thorax, data = fruitfly))
## Analysis of Variance Table
##
## Response: longevity
## Df Sum Sq Mean Sq F value Pr(>F)
## activity 4 12270 3067.4 27.614 3.481e-16 ***
## thorax 1 12368 12368.4 111.348 < 2.2e-16 ***
## Residuals 118 13107 111.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lmod2)
##
## Call:
## lm(formula = longevity ~ thorax + activity, data = fruitfly)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.108 -7.014 -1.101 6.234 30.265
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -48.749 10.850 -4.493 1.65e-05 ***
## thorax 134.341 12.731 10.552 < 2e-16 ***
## activityone 2.637 2.984 0.884 0.3786
## activitylow -7.015 2.981 -2.353 0.0203 *
## activitymany 4.139 3.027 1.367 0.1741
## activityhigh -20.004 3.016 -6.632 1.05e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.54 on 118 degrees of freedom
## Multiple R-squared: 0.6527, Adjusted R-squared: 0.638
## F-statistic: 44.36 on 5 and 118 DF, p-value: < 2.2e-16
What is the difference between the two anova tables? Note that when we omit the interaction term, activity
shows a large difference between levels; in particular those in the high
category have a much reduced lifetime, relative to the baseline category. (What is the baseline category?)
The \(p\)-value for this in the table is \(10^{-9}\) which shouldn’t be taken very seriously (why not?). A table of sample means in the five groups is more informative (I found this code by googling):
group_by(fruitfly, activity) %>% summarize(mean = mean(longevity))
## # A tibble: 5 × 2
## activity mean
## <fct> <dbl>
## 1 isolated 63.6
## 2 one 64.8
## 3 low 56.8
## 4 many 64.5
## 5 high 38.7
To test differences between any two group means we will need to estimate the standard error of the difference; this is not difficult, but a bit tedious. The original paper didn’t do this, just reported a “\(p < 0.001\)” for the high activity group, and “\(p < 0.05\)” for the low activity group.
Often with several groups the textbook recommendation is to do all pairwise comparisons of the means, and I’m sure that there is R
code out there somewhere that will quickly do that for you, and offer a choice of adjustments for multiple testing.
For this particular experiment, though, the groups “one” and “many” were described as controls, as was the group “isolated”, so the comparisons of interest are fairly limited. The comparisons of most interest are probably “isolated” against “one” and “many”, as these are slightly different types of controls, and then “low” and “high” against the average of the 3 control groups. The effect is so striking here, though, that this detail may not be needed.