##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
Blocking is very widely used in agricultural experiments (even today). In these experients blocks are physical areas of land in the field. In the FLM example in Ch 17 (2nd ed) or Ch 16 (1st ed) there are eight varieties of oats that have been planted in five blocks. The response variable is the yield in grams per row. (See ?oatvar
.) Every variety is planted in each block. So when we compare the average yield from variety 1, say to the average yield from variety 2, it won’t matter if, for example, Block V gives low yields, because that will impact all 8 varieties equally.
The data:
data(oatvar);head(oatvar);dim(oatvar)
## yield block variety
## 1 296 I 1
## 2 357 II 1
## 3 340 III 1
## 4 331 IV 1
## 5 348 V 1
## 6 402 I 2
## [1] 40 3
is.factor(oatvar$block); is.factor(oatvar$variety)
## [1] TRUE
## [1] TRUE
There are 8 varieties and 5 blocks for a total of 40 observations. In the data frame there is one line per variety/block combination, but the data is more easily viewed in a table:
xtabs(yield ~ variety + block, data = oatvar)
## block
## variety I II III IV V
## 1 296 357 340 331 348
## 2 402 390 431 340 320
## 3 437 334 426 320 296
## 4 303 319 310 260 242
## 5 469 405 442 487 394
## 6 345 342 358 300 308
## 7 324 339 357 352 220
## 8 488 374 401 338 320
kable(xtabs(yield ~ variety + block, data = oatvar))
I | II | III | IV | V |
---|---|---|---|---|
296 | 357 | 340 | 331 | 348 |
402 | 390 | 431 | 340 | 320 |
437 | 334 | 426 | 320 | 296 |
303 | 319 | 310 | 260 | 242 |
469 | 405 | 442 | 487 | 394 |
345 | 342 | 358 | 300 | 308 |
324 | 339 | 357 | 352 | 220 |
488 | 374 | 401 | 338 | 320 |
(I don’t know much about kable, but it creates latex code, so only helps if you knit to pdf. It has many available adjustments.)
See FLM-2 Figure 17.1 for a ggplot
version.
## Analysis of Variance Table
##
## Response: yield
## Df Sum Sq Mean Sq F value Pr(>F)
## variety 7 77524 11074.8 8.2839 1.804e-05 ***
## block 4 33396 8348.9 6.2449 0.001008 **
## Residuals 28 37433 1336.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I will let you check that if you fit yield ~ block + variety
you get the same anova table. (Why?)
What about the estimated coefficients?
##
## Call:
## lm(formula = yield ~ variety + block, data = oatvar)
##
## Residuals:
## Min 1Q Median 3Q Max
## -67.275 -19.675 -4.638 17.725 74.925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 363.275 20.027 18.139 < 2e-16 ***
## variety2 42.200 23.125 1.825 0.078707 .
## variety3 28.200 23.125 1.219 0.232841
## variety4 -47.600 23.125 -2.058 0.048968 *
## variety5 105.000 23.125 4.541 9.73e-05 ***
## variety6 -3.800 23.125 -0.164 0.870656
## variety7 -16.000 23.125 -0.692 0.494701
## variety8 49.800 23.125 2.154 0.040035 *
## blockII -25.500 18.282 -1.395 0.174036
## blockIII 0.125 18.282 0.007 0.994593
## blockIV -42.000 18.282 -2.297 0.029282 *
## blockV -77.000 18.282 -4.212 0.000238 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.56 on 28 degrees of freedom
## Multiple R-squared: 0.7477, Adjusted R-squared: 0.6485
## F-statistic: 7.542 on 11 and 28 DF, p-value: 7.723e-06
These estimates are not very informative. For one thing, they are all relative to the baseline value of 0 for variety 1, block I. Also, the block effects aren’t really of interest; they are there to ensure that the variety comparisons are fair. It is of more interest to construct a table of means:
## 1 2 3 4 5 6 7 8
## 334.4 376.6 362.6 286.8 439.4 330.6 318.4 384.2
(There must be more elegant ways to see the table of means, along with estimated standard errors.)
In this model checking constant variance is can be done any of 3 ways:
You can also plot the residuals against block
and variety
; the default plot is a side-by-side boxplot, but I think points look nicer.