## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Blocking is very widely used in agricultural experiments (even today). In these experients blocks are physical areas of land in the field. In the FLM example in Ch 17 (2nd ed) or Ch 16 (1st ed) there are eight varieties of oats that have been planted in five blocks. The response variable is the yield in grams per row. (See ?oatvar.) Every variety is planted in each block. So when we compare the average yield from variety 1, say to the average yield from variety 2, it won’t matter if, for example, Block V gives low yields, because that will impact all 8 varieties equally.

The data:

data(oatvar);head(oatvar);dim(oatvar)
##   yield block variety
## 1   296     I       1
## 2   357    II       1
## 3   340   III       1
## 4   331    IV       1
## 5   348     V       1
## 6   402     I       2
## [1] 40  3
is.factor(oatvar$block); is.factor(oatvar$variety)
## [1] TRUE
## [1] TRUE

There are 8 varieties and 5 blocks for a total of 40 observations. In the data frame there is one line per variety/block combination, but the data is more easily viewed in a table:

xtabs(yield ~ variety + block, data = oatvar)
##        block
## variety   I  II III  IV   V
##       1 296 357 340 331 348
##       2 402 390 431 340 320
##       3 437 334 426 320 296
##       4 303 319 310 260 242
##       5 469 405 442 487 394
##       6 345 342 358 300 308
##       7 324 339 357 352 220
##       8 488 374 401 338 320
kable(xtabs(yield ~ variety + block, data = oatvar))
I II III IV V
296 357 340 331 348
402 390 431 340 320
437 334 426 320 296
303 319 310 260 242
469 405 442 487 394
345 342 358 300 308
324 339 357 352 220
488 374 401 338 320

(I don’t know much about kable, but it creates latex code, so only helps if you knit to pdf. It has many available adjustments.)

See FLM-2 Figure 17.1 for a ggplot version.

## Analysis of Variance Table
## 
## Response: yield
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## variety    7  77524 11074.8  8.2839 1.804e-05 ***
## block      4  33396  8348.9  6.2449  0.001008 ** 
## Residuals 28  37433  1336.9                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I will let you check that if you fit yield ~ block + variety you get the same anova table. (Why?)

What about the estimated coefficients?

## 
## Call:
## lm(formula = yield ~ variety + block, data = oatvar)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -67.275 -19.675  -4.638  17.725  74.925 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  363.275     20.027  18.139  < 2e-16 ***
## variety2      42.200     23.125   1.825 0.078707 .  
## variety3      28.200     23.125   1.219 0.232841    
## variety4     -47.600     23.125  -2.058 0.048968 *  
## variety5     105.000     23.125   4.541 9.73e-05 ***
## variety6      -3.800     23.125  -0.164 0.870656    
## variety7     -16.000     23.125  -0.692 0.494701    
## variety8      49.800     23.125   2.154 0.040035 *  
## blockII      -25.500     18.282  -1.395 0.174036    
## blockIII       0.125     18.282   0.007 0.994593    
## blockIV      -42.000     18.282  -2.297 0.029282 *  
## blockV       -77.000     18.282  -4.212 0.000238 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.56 on 28 degrees of freedom
## Multiple R-squared:  0.7477, Adjusted R-squared:  0.6485 
## F-statistic: 7.542 on 11 and 28 DF,  p-value: 7.723e-06

These estimates are not very informative. For one thing, they are all relative to the baseline value of 0 for variety 1, block I. Also, the block effects aren’t really of interest; they are there to ensure that the variety comparisons are fair. It is of more interest to construct a table of means:

##     1     2     3     4     5     6     7     8 
## 334.4 376.6 362.6 286.8 439.4 330.6 318.4 384.2

(There must be more elegant ways to see the table of means, along with estimated standard errors.)

In this model checking constant variance is can be done any of 3 ways:

You can also plot the residuals against block and variety; the default plot is a side-by-side boxplot, but I think points look nicer.