Monte Carlo Power Handout One
###############################################################################
# Artificial power example: Simple regression, n=100, errors standard normal, #
# Y = beta0 + beta1 x + epsilon, true beta1/sigma = 1/10. #
##############################################################################
ncp <- function(eff,X,CC) # Returns noncentrality parameter for normal regression
# eff - a dx1 vector (C Beta - h)/sigma
# X - the usual nxp matrix
# CC - a d x p matrix as in H0: C Beta = h
{ncp <- t(eff)%*%solve(CC%*%solve(t(X)%*%X)%*%t(CC))%*%eff
ncp}
m <- 16588 ; Y1 <- numeric(m)
# Choose m to ensure 99% margin of error less than .01 (overkill)
n <- 100 ; int <- numeric(n) + 1
effect <- .1 ; cmat <- rbind(c(0,1))
fcrit <- qf(.95,1,n-2) ; z99 <- qnorm(.995)
# NORMAL IV
for(i in 1:m)
{
ncent <- ncp(effect,cbind(int,rnorm(n)),cmat)
Y1[i] <- 1-pf(fcrit,1,n-2,ncent)
}
ybar1 <- mean(Y1) ; v1 <- var(Y1)
ci1 <- numeric(2)
ci1[1] <- ybar1 - z99 * sqrt(var(Y1)/m)
ci1[2] <- ybar1 + z99 * sqrt(var(Y1)/m)
cat("Monte Carlo Estimate with normal IV is ",ybar1,", with 99% CI of \n",
ci1[1]," to ",ci1[2],"\n")
This produced
Monte Carlo Estimate with normal IV is 0.1664529 , with 99% CI of 0.1661140 to 0.1667919 > ci1[2]-ci1[1] [1] 0.0006779349
And it took about 6 cpu minutes - overkill.
# CAUCHY IV
m <- 500 ; Y2 <- numeric(m)
for(i in 1:m)
{
ncent <- ncp(effect,cbind(int,rcauchy(n)),cmat)
Y2[i] <- 1-pf(fcrit,1,n-2,ncent)
}
ybar2 <- mean(Y2) ; v2 <- var(Y2)
ci2 <- numeric(2)
ci2[1] <- ybar2 - z99 * sqrt(var(Y2)/m)
ci2[2] <- ybar2 + z99 * sqrt(var(Y2)/m)
cat("Monte Carlo Estimate with normal IV is ",ybar2,", with 99% CI of \n",
ci2[1]," to ",ci2[2],"\n")
Output was:
Monte Carlo Estimate with Cauchy IV is 0.9958736 , with 99% CI of 0.9932654 to 0.9984819
Suppose we want to assess the debt load of university students upon graduation. In particular, we are interested in whether there is a sex difference, and in whether any sex difference depends on full versus part-time status. To reduce error variation, we will treat province as a blocking variable.
Suppose we take a random sample from the population of graduating Canadian university students, and determine debt load. Suppose also that there is a real sex difference of 0.5 s , but no interaction, and that we are testing simultaneously for the sex difference and the interaction. What is the power for n=100?
University enrolment, full-time and part-time, by sex: 1998-1999
FULL TIME
Province Male Female
Newfoundland 5,553 7,562
Prince Edward Island 950 1,520
Nova Scotia 12,845 17,182
New Brunswick 8,251 10,278
Quebec 59,363 74,799
Ontario 105,119 124,866
Manitoba 9,380 11,503
Saskatchewan 10,499 13,157
Alberta 24,302 29,208
British Columbia 24,639 29,400
TOTAL 260,901 319,475
PART TIME
Province Male Female
Newfoundland 1,045 1,550
Prince Edward Island 107 310
Nova Scotia 2,689 4,525
New Brunswick 1,405 2,832
Quebec 38,568 59,548
Ontario 29,129 43,829
Manitoba 4,006 5,846
Saskatchewan 2,824 4,798
Alberta 7,740 12,523
British Columbia 9,066 13,645
TOTAL 96,579 149,406
Data File u.dat looks like this
Prob Prov Sex FT-PT Int Indicators for province
1 0.006720 1 -1 -1 1 1 0 0 0 0 0 0 0 0
2 0.009151 1 1 -1 -1 1 0 0 0 0 0 0 0 0
3 0.001150 2 -1 -1 1 0 1 0 0 0 0 0 0 0
4 0.001839 2 1 -1 -1 0 1 0 0 0 0 0 0 0
5 0.015544 3 -1 -1 1 0 0 1 0 0 0 0 0 0
6 0.020792 3 1 -1 -1 0 0 1 0 0 0 0 0 0
7 0.009985 4 -1 -1 1 0 0 0 1 0 0 0 0 0
8 0.012438 4 1 -1 -1 0 0 0 1 0 0 0 0 0
9 0.071837 5 -1 -1 1 0 0 0 0 1 0 0 0 0
10 0.090516 5 1 -1 -1 0 0 0 0 1 0 0 0 0
11 0.127207 6 -1 -1 1 0 0 0 0 0 1 0 0 0
12 0.151103 6 1 -1 -1 0 0 0 0 0 1 0 0 0
13 0.011351 7 -1 -1 1 0 0 0 0 0 0 1 0 0
14 0.013920 7 1 -1 -1 0 0 0 0 0 0 1 0 0
15 0.012705 8 -1 -1 1 0 0 0 0 0 0 0 1 0
16 0.015922 8 1 -1 -1 0 0 0 0 0 0 0 1 0
17 0.029408 9 -1 -1 1 0 0 0 0 0 0 0 0 1
18 0.035345 9 1 -1 -1 0 0 0 0 0 0 0 0 1
19 0.029816 10 -1 -1 1 0 0 0 0 0 0 0 0 0
20 0.035578 10 1 -1 -1 0 0 0 0 0 0 0 0 0
21 0.001265 1 -1 1 -1 1 0 0 0 0 0 0 0 0
22 0.001876 1 1 1 1 1 0 0 0 0 0 0 0 0
23 0.000129 2 -1 1 -1 0 1 0 0 0 0 0 0 0
24 0.000375 2 1 1 1 0 1 0 0 0 0 0 0 0
25 0.003254 3 -1 1 -1 0 0 1 0 0 0 0 0 0
26 0.005476 3 1 1 1 0 0 1 0 0 0 0 0 0
27 0.001700 4 -1 1 -1 0 0 0 1 0 0 0 0 0
28 0.003427 4 1 1 1 0 0 0 1 0 0 0 0 0
29 0.046672 5 -1 1 -1 0 0 0 0 1 0 0 0 0
30 0.072061 5 1 1 1 0 0 0 0 1 0 0 0 0
31 0.035250 6 -1 1 -1 0 0 0 0 0 1 0 0 0
32 0.053039 6 1 1 1 0 0 0 0 0 1 0 0 0
33 0.004848 7 -1 1 -1 0 0 0 0 0 0 1 0 0
34 0.007074 7 1 1 1 0 0 0 0 0 0 1 0 0
35 0.003417 8 -1 1 -1 0 0 0 0 0 0 0 1 0
36 0.005806 8 1 1 1 0 0 0 0 0 0 0 1 0
37 0.009366 9 -1 1 -1 0 0 0 0 0 0 0 0 1
38 0.015154 9 1 1 1 0 0 0 0 0 0 0 0 1
39 0.010971 10 -1 1 -1 0 0 0 0 0 0 0 0 0
40 0.016512 10 1 1 1 0 0 0 0 0 0 0 0 0
xxx <- read.table("u.dat") # Read u.dat into a matrix
prob <- xxx[,2] # Puts col 2 into prob
xxx <- as.matrix(xxx[,4:15]) # Just keep columns 4 through 15 (dummy vars)
simdis <- function(p,values) # Simulate from discrete dist, prop to p
{ if(length(p)!=length(values)) stop("p & values must be same length!")
nn <- length(p) ; cumprob <- cumsum(p)/sum(p)
simdis <- min(values[cumprob>runif(1)])
simdis
} # End of function simdis
ncp <- function(eff,X,CC) # Returns noncentrality parameter for normal regression
# eff - a dx1 vector (C Beta - h)/sigma
# X - the usual nxp matrix
# CC - a d x p matrix as in H0: C Beta = h
{ncp <- t(eff)%*%solve(CC%*%solve(t(X)%*%X)%*%t(CC))%*%eff
ncp}
# Now do n=100, m=500
n <- 100 ; X <- numeric(n*12) ; dim(X) <- c(n,12)
int <- numeric(n) + 1
m <- 500 # MC Sample Size
effect <- cbind(c(.25,0))
cmat <- rbind(c(0,1,0,0,0,0,0,0,0,0,0,0,0),
c(0,0,0,1,0,0,0,0,0,0,0,0,0))
Y <- numeric(m)
fcrit <- qf(.95,2,n-13) ; z99 <- qnorm(.995)
for(k in 1:m)
{
for(i in 1:n) X[i,] <- xxx[simdis(prob,1:40),] # Random row of xxx
X <- cbind(int,X)
ncent <- ncp(effect,X,cmat)
Y[k] <- 1-pf(fcrit,2,n-13,ncent)
}
This failed, probably because there were columns of all zeros. To be continued.