Monte Carlo Power Handout One
############################################################################### # Artificial power example: Simple regression, n=100, errors standard normal, # # Y = beta0 + beta1 x + epsilon, true beta1/sigma = 1/10. # ############################################################################## ncp <- function(eff,X,CC) # Returns noncentrality parameter for normal regression # eff - a dx1 vector (C Beta - h)/sigma # X - the usual nxp matrix # CC - a d x p matrix as in H0: C Beta = h {ncp <- t(eff)%*%solve(CC%*%solve(t(X)%*%X)%*%t(CC))%*%eff ncp} m <- 16588 ; Y1 <- numeric(m) # Choose m to ensure 99% margin of error less than .01 (overkill) n <- 100 ; int <- numeric(n) + 1 effect <- .1 ; cmat <- rbind(c(0,1)) fcrit <- qf(.95,1,n-2) ; z99 <- qnorm(.995) # NORMAL IV for(i in 1:m) { ncent <- ncp(effect,cbind(int,rnorm(n)),cmat) Y1[i] <- 1-pf(fcrit,1,n-2,ncent) } ybar1 <- mean(Y1) ; v1 <- var(Y1) ci1 <- numeric(2) ci1[1] <- ybar1 - z99 * sqrt(var(Y1)/m) ci1[2] <- ybar1 + z99 * sqrt(var(Y1)/m) cat("Monte Carlo Estimate with normal IV is ",ybar1,", with 99% CI of \n", ci1[1]," to ",ci1[2],"\n")
This produced
Monte Carlo Estimate with normal IV is 0.1664529 , with 99% CI of 0.1661140 to 0.1667919 > ci1[2]-ci1[1] [1] 0.0006779349
And it took about 6 cpu minutes - overkill.
# CAUCHY IV m <- 500 ; Y2 <- numeric(m) for(i in 1:m) { ncent <- ncp(effect,cbind(int,rcauchy(n)),cmat) Y2[i] <- 1-pf(fcrit,1,n-2,ncent) } ybar2 <- mean(Y2) ; v2 <- var(Y2) ci2 <- numeric(2) ci2[1] <- ybar2 - z99 * sqrt(var(Y2)/m) ci2[2] <- ybar2 + z99 * sqrt(var(Y2)/m) cat("Monte Carlo Estimate with normal IV is ",ybar2,", with 99% CI of \n", ci2[1]," to ",ci2[2],"\n")
Output was:
Monte Carlo Estimate with Cauchy IV is 0.9958736 , with 99% CI of 0.9932654 to 0.9984819
Suppose we want to assess the debt load of university students upon graduation. In particular, we are interested in whether there is a sex difference, and in whether any sex difference depends on full versus part-time status. To reduce error variation, we will treat province as a blocking variable.
Suppose we take a random sample from the population of graduating Canadian university students, and determine debt load. Suppose also that there is a real sex difference of 0.5 s , but no interaction, and that we are testing simultaneously for the sex difference and the interaction. What is the power for n=100?
University enrolment, full-time and part-time, by sex: 1998-1999 FULL TIME Province Male Female Newfoundland 5,553 7,562 Prince Edward Island 950 1,520 Nova Scotia 12,845 17,182 New Brunswick 8,251 10,278 Quebec 59,363 74,799 Ontario 105,119 124,866 Manitoba 9,380 11,503 Saskatchewan 10,499 13,157 Alberta 24,302 29,208 British Columbia 24,639 29,400 TOTAL 260,901 319,475 PART TIME Province Male Female Newfoundland 1,045 1,550 Prince Edward Island 107 310 Nova Scotia 2,689 4,525 New Brunswick 1,405 2,832 Quebec 38,568 59,548 Ontario 29,129 43,829 Manitoba 4,006 5,846 Saskatchewan 2,824 4,798 Alberta 7,740 12,523 British Columbia 9,066 13,645 TOTAL 96,579 149,406
Data File u.dat looks like this
Prob Prov Sex FT-PT Int Indicators for province 1 0.006720 1 -1 -1 1 1 0 0 0 0 0 0 0 0 2 0.009151 1 1 -1 -1 1 0 0 0 0 0 0 0 0 3 0.001150 2 -1 -1 1 0 1 0 0 0 0 0 0 0 4 0.001839 2 1 -1 -1 0 1 0 0 0 0 0 0 0 5 0.015544 3 -1 -1 1 0 0 1 0 0 0 0 0 0 6 0.020792 3 1 -1 -1 0 0 1 0 0 0 0 0 0 7 0.009985 4 -1 -1 1 0 0 0 1 0 0 0 0 0 8 0.012438 4 1 -1 -1 0 0 0 1 0 0 0 0 0 9 0.071837 5 -1 -1 1 0 0 0 0 1 0 0 0 0 10 0.090516 5 1 -1 -1 0 0 0 0 1 0 0 0 0 11 0.127207 6 -1 -1 1 0 0 0 0 0 1 0 0 0 12 0.151103 6 1 -1 -1 0 0 0 0 0 1 0 0 0 13 0.011351 7 -1 -1 1 0 0 0 0 0 0 1 0 0 14 0.013920 7 1 -1 -1 0 0 0 0 0 0 1 0 0 15 0.012705 8 -1 -1 1 0 0 0 0 0 0 0 1 0 16 0.015922 8 1 -1 -1 0 0 0 0 0 0 0 1 0 17 0.029408 9 -1 -1 1 0 0 0 0 0 0 0 0 1 18 0.035345 9 1 -1 -1 0 0 0 0 0 0 0 0 1 19 0.029816 10 -1 -1 1 0 0 0 0 0 0 0 0 0 20 0.035578 10 1 -1 -1 0 0 0 0 0 0 0 0 0 21 0.001265 1 -1 1 -1 1 0 0 0 0 0 0 0 0 22 0.001876 1 1 1 1 1 0 0 0 0 0 0 0 0 23 0.000129 2 -1 1 -1 0 1 0 0 0 0 0 0 0 24 0.000375 2 1 1 1 0 1 0 0 0 0 0 0 0 25 0.003254 3 -1 1 -1 0 0 1 0 0 0 0 0 0 26 0.005476 3 1 1 1 0 0 1 0 0 0 0 0 0 27 0.001700 4 -1 1 -1 0 0 0 1 0 0 0 0 0 28 0.003427 4 1 1 1 0 0 0 1 0 0 0 0 0 29 0.046672 5 -1 1 -1 0 0 0 0 1 0 0 0 0 30 0.072061 5 1 1 1 0 0 0 0 1 0 0 0 0 31 0.035250 6 -1 1 -1 0 0 0 0 0 1 0 0 0 32 0.053039 6 1 1 1 0 0 0 0 0 1 0 0 0 33 0.004848 7 -1 1 -1 0 0 0 0 0 0 1 0 0 34 0.007074 7 1 1 1 0 0 0 0 0 0 1 0 0 35 0.003417 8 -1 1 -1 0 0 0 0 0 0 0 1 0 36 0.005806 8 1 1 1 0 0 0 0 0 0 0 1 0 37 0.009366 9 -1 1 -1 0 0 0 0 0 0 0 0 1 38 0.015154 9 1 1 1 0 0 0 0 0 0 0 0 1 39 0.010971 10 -1 1 -1 0 0 0 0 0 0 0 0 0 40 0.016512 10 1 1 1 0 0 0 0 0 0 0 0 0 xxx <- read.table("u.dat") # Read u.dat into a matrix prob <- xxx[,2] # Puts col 2 into prob xxx <- as.matrix(xxx[,4:15]) # Just keep columns 4 through 15 (dummy vars) simdis <- function(p,values) # Simulate from discrete dist, prop to p { if(length(p)!=length(values)) stop("p & values must be same length!") nn <- length(p) ; cumprob <- cumsum(p)/sum(p) simdis <- min(values[cumprob>runif(1)]) simdis } # End of function simdis ncp <- function(eff,X,CC) # Returns noncentrality parameter for normal regression # eff - a dx1 vector (C Beta - h)/sigma # X - the usual nxp matrix # CC - a d x p matrix as in H0: C Beta = h {ncp <- t(eff)%*%solve(CC%*%solve(t(X)%*%X)%*%t(CC))%*%eff ncp} # Now do n=100, m=500 n <- 100 ; X <- numeric(n*12) ; dim(X) <- c(n,12) int <- numeric(n) + 1 m <- 500 # MC Sample Size effect <- cbind(c(.25,0)) cmat <- rbind(c(0,1,0,0,0,0,0,0,0,0,0,0,0), c(0,0,0,1,0,0,0,0,0,0,0,0,0)) Y <- numeric(m) fcrit <- qf(.95,2,n-13) ; z99 <- qnorm(.995) for(k in 1:m) { for(i in 1:n) X[i,] <- xxx[simdis(prob,1:40),] # Random row of xxx X <- cbind(int,X) ncent <- ncp(effect,X,cmat) Y[k] <- 1-pf(fcrit,2,n-13,ncent) }
This failed, probably because there were columns of all zeros. To be continued.