Monte Carlo
Let X be standard normal, and Y = sin(1/X). Find P(|Y| > 1/2). First show the pieces, then put it together.
/dos/brunner/2201 > R
> m <- 10
> Y <- sin(1/rnorm(m,0,1))
> Y
 [1] -0.9356780 -0.9127910 -0.6572394  0.5278282 -0.7293509  0.6476336
 [7] -0.5021838 -0.1263424  0.4437484 -0.7486882
> abs(Y)>.5
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
> mean(abs(Y)>.5)
[1] 0.8
>
> # Now do it compactly
> m <- 1000 ; phat <- mean(abs(sin(1/rnorm(m,0,1))>.5)) ; phat
[1] 0.387
> # Now confidence interval, using normal approx to binomial
> zcrit <- qnorm(.995) ; zcrit
[1] 2.575829
> ci99 <- numeric(2)
> ci99[1] <- phat - zcrit*sqrt(phat*(1-phat)/m)
> ci99[2] <- phat + zcrit*sqrt(phat*(1-phat)/m)
> phat ; ci99
[1] 0.387
[1] 0.3473263 0.4266737
> zcrit*sqrt(phat*(1-phat)/m)
[1] 0.03967371
Question: What Monte Carlo sample size m is required to guarantee a 99% margin of error of 0.01 or less?
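One way to answer (a sketch, not part of the original session): the margin of error is zcrit*sqrt(p(1-p)/m), which is largest at p = 1/2, so solve zcrit*sqrt(0.25/m) = 0.01 for m to get a worst-case answer, or plug in phat = 0.387 from above for a slightly smaller one.

zcrit <- qnorm(.995)
ceiling( zcrit^2 * 0.25 / 0.01^2 )              # worst case p = 1/2: m = 16588
ceiling( zcrit^2 * 0.387*(1-0.387) / 0.01^2 )   # using phat = 0.387 from above: a bit under 16,000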
Next problem: numerically evaluate the integral of |cos(1/cos(x))| from 0 to 100. The integrand oscillates badly near the points where cos(x) = 0, and quadrature has trouble. First, in S-PLUS:

/dos/brunner/2201 > OSplus
Warning: Cannot open audit file
S-PLUS : Copyright (c) 1988, 1996 MathSoft, Inc.
S : Copyright AT&T.
Version 3.4 Release 1 for Silicon Graphics Iris, IRIX 5.3 : 1996
Working data will be in .Data
> f <- function(x)
+ {f <- abs(cos(1/cos(x))) ; f }
> X11()
> plot(1:100,f(1:100),type='l')
> integrate(f,0,100)$integral
[1] 46.69788
Warning messages:
  subdivision limit reached in: integrate.f(f, lower, upper, subdivisions, rel.tol, abs.tol, keep.xy, aux)
> integrate(f,0,100,subdivisions=1000)$integral
[1] 47.07456
Warning messages:
  subdivision limit reached in: integrate.f(f, lower, upper, subdivisions, rel.tol, abs.tol, keep.xy, aux)
>
So, trying Mathematica,
/dos/brunner/misc > math
Mathematica 4.0 for Silicon Graphics
Copyright 1988-1999 Wolfram Research, Inc.
 -- Motif graphics initialized --

In[1]:= Abs[Cos[1/Cos[x]]]

Out[1]= Abs[Cos[Sec[x]]]

In[2]:= NIntegrate[%,{x,0,100}]

NIntegrate::slwcon:
   Numerical integration converging too slowly; suspect one of the following:
   singularity, value of the integration being 0, oscillatory integrand, or
   insufficient WorkingPrecision. If your integrand is oscillatory try using
   the option Method->Oscillatory in NIntegrate.
   [the same slwcon warning is printed twice more]
General::stop: Further output of NIntegrate::slwcon
   will be suppressed during this calculation.
NIntegrate::ncvb:
   NIntegrate failed to converge to prescribed accuracy after 7 recursive
   bisections in x near x = 55.0781.

Out[2]= 41.0693

Back in the session where f was defined, the Monte Carlo version has no such trouble:

> Y <- 100 * f(runif(100000,0,100))
> mean(Y)
[1] 47.16109
> var(Y)
[1] 660.1925
>
> zcrit <- qnorm(.995) ; ci99 <- numeric(2)
>
> ci99[1] <- mean(Y) - zcrit*sqrt(var(Y)/100000)
> ci99[2] <- mean(Y) + zcrit*sqrt(var(Y)/100000)
> mean(Y) ; ci99
[1] 47.16109
[1] 46.95180 47.37038
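Why this works: if U ~ Uniform(0,100), its density is 1/100 on (0,100), so E(100 f(U)) equals the integral of f from 0 to 100. The sample mean of Y = 100 f(U) therefore estimates the integral, and the usual normal-theory interval measures the accuracy. Wrapped up as a little function (a sketch; the name mcint and its defaults are mine, not from the session above):

mcint <- function(fun, a, b, m = 100000, conf = 0.99)
# Monte Carlo estimate of the integral of fun from a to b, with a normal-theory CI
{
  y   <- (b - a) * fun(runif(m, a, b))     # E(y) equals the integral
  est <- mean(y)
  me  <- qnorm(1 - (1 - conf)/2) * sqrt(var(y)/m)
  c(estimate = est, lower = est - me, upper = est + me)
}

mcint(f, 0, 100)   # same calculation as above, in one line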
The next target is the infinite sum of 1/k^2 over k = 1, 2, 3, ..., which equals pi^2/6 = 1.644934..., to be estimated by simulation. Adding up the first 100,000 terms directly:

> sum(1/(1:100000)^2)
[1] 1.644924

I'd be very happy with 1.645.
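Why geometric random variables? If G is the number of tosses of a fair coin needed to get the first head, then P(G = k) = (1/2)^k, so E(2^G/G^2) = sum over k of (1/2)^k * 2^k / k^2 = sum over k of 1/k^2, which is exactly the target. So averaging 2^G/G^2 over simulated Geometric(1/2) variables estimates the sum. A quick numerical check of the identity (dgeom in R counts failures before the first success, so G corresponds to x + 1):

k <- 1:60
sum( dgeom(k-1, 0.5) * 2^k / k^2 )   # expectation truncated at k = 60
sum( 1/k^2 )                         # partial sum of the target; both are about 1.63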
First, Geometric(1/2). The stupid way is to flip a fair coin until the first head and count the tosses.
dumbgeo <- function(n)   # simulate n Geometric(1/2) variables the slow way
{
  dumbgeo <- numeric(n)
  for(i in 1:n)
  {
    geo <- 1
    while(runif(1) < .5) geo <- geo + 1   # keep flipping until the first head
    dumbgeo[i] <- geo
  }
  dumbgeo
} # End of function dumbgeo

mean(dumbgeo(10000))   # around 2, ok
And now the nice new one, based on inverting the geometric CDF: if U ~ Uniform(0,1) and G = trunc(1 + log(U)/log(1-theta)), then G > k exactly when log(U)/log(1-theta) >= k, that is, when U <= (1-theta)^k (the inequality flips because log(1-theta) is negative). So P(G > k) = (1-theta)^k, which is exactly the probability that the first k tosses are all tails. G has the right distribution, and it costs one uniform random number per geometric variable, with no looping.
rgeo <- function(n,theta)
# P(H) = theta, how many tosses to get first head?
{
  rgeo <- trunc(1 + log(runif(n))/log(1-theta))
  rgeo
} # End of function rgeo

> help(unix.time)

Execution Times

DESCRIPTION:
       Returns a vector of 5 timings expressed in seconds: user, system and
       elapsed times in S-PLUS, and user and system times in child processes.

USAGE:
       unix.time(expr)

REQUIRED ARGUMENTS:
       expr: any S-PLUS expression.

> dtime <- unix.time(g1 <- dumbgeo(10000))
> stime <- unix.time(g2 <- rgeo(10000,.5))
> dtime ; stime
[1]  9.60  0.18 10.00  0.00  0.00
[1] 0.03999996 0.00000000 0.00000000 0.00000000 0.00000000
The dumb way took about 240 times as long in terms of user time. That was on the machine credit. Now the same comparison on utstat.
OSplus:
> dtime ; stime
[1]  8.360001  0.980000 10.000000  0.000000  0.000000
[1] 0.02999973 0.00999999 0.00000000 0.00000000 0.00000000

New Splus:
> dtime ; stime
[1] 13.72  0.40 14.00  0.00  0.00
[1] 0.03 0.00 0.00 0.00 0.00

R:
> dtime ; stime
[1] 4.64 0.64 6.00 0.00 0.00
[1] 0.03 0.01 0.00 0.00 0.00
credit load average at the time: 0.12, 0.14, 0.18
utstat load average: 1.78, 1.79, 2.07
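Before relying on rgeo, a quick check (not in the original notes) that it really produces Geometric(1/2) counts: compare empirical proportions to (1/2)^k.

g <- rgeo(100000, .5)
k <- 1:8
rbind(empirical   = table(factor(g, levels = k)) / 100000,   # observed proportions
      theoretical = .5^k)                                    # P(G = k) = (1/2)^k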
Back to the sum. Using the geometric estimator Y = 2^G/G^2:

> m <- 1000
> grv <- rgeo(m,.5)
> Y <- 2^grv/grv^2 ; mean(Y) ; var(Y)
[1] 1.611057
[1] 2.547996
But even better should be an estimator based on K = trunc(1/U) for U ~ Uniform(0,1): P(K = k) = P(1/(k+1) < U <= 1/k) = 1/k - 1/(k+1) = 1/(k(k+1)), so E(1/K) = sum of 1/(k^2 (k+1)) = (sum of 1/k^2) - 1. Adding 1 back (as in the m = 10,000 run below) recovers the target without changing the variance.
> Z <- 1/(trunc(1/runif(n))) ; mean(Z) ; var(Z)
[1] 0.5870525
[1] 0.1368564
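A similar check (again not from the original notes) that trunc(1/U) lands on k with probability 1/(k(k+1)):

K <- trunc(1/runif(100000))
k <- 1:6
rbind(empirical   = table(factor(K, levels = k)) / 100000,   # observed proportions
      theoretical = 1/(k*(k+1)))                             # P(K = k)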
And the estimated relative efficiency of Z-bar to Y-bar (each sample mean has variance equal to the single-observation variance divided by the same m, so the m's cancel in the ratio) is
> var(Y)/var(Z)
[1] 18.61803
>
> # Now how about if m = 10,000?
> m <- 10000 ; grv <- rgeo(m,.5)
> Y <- 2^grv/grv^2
> Z <- 1 + 1/(trunc(1/runif(n)))
> mean(Y) ; mean(Z)
[1] 1.573374
[1] 1.638782
> var(Y)/var(Z)
[1] 11.00586
For n = 100,000,
[1] 1.607312     (mean of Y)
[1] 1.645167     (mean of Z)
[1] 101.7858     (var(Y)/var(Z))
Our variance estimate is not very stable. Tried n = one million, got
1.596673     (mean of Y)
1.64497      (mean of Z)
257.1023     (var(Y)/var(Z))
It's irresponsible to try 10 million, but I did.
More interesting is this question: what is going on here? Why are we unable to estimate the relative efficiency?
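One thing worth checking (a hint, not part of the original notes): whether Y = 2^G/G^2 even has a finite variance. The second moment is E(Y^2) = sum of (1/2)^k * (2^k/k^2)^2 = sum of 2^k/k^4, and a look at the partial sums is revealing:

k <- 1:60
cumsum( (1/2)^k * (2^k/k^2)^2 )   # partial sums of E(Y^2); the terms 2^k/k^4 blow up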