Monte Carlo
Let X be standard normal, and Y = sin(1/X). Find P(|Y| > 1/2). First show the pieces, then put it together.
/dos/brunner/2201 > R

> m <- 10
> Y <- sin(1/rnorm(m,0,1))
> Y
 [1] -0.9356780 -0.9127910 -0.6572394  0.5278282 -0.7293509  0.6476336
 [7] -0.5021838 -0.1263424  0.4437484 -0.7486882
> abs(Y)>.5
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
> mean(abs(Y)>.5)
[1] 0.8
>
> # Now do it compactly
> m <- 1000 ; phat <- mean(abs(sin(1/rnorm(m,0,1))>.5)) ; phat
[1] 0.387
> # Now confidence interval, using normal approx to binomial
> zcrit <- qnorm(.995) ; zcrit
[1] 2.575829
> ci99 <- numeric(2)
> ci99[1] <- phat - zcrit*sqrt(phat*(1-phat)/m)
> ci99[2] <- phat + zcrit*sqrt(phat*(1-phat)/m)
> phat ; ci99
[1] 0.387
[1] 0.3473263 0.4266737
> zcrit*sqrt(phat*(1-phat)/m)
[1] 0.03967371
Question: What Monte Carlo sample size m is required to guarantee a 99% margin of error of 0.01 or less?
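One standard way to answer this (a sketch, not part of the original session): the 99% margin of error is zcrit*sqrt(phat*(1-phat)/m), and phat*(1-phat) is at most 1/4, so forcing the worst case below 0.01 guarantees the bound for any phat.

# Sketch: choose m so that zcrit*sqrt(p*(1-p)/m) <= 0.01
zcrit <- qnorm(.995)
ceiling( (zcrit/0.02)^2 )                         # worst case p = 1/2: about 16,600
ceiling( zcrit^2 * 0.387*(1-0.387) / 0.01^2 )     # plugging in phat = 0.387: about 15,700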
/dos/brunner/2201 > OSplus
Warning: Cannot open audit file
S-PLUS : Copyright (c) 1988, 1996 MathSoft, Inc.
S : Copyright AT&T.
Version 3.4 Release 1 for Silicon Graphics Iris, IRIX 5.3 : 1996
Working data will be in .Data
> f <- function(x)
+ {f <- abs(cos(1/cos(x))) ; f }
> X11()
> plot(1:100,f(1:100),type='l')
> integrate(f,0,100)$integral
[1] 46.69788
Warning messages:
subdivision limit reached in: integrate.f(f, lower, upper, subdivisions,
rel.tol, abs.tol, keep.xy, aux)
> integrate(f,0,100,subdivisions=1000)$integral
[1] 47.07456
Warning messages:
subdivision limit reached in: integrate.f(f, lower, upper, subdivisions,
rel.tol, abs.tol, keep.xy, aux)
>
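As an aside (not from the original session), the analogous attempt in current R looks like this; note that R's integrate() returns a component named $value rather than $integral, and it also struggles with this oscillatory integrand:

f <- function(x) abs(cos(1/cos(x)))
out <- integrate(f, 0, 100, subdivisions = 1000, stop.on.error = FALSE)
out$value      # the estimate
out$message    # any complaint about convergence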
So, trying Mathematica,
/dos/brunner/misc > math
Mathematica 4.0 for Silicon Graphics
Copyright 1988-1999 Wolfram Research, Inc.
-- Motif graphics initialized --
In[1]:= Abs[Cos[1/Cos[x]]]
Out[1]= Abs[Cos[Sec[x]]]
In[2]:= NIntegrate[%,{x,0,100}]
NIntegrate::slwcon:
Numerical integration converging too slowly; suspect one of the following:
singularity, value of the integration being 0, oscillatory integrand, or
insufficient WorkingPrecision. If your integrand is oscillatory try using
the option Method->Oscillatory in NIntegrate.
NIntegrate::slwcon:
Numerical integration converging too slowly; suspect one of the following:
singularity, value of the integration being 0, oscillatory integrand, or
insufficient WorkingPrecision. If your integrand is oscillatory try using
the option Method->Oscillatory in NIntegrate.
NIntegrate::slwcon:
Numerical integration converging too slowly; suspect one of the following:
singularity, value of the integration being 0, oscillatory integrand, or
insufficient WorkingPrecision. If your integrand is oscillatory try using
the option Method->Oscillatory in NIntegrate.
General::stop: Further output of NIntegrate::slwcon
will be suppressed during this calculation.
NIntegrate::ncvb:
NIntegrate failed to converge to prescribed accuracy after 7
recursive bisections in x near x = 55.0781.
Out[2]= 41.0693
Instead, estimate the integral by Monte Carlo: if U ~ Uniform(0,100), then E(100 f(U)) equals the integral of f from 0 to 100, so the sample mean of 100 f(U) estimates it.
> Y <- 100 * f(runif(100000,0,100))
> mean(Y)
[1] 47.16109
> var(Y)
[1] 660.1925
>
> zcrit <- qnorm(.995) ; ci99 <- numeric(2)
>
> ci99[1] <- mean(Y) - zcrit*sqrt(var(Y)/100000)
> ci99[2] <- mean(Y) + zcrit*sqrt(var(Y)/100000)
> mean(Y) ; ci99
[1] 47.16109
[1] 46.95180 47.37038
Next problem: estimate the infinite sum of 1/n^2 by simulation. As a numerical benchmark, the direct partial sum is

> sum(1/(1:100000)^2)
[1] 1.644924

I'd be very happy with 1.645; the infinite series converges to pi^2/6 = 1.6449...
First, simulate geometric(1/2) random variables. The stupid way is to flip a fair coin until the first head appears.
dumbgeo <- function(n)
    {
    dumbgeo <- numeric(n)
    for(i in 1:n)
        {
        geo <- 1
        while(runif(1)<.5) geo <- geo+1   # each runif(1) < .5 is a tail; keep flipping
        dumbgeo[i] <- geo                 # geo = number of tosses needed for the first head
        }
    dumbgeo
    } # End of function dumbgeo
mean(dumbgeo(10000)) # around 2, ok
And now the nice new one; an explanation follows the function.
rgeo <- function(n,theta) # P(H) = theta, how many tosses to get first head?
{
rgeo <- trunc(1 + log(runif(n))/log(1-theta))
rgeo
} # End of function rgeo
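One way to see why this works (a sketch of the explanation promised above): let U ~ Uniform(0,1) and G = trunc(1 + log(U)/log(1-theta)). Since log(1-theta) < 0, for any integer k >= 1 we have G >= k exactly when log(U) <= (k-1)*log(1-theta), that is, when U <= (1-theta)^(k-1). So P(G >= k) = (1-theta)^(k-1), which is precisely the probability that the first k-1 tosses are all tails. Thus G has the geometric(theta) distribution of the number of tosses needed for the first head. This is the inverse CDF method: one uniform random number per geometric value, with no looping.

# Quick sanity check (hypothetical run, output not shown)
mean(rgeo(100000, .5))   # should be close to 1/theta = 2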
> help(unix.time)
Execution Times
DESCRIPTION:
Returns a vector of 5 timings expressed in seconds: user,
system and elapsed times in S-PLUS, and user and system
times in child processes.
USAGE:
unix.time(expr)
REQUIRED ARGUMENTS:
expr: any S-PLUS expression.
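(An aside, not from the original notes: in current R the corresponding function is system.time(), used the same way.)

system.time(g1 <- dumbgeo(10000))   # user, system and elapsed times in current R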
> dtime <- unix.time(g1 <- dumbgeo(10000))
> stime <- unix.time(g2 <- rgeo(10000,.5))
> dtime ; stime
[1] 9.60 0.18 10.00 0.00 0.00
[1] 0.03999996 0.00000000 0.00000000 0.00000000 0.00000000
The dumb way took about 240 times as long in user time. That was on the machine credit; now the same comparison on utstat:
OSplus
> dtime ; stime
[1]  8.360001  0.980000 10.000000  0.000000  0.000000
[1] 0.02999973 0.00999999 0.00000000 0.00000000 0.00000000

New Splus
> dtime ; stime
[1] 13.72  0.40 14.00  0.00  0.00
[1] 0.03 0.00 0.00 0.00 0.00

R
> dtime ; stime
[1] 4.64 0.64 6.00 0.00 0.00
[1] 0.03 0.01 0.00 0.00 0.00
credit load average: 0.12, 0.14, 0.18
utstat load average: 1.78, 1.79, 2.07
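Back to estimating the sum of 1/n^2. A sketch of the reasoning behind the next estimator (not spelled out in the original notes): if G ~ geometric(1/2) counts tosses to the first head, then P(G = k) = (1/2)^k for k = 1, 2, ..., so E(2^G/G^2) = sum over k of (2^k/k^2)(1/2)^k = sum over k of 1/k^2. The sample mean of Y = 2^G/G^2 is therefore an unbiased Monte Carlo estimate of the series; this is importance sampling with the geometric(1/2) as the sampling distribution.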
> m <- 1000
> grv <- rgeo(m,.5)
> Y <- 2^grv/grv^2 ; mean(Y) ; var(Y)
[1] 1.611057
[1] 2.547996
But even better should be
> Z <- 1/(trunc(1/runif(n))) ; mean(Z) ; var(Z)
[1] 0.5870525
[1] 0.1368564
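Why this works, and why the later runs add 1 (a sketch, not in the original notes): for U ~ Uniform(0,1), K = trunc(1/U) has P(K = k) = P(1/(k+1) < U <= 1/k) = 1/k - 1/(k+1) = 1/(k(k+1)). The importance-sampling weight for the term 1/k^2 is then (1/k^2)/(1/(k(k+1))) = (k+1)/k = 1 + 1/k, so E(1 + 1/K) = sum over k of 1/k^2. The Z just above omits the leading 1, so its mean estimates the sum minus 1 (about 0.645); the runs below use Z <- 1 + 1/trunc(1/runif(n)), whose mean estimates the full sum. Note that 1 + 1/K always lies between 1 and 2, which is why its variance is so small.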
And the estimated relative efficiency of Z-bar relative to Y-bar (both variances are divided by the same m, so only the ratio of the variances matters) is
> var(Y)/var(Z)
[1] 18.61803
>
> # Now how about if m = 10,000?
> m <- 10000 ; grv <- rgeo(m,.5)
> Y <- 2^grv/grv^2
> Z <- 1 + 1/(trunc(1/runif(n)))
> mean(Y) ; mean(Z)
[1] 1.573374
[1] 1.638782
> var(Y)/var(Z)
[1] 11.00586
>
For n = 100,000,
[1] 1.607312
[1] 1.645167
[1] 101.7858
Our variance estimate is not very stable. Trying n = one million gave
mean(Y) = 1.596673, mean(Z) = 1.64497, var(Y)/var(Z) = 257.1023.
It's irresponsible to try 10 million, but I did.
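One way to see the instability directly (a sketch, not from the original session) is to repeat the experiment several times and watch the sample variance of Y jump around:

# Hypothetical illustration: the sample variance of Y varies wildly across runs
replicate(10, { grv <- rgeo(100000, .5) ; var(2^grv/grv^2) })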
More interesting is this question: what is going on here? Why are we unable to estimate the relative efficiency?