Monte Carlo


Approximating Probabilities

Let X be standard normal, and Y = sin(1/X). Find P(|Y>1/2|). First show the pieces, then put it together.


/dos/brunner/2201 > R


> m <- 10
> Y <- sin(1/rnorm(m,0,1))
> Y
 [1] -0.9356780 -0.9127910 -0.6572394  0.5278282 -0.7293509  0.6476336
 [7] -0.5021838 -0.1263424  0.4437484 -0.7486882
> abs(Y)>.5
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
> mean(abs(Y)>.5)
[1] 0.8
>
> # Now do it compactly
> m <- 1000 ; phat <- mean(abs(sin(1/rnorm(m,0,1))>.5)) ; phat
[1] 0.387
> # Now confidence interval, using normal approx to binomial
> zcrit <- qnorm(.995) ; zcrit
[1] 2.575829
> ci99 <- numeric(2)
> ci99[1] <- phat - zcrit*sqrt(phat*(1-phat)/m)
> ci99[2] <- phat + zcrit*sqrt(phat*(1-phat)/m)
> phat ; ci99
[1] 0.387
[1] 0.3473263 0.4266737
> zcrit*sqrt(phat*(1-phat)/m)
[1] 0.03967371

Question: What Monte Carlo sample size m is required to guarantee a 99% margin of error of 0.01 or less?

Approximating More General Integrals


/dos/brunner/2201 > OSplus
Warning: Cannot open audit file
S-PLUS : Copyright (c) 1988, 1996 MathSoft, Inc.
S : Copyright AT&T.
Version 3.4 Release 1 for Silicon Graphics Iris, IRIX 5.3 : 1996 
Working data will be in .Data 

> f <- function(x)
+    {f <- abs(cos(1/cos(x))) ; f }
> X11()
> plot(1:100,f(1:100),type='l')

> integrate(f,0,100)$integral
[1] 46.69788
Warning messages:
  subdivision limit reached in: integrate.f(f, lower, upper, subdivisions,
        rel.tol, abs.tol, keep.xy, aux)
> integrate(f,0,100,subdivisions=1000)$integral
[1] 47.07456
Warning messages:
  subdivision limit reached in: integrate.f(f, lower, upper, subdivisions,
        rel.tol, abs.tol, keep.xy, aux)
>

So, trying Mathematica,


/dos/brunner/misc > math
Mathematica 4.0 for Silicon Graphics
Copyright 1988-1999 Wolfram Research, Inc.
 -- Motif graphics initialized --

In[1]:= Abs[Cos[1/Cos[x]]]

Out[1]= Abs[Cos[Sec[x]]]

In[2]:= NIntegrate[%,{x,0,100}]

NIntegrate::slwcon:
   Numerical integration converging too slowly; suspect one of the following:
    singularity, value of the integration being 0, oscillatory integrand, or
    insufficient WorkingPrecision. If your integrand is oscillatory try using
    the option Method->Oscillatory in NIntegrate.

NIntegrate::slwcon:
   Numerical integration converging too slowly; suspect one of the following:
    singularity, value of the integration being 0, oscillatory integrand, or
    insufficient WorkingPrecision. If your integrand is oscillatory try using
    the option Method->Oscillatory in NIntegrate.

NIntegrate::slwcon:
   Numerical integration converging too slowly; suspect one of the following:
    singularity, value of the integration being 0, oscillatory integrand, or
    insufficient WorkingPrecision. If your integrand is oscillatory try using
    the option Method->Oscillatory in NIntegrate.

General::stop: Further output of NIntegrate::slwcon
     will be suppressed during this calculation.

NIntegrate::ncvb:
   NIntegrate failed to converge to prescribed accuracy after 7
     recursive bisections in x near x = 55.0781.

Out[2]= 41.0693

>  Y <- 100 * f(runif(100000,0,100))             
> mean(Y)
[1] 47.16109
> var(Y)
[1] 660.1925
>
>  zcrit <- qnorm(.995) ; ci99 <- numeric(2)
> 
> ci99[1] <- mean(Y) - zcrit*sqrt(var(Y)/100000)
> ci99[2] <- mean(Y) + zcrit*sqrt(var(Y)/100000)
> mean(Y) ; ci99
[1] 47.16109
[1] 46.95180 47.37038

Variance Reduction


sum(1/(1:100000)^2)
[1] 1.644924

I'd be very happy with 1.645

First, geometric(1/2). Stupid way is to flip a fair coin till first head.


dumbgeo <- function(n)
    {
    dumbgeo <- numeric(n)
    for(i in 1:n)
        {
        geo <- 1
        while(runif(1)<.5) geo <- geo+1
        dumbgeo[i] <- geo
        }
     dumbgeo   
    } # End of function dumbgeo
    
    
mean(dumbgeo(10000)) # around 2, ok

And now the nice new one. Explain.


rgeo <- function(n,theta) # P(H) = theta, how many tosses to get first head?
    {
    rgeo <- trunc(1 + log(runif(n))/log(1-theta))
    rgeo
    } # End of function rgeo

> help(unix.time)

Execution Times

DESCRIPTION:
       Returns a vector of 5 timings expressed in seconds:  user,
       system  and  elapsed  times in S-PLUS, and user and system
       times in child processes.

USAGE:
       unix.time(expr)

REQUIRED ARGUMENTS:
expr:     any S-PLUS expression.


> dtime <- unix.time(g1 <- dumbgeo(10000))
> stime <- unix.time(g2 <- rgeo(10000,.5))
> dtime ; stime
[1]  9.60  0.18 10.00  0.00  0.00
[1] 0.03999996 0.00000000 0.00000000 0.00000000 0.00000000

It took about 240 times as long in terms of user time. That was on credit. Now utstat.


OSplus

> dtime ; stime
[1]  8.360001  0.980000 10.000000  0.000000  0.000000
[1] 0.02999973 0.00999999 0.00000000 0.00000000 0.00000000

New Splus

> dtime ; stime
[1] 13.72  0.40 14.00  0.00  0.00
[1] 0.03 0.00 0.00 0.00 0.00

R

> dtime ; stime
[1] 4.64 0.64 6.00 0.00 0.00
[1] 0.03 0.01 0.00 0.00 0.00

credit load was: load average: 0.12, 0.14, 0.18

utstat load average was: load average: 1.78, 1.79, 2.07


 m <- 1000    
> grv <- rgeo(m,.5)
> Y <- 2^grv/grv^2 ; mean(Y) ; var(Y)
> [1] 1.611057
> [1] 2.547996

But even better should be


> Z <- 1/(trunc(1/runif(n))) ; mean(Z) ; var(Z)
> [1] 0.5870525
> [1] 0.1368564

And estimated relative efficiency of Z-bar (both are divided by m) is


> var(Y)/var(Z)
[1] 18.61803

> 
> # Now how about if m = 10,000?
m <- 10000 ; grv <- rgeo(m,.5) 
Y <- 2^grv/grv^2 
Z <- 1 + 1/(trunc(1/runif(n))) 
mean(Y) ; mean(> Z)
var(Y)/var(Z)
> > > > [1] 1.573374
> [1] 1.638782
> [1] 11.00586
> 

For n = 100,000,

[1] 1.607312
[1] 1.645167
[1] 101.7858

Our variance estimate is not very stable. Tried n = one million, got

1.596673
1.64497
257.1023

It's irresponsible to try 10 million, but I did.

More interesting is this question. What is going on here? why are we unable to estimate relatve efficiency?