Intro

This is my (not very good) code for looking at the Pareto distribution. The cdf is \(F(x) = 1- (x/\nu)^{-\theta}\mathbb{1}(x \ge \nu)\). The parameter \(\theta\) controls the shape, and \(\nu\) controls the scale. (Or at least they are called the shape and scale parameters in most books.)

Paretocdf <- function(x, shape = 1, scale = 1){
  ifelse(x >= scale, 1-(x/scale)^(-shape), 0)
}

Differentiating this with respect to \(x\) gives the density, as in MS I HW3:

Paretopdf <- function(x, shape=1, scale=1){
  ifelse(x >= scale, shape*(scale^shape)/(x^(shape+1)), 0)
}

Sanity check:

  Paretocdf(5,1,1); Paretopdf(5,1,1)
## [1] 0.8
## [1] 0.04

A Few Plots

First the CDF:

xvals <- seq(1, 10, length=100)
plot(xvals, Paretocdf(xvals, shape = 1, scale = 1), type = "l",
     xlim = c(1,10), ylim=c(0,1), ylab="CDF", xaxt="n", yaxt="n")
axis(1, seq(1,10,3))
axis(2, seq(0,1,.25))
lines(xvals, Paretocdf(xvals, shape = 2, scale = 1), col="blue")
lines(xvals, Paretocdf(xvals, shape = 3, scale = 1), col= "red")
abline(h=1, lty = 3)

Now the PDF:

xvals <- seq(1, 10, length=100)
plot(xvals, Paretopdf(xvals, shape = 1, scale = 1), type = "l",
     xlim = c(1,10), ylim=c(0,3), ylab="PDF", xaxt="n", yaxt="n")
axis(1, seq(1,10,3))
axis(2, seq(0,3,1))
lines(xvals, Paretopdf(xvals, shape = 2, scale = 1), col="blue")
lines(xvals, Paretopdf(xvals, shape = 3, scale = 1), col= "red")
abline(v=1, lty = 3)

It’s hard to see in this picture, but the tail for shape = 1 is quite long; see the corresponding CDF plot where it is not close to 1 even at \(x=10\).

The expected value is \(E(X) = \theta*\nu/(\theta - 1)\), and the variance is \((\theta \nu^2)/\{(\theta-1)^2(\theta-2)\}\). I’ll add to the density plot the normal density with this expected value and variance, for \(\nu=1\) and \(\theta = 3\). (Note that the mean and variance do not exist when \(\theta \leq 1\) and \(\theta \leq 2\), respectively.)

par(mfrow = c(1,2), mar = c(2,2,1,1))
plot(xvals, Paretopdf(xvals, shape = 1, scale = 1), type = "l",
          xlim = c(1,10), ylim=c(0,3), ylab="PDF", xaxt="n", yaxt="n")
axis(1, seq(1,10,3))
axis(2, seq(0,3,1))
lines(xvals, Paretopdf(xvals, shape = 2, scale = 1), col = "blue")
lines(xvals, Paretopdf(xvals, shape = 3, scale = 1), col = "red")
abline(v=1, lty = 3)
lines(xvals, dnorm(xvals, mean = 3/2, sd = sqrt(3/(2^2))), col = "orange", lty= 2)

plot(xvals, Paretopdf(xvals, shape = 1, scale = 1), type = "l",
     ylab="", xlim = c(8,10), ylim = c(0,.02), xaxt="n", yaxt="n")
axis(1, seq(8,10,1))
axis(2, seq(0,.02,.01))
lines(xvals, Paretopdf(xvals, shape = 2, scale = 1), col = "blue")
lines(xvals, Paretopdf(xvals, shape = 3, scale = 1), col = "red")
abline(v=1, lty = 3)
lines(xvals, dnorm(xvals, mean = 3/2, sd = sqrt(3/(2^2))), col = "orange", lty= 2)

Likelihood Inference (corrected Jan 19 by SP)

Checking now whether what I told you in class on Jan.15 was correct. To compute the maximum likelihood estimate of \(\theta\) and \(\nu\), we want to maximize \[ \ell(\theta,\nu;\boldsymbol{x}) = \{n\log(\theta) + n\theta\log(\nu) -\theta \Sigma\log(x_i)\}\mathbb{1}(\nu \le x_{(1)}), \] where I’m using the notation \(x_{(1)} = \min(x_1, \dots, x_n)\).

Profile likelihood for \(\nu\)

If \(\nu\) is known, it’s easy to see that the maximum likelihood estimate of \(\theta\) is \[ \hat\theta_{\nu} = \left\{ \frac{1}{n}\Sigma \log(x_i) - \log(\nu)\right\}^{-1} = \{t-\log(\nu)\}^{-1}, \] say, as long as \(x_{(1)} \ge \nu\). If \(\nu\) is unknown, then this defines the constrained m.l.e. \(\hat\theta_\nu\), and the profile log-likelihood is \[\begin{eqnarray*} \ell_{\text p}(\nu;\boldsymbol{x}) = \ell(\hat\theta_{\nu},\nu;\boldsymbol{x}) &=& n\log(\hat\theta_\nu) + n\hat\theta_\nu\log(\nu) -\hat\theta_\nu\Sigma\log(x_i) \\ &=& -n\log\{t-\log(\nu)\} + \frac{n\log(\nu)}{t-\log(\nu)} - \frac{nt}{t-\log(\nu)} \\ &=& -n\left\{ \log\{t-\log(\nu)\} - \frac{\log(\nu)-t}{t-\log(\nu)} \right\}\\ &=& -n[\log\{t-\log(\nu)\} + 1], \quad \nu\le x_{(1)}. \end{eqnarray*}\] The function \(\ell_{\text p}(\nu)\) is increasing in \(\nu\) (see the end of the document), so the maximum likelihood estimate would be \(\hat\nu = x_{(1)}\), and then the maximum likelihood estimate of \(\theta\) would be \(\hat\theta = \hat\theta_{\hat\nu} = \{t-log(x_{(1)})\}^{-1}\).

Profile likelihood for \(\theta\)

The other way around, that is treating \(\nu\) as a nuisance parameter, is less interesting. Indeed, for a fixed \(\theta > 0\), the likelihood \(\ell(\theta,\nu;\boldsymbol{x})\) is increasing in \(\nu\) on \((0,x_{(1)}]\) and is zero for any other values of \(\nu\). Hence, it’s straightforward from the likelihood itself that the constrained m.l.e. \(\hat\nu_{\theta}\) of \(\nu\) is simply \(\hat\nu_\theta = x_{(1)} = \hat\nu\), no matter the value of \(\theta\). The rest is routine. We directly obtain that the profile likelihood for \(\theta\) is \[ \ell_p(\theta;\boldsymbol{x}) = \ell(\theta,\hat\nu_\theta;\boldsymbol{x}) = n\log(\theta) + n\theta\log(x_{(1)}) -\theta \Sigma_i\log(x_i). \] Differentiating this once gives \(n/\theta + n\log(x_{(1)}) - \sum_i \log(x_i)\); setting this to zero and solving for \(\theta\) thus leads to \(\hat\theta\) as before.

Double check

I just discovered this page about using the Pareto in R, (which would have saved me the coding above): they use \(\eta\) instead of \(\nu\) and call it the location instead of the scale. Everything else seems the same though.

library(EnvStats)
## 
## Attaching package: 'EnvStats'
## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm
## The following object is masked from 'package:base':
## 
##     print.default
plot(xvals, dpareto(xvals, location = 1, shape = 1), type="l", ylim=c(0,3), ylab="pdf")
lines(xvals, dpareto(xvals, location = 1, shape = 2), col="blue")
lines(xvals, dpareto(xvals, location = 1, shape = 3), col="red")

Looks like my code for the density was okay. Now I’ll try plotting the profile log-likelihood function:

x <- rpareto(30, location = 1, shape = 2) 
prof <- function(v, x){
  n <- length(x)
  -n*(log(sum(log(x))/n-log(v)) + 1)
}
vvals <- seq(.1,min(x),length=50)
profvals <- prof(vvals, x)
plot(vvals, profvals, type="l", ylab = "Profile log-likelihood", xlab = expression(nu))

The profile log-likelihood is indeed monotone in \(\nu\), so maximized at the boundary. Solving \(\ell_{\text p}'(\nu) = 0\) will not work. And \(\hat\nu\) is unlikely to follow a normal distribution, even asymptotically.