1. Implement Bagging on the data generating model of Example 2.8 (use the file Example2.8_Bootstrap.html): (1) set parameters sigma=0.5, N=50; (2) build and visualize one tree model (you can use any software package) on one data sample; (3) manually generate 1000 bootstrap samples and build a bagging model, visualize the resulting prediction on test data, and measure the MSE, Bias, and Variance.

Using the data-generating model of Example 2.8 and creating out-of-sample test data, we have the following code.

#Data generating model
set.seed(135790)
# static parameters
x.min <- -1
x.max <- 1
sigma <- 0.5
# function for generating data
data.gen <- function(N=2) {
  x <- runif(n=N, min=x.min, max=x.max)
  f.x <- sin(pi * x)
  epsilon <- rnorm(n=N, mean=0, sd=sigma)
  y <- f.x + epsilon
  return(data.frame(x=x, y=y))
}

#Out of Sample data
M <- 10000
x.out <- sort(runif(n=M, min=x.min, max=x.max))  # order the data points by x for easy plots
f.x.out <- sin(pi * x.out)
y.out <- f.x.out + rnorm(n=M, mean=0, sd=sigma)
x.test <- data.frame(x=x.out)

Generate one data sample of size N = 50 with sigma = 0.5.

N <- 50
data0 <- data.gen(N=N)

Build one tree using the ‘tree’ library.

# Building a tree
library(tree)
tree1 <- tree(y ~ x, data=data0)
summary(tree1)
## 
## Regression tree:
## tree(formula = y ~ x, data = data0)
## Number of terminal nodes:  5 
## Residual mean deviance:  0.2014 = 9.064 / 45 
## Distribution of residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.91150 -0.26760 -0.03412  0.00000  0.25330  1.20400
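
Here the residual mean deviance is the residual sum of squares divided by the number of observations minus the number of terminal nodes, i.e. 9.064 / (50 - 5) = 0.2014.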

Next, we plot the decision tree…

# plot the fitted tree
plot(tree1)
text(tree1)

Plot the partitions

# plot the partition
plot(data0$x, data0$y)
x <- data0$x[order(data0$x)]
f.x <- sin(pi * x)
lines(x, f.x, lwd=2)
partition.tree(tree1, lwd=2, col="red", add=TRUE)

Bootstrapping

Function for generating a bootstrap data sample:

data.bootstrap <- function(data) {
  n <- nrow(data)
  index <- sample(n, n, replace=TRUE)  # draw n rows with replacement
  return(data[index, ])
}
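
As a quick optional check (not required by the exercise), one bootstrap sample should contain roughly 63% of the distinct original observations, previewing the ~2/3 fact used in Problem 2; since x is continuous, distinct x values identify distinct original rows.

# optional check: fraction of distinct original observations in one bootstrap sample
# expected to be about 1 - (1 - 1/N)^N, i.e. roughly 0.63
boot1 <- data.bootstrap(data0)
length(unique(boot1$x)) / nrow(data0)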

Generate 1000 trees and collect their predictions on the test data.

RUN <- 1000
predictions <- matrix(0, nrow=RUN, ncol=M)
# generate 1000 bootstrap samples and collect the predictions on the test data
for (run in 1:RUN) {
  data <- data.bootstrap(data = data0)
  treeBS <- tree(y ~ x, data=data)
  predictions[run, ] <- predict(treeBS, x.test)
}
y.test <- colMeans(predictions)  # bagged prediction: average over the 1000 trees
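
For comparison only (not part of the manual solution), the same kind of bagged model can be fit with the randomForest package: with a single predictor, mtry = 1 means no feature subsampling, so the forest reduces to plain bagging. The object names below are illustrative.

# optional cross-check with the randomForest package
library(randomForest)
rf <- randomForest(y ~ x, data = data0, ntree = RUN, mtry = 1)
y.rf <- predict(rf, newdata = x.test)  # broadly similar to y.test above
# (randomForest grows its trees with different default stopping rules)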

Visualizing the bagged predictions (our “non-random” forest):

test <- data.frame(x=x.test$x, y=y.test)
plot(test$x, test$y)
lines(x.out, f.x.out, lwd=2)

Measuring MSE, Bias, and Variance

g_D_x <- predictions
g_bar_x <- colMeans(g_D_x)  # average prediction over the 1000 bagged models

bias <- mean((g_bar_x - f.x.out)^2)   # bias term: average model vs. true f(x)
var  <- mean((t(g_D_x) - g_bar_x)^2)  # variance of the predictions around g_bar_x
MSE  <- mean((t(g_D_x) - y.out)^2)    # out-of-sample mean squared error

The final numbers are as follows.

     Bias   Variance        MSE
0.0676821  0.0616438  0.3769345
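
As a sanity check (using the usual decomposition, with sigma = 0.5 so the irreducible noise is sigma^2 = 0.25), the out-of-sample MSE should be approximately Bias + Variance + sigma^2.

# sanity check: MSE ~ bias + variance + sigma^2 (irreducible noise)
bias + var + sigma^2  # 0.0677 + 0.0616 + 0.25 = 0.3793, vs. measured MSE 0.3769

The small remaining gap comes from estimating the noise term with a finite test sample.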

2. Show that as the number of bootstrap samples B gets large, the OOB error estimate for a random forest approaches its LOOCV error estimate, and that in the limit, the identity is exact.

We first show that each bootstrap sample contains about \(2/3\) of the original observations.

The probability that an observation \((x_i,y_i)\) is not selected in a single draw from a sample of size \(N\) is \(1-\frac{1}{N}\). For a bootstrap sample of size \(N\) (drawn with replacement), the probability that \((x_i,y_i)\) is never selected is therefore \((1-\frac{1}{N})^N\rightarrow\frac{1}{e}\approx\frac{1}{3}\) as \(N\rightarrow\infty\). Hence each bootstrap sample contains about \(1-\frac{1}{3}=\frac{2}{3}\) of the observations.
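
A quick numerical check of this limit at the sample size used in Problem 1 (illustrative only):

# probability that a given observation is excluded from one bootstrap sample
(1 - 1/50)^50  # 0.3642, already close to the limit
exp(-1)        # 1/e = 0.3679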

Let \(X\) be a sample of size \(N\). Let \(X_t\) be a sample of size \(N\) drawn with replacement from \(X\) and used to build tree \(t\); that is, \(X_t\) is the bootstrap sample for tree \(t\). Let \(O_t = X \setminus X_t\) be the set of out-of-bag observations for tree \(t\). Then we have

\[\begin{align*} \text{OOB Error}&=\frac{\sum\limits_{t=1}^{B}\sum\limits_{(x_i,y_i)\in O_t} Error(y_i,f_t(x_i))}{\sum\limits_{t=1}^{B}|O_t|} \end{align*}\]
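
As an illustration only (a minimal sketch, not part of the required derivation), this formula can be evaluated for bagged trees on the Problem 1 data, reusing data0 and the tree package and taking squared error as the Error function; the variable names are illustrative.

# sketch: OOB error of bagged trees on data0, with squared error as Error
library(tree)
B <- 200
err.sum <- 0
oob.count <- 0
for (b in 1:B) {
  idx <- sample(nrow(data0), nrow(data0), replace = TRUE)  # bootstrap sample X_t
  oob <- setdiff(seq_len(nrow(data0)), idx)                # out-of-bag set O_t
  if (length(oob) == 0) next
  fit <- tree(y ~ x, data = data0[idx, ])
  pred <- predict(fit, newdata = data0[oob, ])
  err.sum <- err.sum + sum((data0$y[oob] - pred)^2)
  oob.count <- oob.count + length(oob)
}
oob.error <- err.sum / oob.count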

To compute the LOOCV error, we partition \(X\) into \(N\) singletons \(X=\{X_1,X_2,\dots,X_N\}\), with \(X_i\cap X_j=\emptyset\) for \(i\neq j\) and \(|X_i|=1\), since the sample size is \(N\). Let \(S_t=X\setminus X_t\) be the training sample used to build the random forest \(f_{S_t}\) for fold \(t\). Then we have

\[\begin{align*} \text{LOOCV error}&=\frac{\sum\limits_{t=1}^{N}Error(y_t,f_{S_t}(x_t))}{N} \end{align*}\]

We also verify that, as \(B\) grows, every observation appears in some bootstrap samples and falls out of bag for others. The probability that \((x_i,y_i)\) is not selected for a particular bootstrap sample is \((1-\frac{1}{N})^N\approx\frac{1}{3}\), so the probability that it is never selected across \(B\) samples is \((1-\frac{1}{N})^{NB}\rightarrow0\) as \(B\rightarrow\infty\); likewise, the probability that it is never out of bag is \(\left(1-(1-\frac{1}{N})^N\right)^{B}\rightarrow0\). By the law of large numbers, each observation is therefore out of bag for a fraction of roughly \(\frac{1}{3}\) of the trees. Now we are ready to show the convergence of the OOB error. Dividing the numerator and denominator by \(B\) and letting \(B\rightarrow\infty\),

\[\begin{align*} \lim_{B\rightarrow\infty}\text{OOB Error}&=\lim_{B\rightarrow\infty}\frac{\frac{1}{B}\sum\limits_{t=1}^{B}\sum\limits_{(x_i,y_i)\in O_t} Error(y_i,f_t(x_i))}{\frac{1}{B}\sum\limits_{t=1}^{B}|O_t|}\\ &=\frac{\frac{1}{3}\sum\limits_{i=1}^{N} Error(y_i,f_{(-i)}(x_i))}{\frac{1}{3}N}\\ &=\frac{\sum\limits_{i=1}^{N} Error(y_i,f_{(-i)}(x_i))}{N}, \end{align*}\]

where \(f_{(-i)}\) denotes the aggregate of the trees whose bootstrap samples omit \((x_i,y_i)\), i.e. a model fit without observation \(i\). This is exactly the LOOCV error of the bagged model, so the identity holds exactly in the limit \(B\rightarrow\infty\).
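
For completeness, here is a minimal sketch (again assuming data0 and the tree package from Problem 1; all names are illustrative) of the LOOCV estimate for a bagged model, matching the formula above; the number of trees is kept small because LOOCV refits \(N\) bagged models.

# sketch: LOOCV of a bagged tree model on data0, with squared error as Error
library(tree)
B.small <- 50
loocv.err <- numeric(nrow(data0))
for (i in seq_len(nrow(data0))) {
  train <- data0[-i, ]                       # S_i: leave observation i out
  preds <- numeric(B.small)
  for (b in 1:B.small) {
    idx <- sample(nrow(train), nrow(train), replace = TRUE)
    fit <- tree(y ~ x, data = train[idx, ])
    preds[b] <- predict(fit, newdata = data0[i, , drop = FALSE])
  }
  loocv.err[i] <- (data0$y[i] - mean(preds))^2  # Error(y_i, f_{S_i}(x_i))
}
mean(loocv.err)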