Monte Carlo Integration

Cameron Mcelfresh
Jan 13, 2021

Solving complex integrals is a necessity for many numerically-reliant fields like photonics, economics, video game development, and engineering.

The unfortunate truth is that many interesting problems have integrals that cannot be solved analytically, so alternative numerical methods must be applied to find appropriate estimates. It’s important to note that by applying a numerical method, such as Monte Carlo Integration, we are not “solving” the integral, but rather arriving at an appropriate estimate of the integral’s value. For many applications, this discrepancy can be ignored, but the distinction should be kept in mind particularly when considering what degree of accuracy is required for the problem at hand.

The modern variant of the “Monte Carlo Methods” can be traced back to the Los Alamos Laboratory in the 1940s, where it was originally developed to assist with modeling the nuclear fission process, in particular, simulating the mean free path of neutrons in fissile materials. Rather than solve the diffusion path of neutrons deterministically, a statistical sampling approach was applied, and it worked brilliantly. Since then, the term “Monte Carlo Method” has grown to take on a wide range of meanings depending on the field in which it is applied. However, all applications of Monte Carlo share the same basic principle: using statistical sampling to solve a problem that is difficult to solve deterministically.

Monte Carlo Integration: Estimator

Firstly, we can examine the expected value of an integral using Monte Carlo. Traditionally, the expected value of a function g(x) can be calculated by first multiplying by its probability density function, f(x), and taking the integral over the desired region:
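In symbols, with a and b denoting the limits of integration:

E[g(x)] = \int_{a}^{b} g(x) \, f(x) \, dx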

Alternatively, we can use a Monte Carlo approximation for the expected value by repeatedly sampling a uniform distribution between the limits of integration.
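Writing the limits as a and b, the uniform-sampling estimator takes the form:

E[g(x)] \approx \frac{1}{n} \sum_{i=1}^{n} g(x_i), \qquad x_i \sim \mathrm{Uniform}(a, b)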

As noted, x_i is a value sampled from a uniform distribution between the limits a and b for each i = 1, 2, 3, …, n. This approach draws samples according to the uniform density f(x) and relies on the law of large numbers to converge to the expected value.

As an aside, the multiplicative factor of 1/n is sometimes given as 1/(n-1) because there are truly n-1 degrees of freedom with n samples, but when n is large enough the difference between 1/n and 1/(n-1) is negligible.

Given the form of the estimator for the expected value, extending to the estimate of the integral is simple. The expected value formula is multiplied by the range of the integration limits, as shown below.
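Using a uniform distribution on [a, b], the integral estimate is:

\int_{a}^{b} g(x) \, dx \approx (b - a) \cdot \frac{1}{n} \sum_{i=1}^{n} g(x_i)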

The integral estimate uses the expected value estimator along with a rectangular width determined by the integration limits to approximate the integral’s area/volume.

We can test this on a relatively simple example: an integral evaluated from x = 1 to x = 5 whose analytical solution is approximately 13.340. We can write a brief C++ program to apply the Monte Carlo Integration technique with a sample size of n = 200.
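A minimal sketch of such a program is shown below. The integrand g(x) used here is an illustrative stand-in rather than the article's exact function, so the printed value will not match the quoted output exactly, but the estimator itself is unchanged.

#include <iostream>
#include <random>

// Illustrative stand-in integrand; the article's exact g(x) is not reproduced here,
// so the analytical value of this integral (~13.78) differs slightly from ~13.34.
double g(double x) { return x * x / 3.0; }

int main() {
    const double a = 1.0, b = 5.0; // integration limits
    const int n = 200;             // number of samples

    std::mt19937 rng(std::random_device{}());
    std::uniform_real_distribution<double> dist(a, b);

    // Accumulate g(x_i) for uniform samples x_i in [a, b]
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += g(dist(rng));

    // Integral estimate = (b - a) * average of g(x_i)
    double estimate = (b - a) * sum / n;

    std::cout << "Estimate for " << a << " -> " << b << " is "
              << estimate << ", (" << n << " iterations)\n";
    return 0;
}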

Which should print something close to:

Estimate for 1.0 -> 5.0 is 13.28, (200 iterations)

An estimate of 13.28 isn’t far off from the expected analytical solution of ~13.34, especially considering that there was only a sample size of n=200.

But how accurately is the Monte Carlo Integration method performing? To answer that, we have to look at the variance that is implicit to the Monte Carlo Integration Technique.

Monte Carlo Integration: Variance

The variance of the Monte Carlo Integration scheme follows the traditional process of calculating variance around some random variable. If we continue with the previous notation of taking the integral of a function g(x), the variance of g(x) around its expected value can be given as:
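\mathrm{Var}[g(x)] = E\left[\left(g(x) - E[g(x)]\right)^2\right] = E[g(x)^2] - \left(E[g(x)]\right)^2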

For the sake of brevity, I am going to skip the derivation for the standard deviation relationship inherent to the Monte Carlo Integration scheme. If we continue with the notation of taking the integral of a function g(x), and the expected value of the integral being E[g(x)], the relationship for standard deviation can be given as:
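\sigma \approx V \sqrt{\frac{E[g(x)^2] - \left(E[g(x)]\right)^2}{n}}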

Here the additional V term represents the total volume of the integration domain. If we were to evaluate a 1-dimensional integral between the limits a and b, then the equation could be written in terms of the sampled values as:
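\sigma \approx (b - a) \sqrt{\frac{1}{n}\left[\frac{1}{n}\sum_{i=1}^{n} g(x_i)^2 - \left(\frac{1}{n}\sum_{i=1}^{n} g(x_i)\right)^2\right]}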

Using this format, we can easily calculate standard deviation or variance (standard deviation² = variance) along with the integration estimate. One of the most obvious implications of the error equation is that the standard deviation of the Monte Carlo Integration estimator is inversely proportional to the square root of the number of samples.

So we have an idea of how much to increase the sample size to reduce the error of the estimate by a desired amount. For example, a two-fold reduction in error requires a four-fold increase in the number of samples!

So error reduction isn’t a very forgiving process, particularly in comparison to other numerical integration schemes. However, other numerical techniques often suffer from the curse of dimensionality: as the dimension of the integral increases, their convergence degrades rapidly. In contrast, Monte Carlo Integration retains the 1/sqrt(n) error scaling regardless of the number of dimensions of the integral, making it well suited to high-dimensional problems.

To demonstrate the proportionality of the estimated error to the sample size, we can use the same example integral as before while also calculating the standard deviation of increasingly large sample sizes. We can try calculating n=8, 32, 128, 512, 2048 (a 4-fold increase in sample size each time…)
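A sketch of that experiment is below, reusing the same stand-in integrand as before and reporting both the estimate and the standard deviation computed from the sampled values.

#include <cmath>
#include <iostream>
#include <random>

// Same illustrative stand-in integrand as before.
double g(double x) { return x * x / 3.0; }

int main() {
    const double a = 1.0, b = 5.0;
    const int sampleSizes[] = {8, 32, 128, 512, 2048};

    std::mt19937 rng(std::random_device{}());
    std::uniform_real_distribution<double> dist(a, b);

    for (int n : sampleSizes) {
        // Accumulate g(x_i) and g(x_i)^2 to compute the sample mean and variance
        double sum = 0.0, sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double gi = g(dist(rng));
            sum += gi;
            sumSq += gi * gi;
        }
        double mean = sum / n;

        // Integral estimate and its standard deviation: V * sqrt((<g^2> - <g>^2) / n)
        double estimate = (b - a) * mean;
        double stdDev = (b - a) * std::sqrt((sumSq / n - mean * mean) / n);

        std::cout << "Estimate for " << a << " -> " << b << " is " << estimate
                  << ", STD = " << stdDev << ", (" << n << " iterations)\n";
    }
    return 0;
}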

Which should result in something near:

Estimate for 1.0 -> 5.0 is 8.365, STD = 2.6452, (8 iterations)
Estimate for 1.0 -> 5.0 is 13.548, STD = 1.1617, (32 iterations)
Estimate for 1.0 -> 5.0 is 13.171, STD = 0.4994, (128 iterations)
Estimate for 1.0 -> 5.0 is 13.235, STD = 0.2542, (512 iterations)
Estimate for 1.0 -> 5.0 is 13.375, STD = 0.1243, (2048 iterations)

So as expected a 4-fold increase in the sample size produced a 2-fold decrease in the estimator’s error!

Variance Reduction

The core concept of Monte Carlo Integration is fairly straightforward, and many modifications of the algorithm are intended to reduce the statistical error, or the number of samples needed to achieve an accurate estimate; these methods are generally lumped into the category of variance reduction. I won’t go into detail about each method, but having an idea of the different types of optimization procedures available can help figure out which one is appropriate to use.

Importance Sampling

Previously, we sampled from a random uniform distribution between the integration limits. Oftentimes a function has ‘less significant’ regions that contribute only a small fraction of the integral’s total value. If we adjust the sampled distribution from a uniform distribution to a probability density function that somewhat resembles the function of interest, we can arrive at a statistically reliable integration estimate more quickly and reduce the sampling variance. Choosing a companion function to sample from isn’t necessarily straightforward, and sometimes requires prior knowledge of the function being integrated.

Once an appropriate companion probability density function is chosen (e.g. p(x)), the xi values are sampled from the companion distribution and the integration estimate is calculated using a multiplicative factor that accounts for the new sampling bias. The multiplicative factor is sometimes called the importance weight. The importance weight ensures that biased samples still drive the estimate towards the true expected value. The general expression for the Monte Carlo estimate using importance sampling is:
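\int g(x) \, dx \approx \frac{1}{n} \sum_{i=1}^{n} \frac{g(x_i)}{p(x_i)}, \qquad x_i \sim p(x)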

If we assume that we’re sampling from a uniform distribution between the integration limits a and b, we can further simplify the expression by using the probability density function of a uniform distribution, p(x) = 1/(b - a):
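\int_{a}^{b} g(x) \, dx \approx \frac{1}{n} \sum_{i=1}^{n} \frac{g(x_i)}{1/(b-a)} = \frac{b-a}{n} \sum_{i=1}^{n} g(x_i)

which recovers the uniform-sampling estimator from earlier.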

For example, consider a function that is strongly peaked around x = 3 and falls off on either side. Assume that we want to find an accurate expected value or integration estimate in the range of x = [0, 6].

If we use brute force Monte Carlo Integration, the samples taken from the ranges of x = [0, 2] and x = [4, 6] wouldn’t provide as much information as those from the x = [2, 4] range, where the integrand is large. A companion function from which to sample random values should match the shape of g(x), with high sampling probability between x = [2, 4] and decaying outwards. So we can use, for instance, a normal distribution with μ = 3 and σ = 1.

We can show how sampling from the companion function can reduce the error by comparing the performance of normal Monte Carlo Integration to Monte Carlo integration enhanced with importance sampling:
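In the sketch below, the peaked integrand g(x) is an illustrative stand-in for the article's function, the companion distribution is the normal distribution with μ = 3 and σ = 1 described above, and companion samples that fall outside the integration range are simply given zero weight.

#include <cmath>
#include <iostream>
#include <random>

const double PI = 3.14159265358979323846;

// Stand-in for the article's peaked integrand on [0, 6]; the exact g(x) is an assumption here.
double g(double x) { return std::exp(-(x - 3.0) * (x - 3.0)); }

// PDF of the companion normal distribution with mu = 3, sigma = 1
double p(double x) {
    double z = x - 3.0;
    return std::exp(-0.5 * z * z) / std::sqrt(2.0 * PI);
}

int main() {
    const double a = 0.0, b = 6.0;
    const int sampleSizes[] = {16, 64, 256, 1024, 4096};

    std::mt19937 rng(std::random_device{}());
    std::uniform_real_distribution<double> uniform(a, b);
    std::normal_distribution<double> normal(3.0, 1.0);

    for (int n : sampleSizes) {
        // Brute force Monte Carlo: x_i ~ Uniform(a, b)
        double sum = 0.0, sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double gi = g(uniform(rng));
            sum += gi;
            sumSq += gi * gi;
        }
        double mean = sum / n;
        double bruteEst = (b - a) * mean;
        double bruteStd = (b - a) * std::sqrt((sumSq / n - mean * mean) / n);

        // Importance sampling: x_i ~ N(3, 1), weight w_i = g(x_i) / p(x_i)
        double wSum = 0.0, wSumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double x = normal(rng);
            // Samples that land outside [a, b] contribute zero to the integral over [a, b]
            double w = (x >= a && x <= b) ? g(x) / p(x) : 0.0;
            wSum += w;
            wSumSq += w * w;
        }
        double wMean = wSum / n;
        double isStd = std::sqrt((wSumSq / n - wMean * wMean) / n);

        std::cout << "Brute force: " << bruteEst << ", STD = " << bruteStd
                  << " | Importance sampling: " << wMean << ", STD = " << isStd
                  << ", (" << n << " iterations)\n";
    }
    return 0;
}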

We can use the printed output to plot the error vs. number of iterations and see that by using importance sampling we can reduce the standard deviation of the sampling in all cases.

Importance sampling is a non-trivial exercise, especially when little is known about the form of the function being integrated. If care is not taken, the error can explode or even tend to infinity.

Stratified Sampling

Stratified sampling is another approach to variance reduction, in which the integration volume is divided into subdomains that are each evaluated separately. The estimates from each subdomain are then combined, weighted by their subdomain integration volume. Stratified sampling has the benefit of never increasing the sample variance, even without any prior knowledge of the function’s shape. Optimized stratified sampling routines may recursively divide the function into regions with comparable values by isolating specific peaks or features. Just as well, the naive approach of dividing the integration volume into uniformly sized subdomains works without any prior knowledge of the function’s structure.

The stratified sampling estimate of an integral over a function g(x) is given as:
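\int g(x) \, dx \approx \sum_{j=1}^{k} V_j \, \frac{1}{n_j} \sum_{i=1}^{n_j} g(x_{ij})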

Here, the original sample volume is divided into k subdomains, where the jth subdomain has volume V_j and receives n_j samples. The standard deviation expression is modified to:
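\sigma \approx \sqrt{\sum_{j=1}^{k} \frac{V_j^2 \, \sigma_j^2}{n_j}}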

Where σ_j is the standard deviation of samples within each jth subdomain.

As a simple example of stratified sampling, consider a function integrated over x = [0, 20], where we’ll divide the integration range into five regions of equal size (i.e. x=[0,4], x=[4,8], x=[8,12], x=[12,16], x=[16,20]).

The following code compares normal Monte Carlo integration with Stratified Sampling.
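A sketch of that comparison is below. The integrand g(x) is again an illustrative stand-in for the article's function, and each total sample count is split evenly across the five subdomains, so the printed numbers will differ from the quoted output while following the same trend.

#include <cmath>
#include <iostream>
#include <random>

// Stand-in integrand on [0, 20]; the article's exact g(x) is an assumption here.
double g(double x) { return std::exp(-0.2 * x) * (1.0 + std::sin(x)); }

int main() {
    const double a = 0.0, b = 20.0;
    const int numStrata = 5;                  // five equal subdomains
    const int sampleSizes[] = {16, 64, 256, 1024, 4096, 16384};
    std::mt19937 rng(std::random_device{}());

    std::cout << "Normal Monte Carlo Integration\n";
    std::uniform_real_distribution<double> uniform(a, b);
    for (int n : sampleSizes) {
        double sum = 0.0, sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double gi = g(uniform(rng));
            sum += gi;
            sumSq += gi * gi;
        }
        double mean = sum / n;
        double estimate = (b - a) * mean;
        double stdDev = (b - a) * std::sqrt((sumSq / n - mean * mean) / n);
        std::cout << "Estimate for " << a << " -> " << b << " is " << estimate
                  << ", STD = " << stdDev << ", (" << n << " iterations)\n";
    }

    std::cout << "Stratified Sampling Monte Carlo Integration\n";
    for (int n : sampleSizes) {
        int nj = n / numStrata;               // samples per subdomain
        double width = (b - a) / numStrata;   // volume V_j of each subdomain
        double estimate = 0.0, varianceSum = 0.0;
        for (int j = 0; j < numStrata; j++) {
            double lo = a + j * width;
            std::uniform_real_distribution<double> local(lo, lo + width);
            double s = 0.0, sSq = 0.0;
            for (int i = 0; i < nj; i++) {
                double gi = g(local(rng));
                s += gi;
                sSq += gi * gi;
            }
            double mj = s / nj;
            estimate += width * mj;                                     // V_j * <g>_j
            varianceSum += width * width * (sSq / nj - mj * mj) / nj;   // V_j^2 * sigma_j^2 / n_j
        }
        std::cout << "Estimate for " << a << " -> " << b << " is " << estimate
                  << ", STD = " << std::sqrt(varianceSum) << ", (" << n << " iterations)\n";
    }
    return 0;
}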

Normal Monte Carlo Integration
Estimate for 0.0 -> 20.0 is 2.697, STD = 1.6518, (16 iterations)
Estimate for 0.0 -> 20.0 is 4.057, STD = 0.8851, (64 iterations)
Estimate for 0.0 -> 20.0 is 3.715, STD = 0.4479, (256 iterations)
Estimate for 0.0 -> 20.0 is 3.591, STD = 0.2133, (1024 iterations)
Estimate for 0.0 -> 20.0 is 3.388, STD = 0.1053, (4096 iterations)
Estimate for 0.0 -> 20.0 is 3.579, STD = 0.0538, (16384 iterations)

Stratified Sampling Monte Carlo Integration
Estimate for 0.0 -> 20.0 is 1.125, STD = 1.8661, (16 iterations)
Estimate for 0.0 -> 20.0 is 2.113, STD = 0.7077, (64 iterations)
Estimate for 0.0 -> 20.0 is 3.860, STD = 0.1922, (256 iterations)
Estimate for 0.0 -> 20.0 is 3.439, STD = 0.0464, (1024 iterations)
Estimate for 0.0 -> 20.0 is 3.689, STD = 0.0116, (4096 iterations)
Estimate for 0.0 -> 20.0 is 3.612, STD = 0.0029, (16384 iterations)

If we plot the error of both approaches, it’s clear that stratified sampling provides an advantage.

Thanks for reading — any suggestions or corrections are welcome! Please reach out to me at mediumCameron@gmail.com


