Gin is a collection of pure R tools for generating and manipulating Gaussian Process (GP) models. It is also a nice spirit.
The current version includes the top-level functions:

- gp_sim - generate N pseudo-random GP realisations.
- gp_conditional - compute the mean and covariance of a GP given data points.
- gp_fit - fit for the (hyper-)parameters of the autocovariance function (ACV).
- plot_ts - plot a time series data.frame.
- plot_snake - plot a 'snake' representing the mean and std.dev of a GP.
- ts_load - load a plain text data file into an array ready for analysis.
And some lower-level functions that do the heavy lifting:

- gp_logLikelihood - compute the Gaussian log likelihood given data and model.
- gp_logPosterior - compute the log posterior, given log likelihood and prior.
- gp_lagMatrix - compute the matrix of lags tau_ij = |t_i - t_j| (see the sketch below).
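As an aside, the lag matrix computed by gp_lagMatrix is simple to picture. Here is a minimal sketch using base R's outer (an illustration of the idea, not gin's actual implementation):

```R
# lag matrix tau_ij = |t_i - t_j| for a small vector of times
# (t.demo is a throwaway example vector, not the t used later)
t.demo <- c(0.0, 1.0, 3.0)
abs(outer(t.demo, t.demo, "-"))
#      [,1] [,2] [,3]
# [1,]    0    1    3
# [2,]    1    0    2
# [3,]    3    2    0
```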
Gin is an R package, but is still in development. To install it from GitHub, first install (if you haven't already) Hadley Wickham's devtools package:
```R
install.packages("devtools")
```
Now you can install gin straight from GitHub:
```R
devtools::install_github("svdataman/gin")
```
Now you're good to go.
First we must define an ACV function. This is the exponential ACV, acv(tau) = A * exp(-|tau| / l), for a Damped Random Walk (DRW) process:
```R
# define an ACV function
acv <- function(theta, tau) {
  A <- abs(theta[1])  # amplitude
  l <- abs(theta[2])  # decay time-scale
  acov <- A * exp(-(tau / l))
  return(acov)
}
```
This accepts a vector of parameters (theta) and a matrix or vector of lags (tau), and returns the ACV value at each lag.
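For example, a quick sanity check with illustrative values A = 1 and l = 2:

```R
# evaluate the ACV at a few lags, with A = 1 and l = 2
acv(c(1.0, 2.0), tau = c(0, 1, 2))
# [1] 1.0000000 0.6065307 0.3678794
```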
Now we define some initial parameters. Note the convention: the first two elements of theta are the process mean (mu) and an error scale factor (nu), which gin handles itself; only the remaining elements (theta[3:p]) are passed on to the ACV function. That is why acv above indexes its amplitude and time-scale as theta[1] and theta[2].
```R
# define parameters of ACV
# theta[1] = mu (mean)
# theta[2] = nu (error scale factor)
# theta[3:p] = parameters of ACV
theta <- c(0.0, 1.0, 1.0, 1.0)
```
Next, we define a vector of times at which to compute the simulated data:
```R
# define vector of times for reconstruction
m <- 1000
t <- seq(-0.5, 10.5, length = m)
```
Using the ACV model and parameters (theta) as above, and the grid of times (t), we can generate a random realisation of the GP:
```R
# produce Gaussian vector (simulation)
y <- gin::gp_sim(theta, acv.model = acv, t.star = t)
# gp_sim returns a matrix (one column per realisation); flatten to a vector
y <- as.vector(y)
```
```R
# plot the 'true' curve
dat <- data.frame(t = t, y = y)
gin::plot_ts(dat, type = "l")
```
The package comes with a dataset called drw, containing 40 observations of a random process. The drw data.frame has 3 columns: t (time), y (value), dy (error).
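To load the dataset and take a quick look at its structure (data() is base R's standard mechanism for bundled datasets; if gin lazy-loads its data, gin::drw works too):

```R
# load the bundled dataset and inspect the first few rows
data("drw", package = "gin")
head(drw)  # columns: t (time), y (value), dy (error)
```

Let's plot the data: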
```R
gin::plot_ts(drw, col = "black", cex.lab = 1.4)
```
Using the ACV model above, and the values for theta as our starting guess, we fit:
```R
# fit the model to the data, find Max.Like parameter values
result <- gin::gp_fit(theta, acv.model = acv, dat = drw)
```
This should find the maximum likelihood solution. The best-fitting ACV parameters (four of them in this case) are stored in result$par.
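For instance, to label and inspect the estimates (names as in the parameter convention above):

```R
# attach the parameter names and print the maximum likelihood estimates
setNames(result$par, c("mu", "nu", "A", "l"))
```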
We can use the gp_conditional function to compute the mean and covariance of the GP at the simulation times t (above), with the specified ACV model, conditional on the given data.
```R
# reconstruct process: compute 'conditional' mean and covariance
gp <- gin::gp_conditional(result$par, acv.model = acv, dat = drw, t.star = t)
```
Now we can overlay the conditional model:
```R
# plot a 'snake' showing mean +/- std.dev
gin::plot_snake(gp, add = TRUE, col.line = 3)
```
We can also add pseudo-random realisations of this process (conditional on the data):
```R
# draw 5 realisations conditional on the data, then overlay them
y.sim <- gin::gp_sim(result$par, dat = drw, acv.model = acv, t.star = t,
                     N.sim = 5, plot = FALSE)
for (i in 1:5) lines(t, y.sim[, i], col = i)
```
Rather than maximum likelihood, we could specify priors on the (hyper-)parameters of the ACV and perform Bayesian inference.
```R
# define the 'priors' for the parameter values
logPrior <- function(theta) {
  mu.d <- dnorm(theta[1], sd = 5, log = TRUE)
  nu.d <- dlnorm(theta[2], meanlog = 0, sdlog = 1, log = TRUE)
  A.d <- dlnorm(theta[3], meanlog = 0, sdlog = 1, log = TRUE)
  l.d <- dlnorm(theta[4], meanlog = 0, sdlog = 1, log = TRUE)
  return(mu.d + nu.d + A.d + l.d)
}
```
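A quick sanity check that the prior density is finite at the starting guess (the value follows directly from the densities chosen above):

```R
# log prior density at the initial parameter guess
logPrior(theta)
# [1] -5.285192
```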
In general, to do this we need an MCMC tool to sample from the posterior. Here I use the GW (Goodman-Weare ensemble) sampler from the tonic package:
```R
# use gw_sampler to generate parameter samples
chain <- tonic::gw_sampler(gin::gp_logPosterior, theta.0 = theta,
                           acv.model = acv, logPrior = logPrior,
                           dat = drw, burn.in = 1e4,
                           nsteps = 40e4,
                           chatter = 1, thin = 10)
```
This may take a few minutes to run. It should produce 40,000 samples after 'thinning' by a factor of 10 (it generates 400,000 samples but keeps only 1 in 10). First, we inspect the traces and autocorrelation of the chains:
```R
# plot MCMC diagnostics
tonic::chain_diagnosis(chain)
```
Now we can visualise the posterior by plotting the one- and two-parameter marginal distributions.
```R
# name the parameters
colnames(chain$theta) <- c("mu", "nu", "A", "l")

# plot scatter diagrams
tonic::contour_matrix(chain$theta, prob.levels = c(1, 2, 3), sigma = TRUE,
                      prob1d = 0)
```
The contours represent the 1-, 2- and 3-sigma levels (in the sense of enclosing 68.3%, 95.4% and 99.7% of the probability mass).
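Numerical summaries come straight from the sample matrix using base R; for example:

```R
# posterior medians and central 90% intervals
# (rows: quantiles; columns: parameters)
apply(chain$theta, 2, quantile, probs = c(0.05, 0.50, 0.95))
```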
Planned improvements include:

- Include a range of ready-to-use ACV functions
- Compute ACFs based on the Fourier transform of a specified power spectrum
- Allow for time binning (not just point samples in time)
- Allow for lognormal processes
- Use matrix rather than data.frame format for data (speed)
For more on GPs, the best reference is:

- C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, [online here](http://www.gaussianprocess.org/gpml/)
Excellent online resources include:

- [A Visual Exploration of Gaussian Processes](https://distill.pub/2019/visual-exploration-gaussian-processes/)