Generalized Linear Latent Variable Models

Fits generalized linear latent variable model for multivariate data. The model can be fitted using Laplace approximation method or variational approximation method.

gllvm(
  y = NULL,
  X = NULL,
  TR = NULL,
  data = NULL,
  formula = NULL,
  family,
  num.lv = NULL,
  num.lv.c = 0,
  num.RR = 0,
  lv.formula = NULL,
  lvCor = NULL,
  studyDesign = NULL,
  dist = list(matrix(0)),
  distLV = matrix(0),
  colMat = NULL,
  colMat.rho.struct = "single",
  corWithin = FALSE,
  corWithinLV = FALSE,
  quadratic = FALSE,
  row.eff = FALSE,
  sd.errors = TRUE,
  offset = NULL,
  method = "VA",
  randomB = FALSE,
  randomX = NULL,
  beta0com = FALSE,
  zeta.struc = "species",
  plot = FALSE,
  link = "probit",
  Ntrials = 1,
  Power = 1.1,
  seed = NULL,
  scale.X = TRUE,
  return.terms = TRUE,
  gradient.check = FALSE,
  disp.formula = NULL,
  control = list(reltol = 1e-08, reltol.c = 1e-08, TMB = TRUE, optimizer = ifelse((num.RR
    + num.lv.c) == 0 | randomB != FALSE, "optim", "alabama"), max.iter = 6000, maxit =
    6000, trace = FALSE, optim.method = NULL, nn.colMat = 10),
  control.va = list(Lambda.struc = "unstructured", Ab.struct = ifelse(is.null(colMat),
    "blockdiagonal", "MNunstructured"), Ab.struct.rank = 1, Ar.struc = "diagonal",
    diag.iter = 1, Ab.diag.iter = 0, Lambda.start = c(0.3, 0.3, 0.3), NN = 3),
  control.start = list(starting.val = "res", n.init = 1, n.init.max = 10, jitter.var = 0,
    jitter.var.br = 0, start.fit = NULL, start.lvs = NULL, randomX.start = "res",
    quad.start = 0.01, start.struc = "LV", scalmax = 10, MaternKappa = 1.5, rangeP =
    NULL),
  setMap = NULL,
  ...
)

Arguments

y

(n x m) matrix of responses.

X

matrix or data.frame of environmental covariates.

TR

matrix or data.frame of trait covariates.

data

data in long format, that is, matrix of responses, environmental and trait covariates and row index named as "id". When used, model needs to be defined using formula. This is alternative data input for y, X and TR.

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted (for fixed-effects predictors).

family

distribution function for responses. Options are "negative.binomial" (with log link), poisson(link = "log"), binomial(link = "probit") (and also with link = "logit" when method = "LA" or method = "EVA"), zero-inflated poisson ("ZIP"), zero-inflated negative-binomial ("ZINB"), gaussian(link = "identity"), Tweedie ("tweedie") (with log link, for "LA" and "EVA"-method), "gamma" (with log link), "exponential" (with log link), beta ("beta") (with logit and probit link, for "LA" and "EVA"-method), "ordinal" (with "VA" and "EVA"-method), beta hurdle "betaH" (for "VA" and "EVA"-method) and "orderedBeta" (for "VA" and "EVA"-method). Note: "betaH" and "orderedBeta" with "VA"-method are actually fitted using a hybrid approach such that EVA is applied to the beta distribution part of the likelihood.

num.lv

number of latent variables, d, in gllvm model. Non-negative integer, less than number of response variables (m). Defaults to 2, if num.lv.c=0 and num.RR=0, otherwise 0.

num.lv.c

number of latent variables, d, in gllvm model to inform, i.e., with residual term. Non-negative integer, less than number of response (m) and equal to, or less than, the number of predictor variables (k). Defaults to 0. Requires specification of "lv.formula" in combination with "X" or "datayx". Can be used in combination with num.lv and fixed-effects, but not with traits.

num.RR

number of latent variables, d, in gllvm model to constrain, without residual term (reduced rank regression). Cannot yet be combined with traits.

lv.formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted (for latent variables).

lvCor

(Under development, not to be used at the moment!) correlation structure for latent variables, defaults to NULL Correlation structure for latent variables can be defined via formula, eg. ~struc(1|groups), where option to 'struc' are corAR1 (AR(1) covariance), corExp (exponentially decaying, see argument 'dist') and corCS (compound symmetry). The grouping variable needs to be included either in 'X' or 'studyDesign'. Works at the moment only with unconstrained ordination without quadratic term.

studyDesign

variables related to eg. sampling/study design, used for defining correlation structure of the latent variables and row effects.

dist

list of length equal to the number of row effects with correlation structure corExp that holds the matrix of coordinates or time points.

distLV

matrix of coordinates or time points used for LV correlation structure corExp.

colMat

either a list of length 2 with matrix of similarity for the column effects and named matrix "dist" of pairwise distances (of columns, to use in selecting nearest neighbours) for a sparse approximation of the matrix inverse in the likelihood, or only a (p.d.) matrix of similarity for the column effects for a normal inverse calculation.

colMat.rho.struct

either single (default) or term indicating whether the signal parameter should be shared for covariates, or not.

corWithin

logical. Vector of length equal to the number of row effects. For structured row effects with correlation, If TRUE, correlation is set between row effects of the observation units within group. Correlation and groups can be defined using row.eff. Defaults to FALSE, when correlation is set for row parameters between groups.

corWithinLV

logical. For LVs with correlation, If TRUE, correlation is set between rows of the observation units within group. Defaults to FALSE, when correlation is set for rows between groups.

quadratic

either FALSE(default), TRUE, or LV. If FALSE models species responses as a linear function of the latent variables. If TRUE models species responses as a quadratic function of the latent variables. If LV assumes species all have the same quadratic coefficient per latent variable.

row.eff

FALSE, fixed, "random" or formula to define the structure for the row parameters. Indicating whether row effects are included in the model as a fixed or as a random effects. Defaults to FALSE when row effects are not included. Structured random row effects can be defined via formula, eg. ~(1|groups), when unique row effects are set for each group, not for all rows, the grouping variable needs to be included in X. Correlation structure between random group effects/intercepts can also be set using ~struc(1|groups), where option to 'struc' are corAR1 (AR(1) covariance), corExp (exponentially decaying, see argument 'dist') and corCS (compound symmetry). Correlation structure can be set between or within groups, see argument 'corWithin'.

sd.errors

logical. If TRUE (default) standard errors for parameter estimates are calculated.

offset

vector or matrix of offset terms.

method

model can be fitted using Laplace approximation method (method = "LA") or variational approximation method (method = "VA"), or with extended variational approximation method (method = "EVA") when VA is not applicable. If particular model has not been implemented using the selected method, model is fitted using the alternative method as a default. Defaults to "VA".

randomB

either FALSE(default), "LV", "P", "single", or "iid". Fits concurrent or constrained ordination (i.e. models with num.lv.c or num.RR) with random slopes for the predictors. "LV" assumes LV-specific variance parameters, "P" predictor specific, and "single" the same across LVs and predictors.

randomX

formula for species specific random effects of environmental variables in fourth corner model. Defaults to NULL, when random slopes are not included.

beta0com

logical. If FALSE column-specific intercepts are assumed. If TRUE, column-specific intercepts are collected to a common value.

zeta.struc

structure for cut-offs in the ordinal model. Either "common", for the same cut-offs for all species, or "species" for species-specific cut-offs. For the latter, classes are arbitrary per species, each category per species needs to have at least one observations. Defaults to "species".

plot

logical. If TRUE ordination plots will be printed in each iteration step when TMB = FALSE. Defaults to FALSE.

link

link function for binomial family if method = "LA" and beta family. Options are "logit" and "probit".

Ntrials

number of trials for binomial family.

Power

fixed power parameter in Tweedie model. Scalar from interval (1,2). Defaults to 1.1. If set to NULL it is estimated (note: experimental).

seed

a single seed value, defaults to NULL.

scale.X

logical. If TRUE, covariates are scaled when fourth corner model is fitted.

return.terms

logical. If TRUE 'terms' object is returned.

gradient.check

logical. If TRUE gradients are checked for large values (>0.01) even if the optimization algorithm did converge.

disp.formula

formula, or alternatively a vector of indices, for the grouping of dispersion parameters (e.g. in a negative-binomial distribution). Defaults to NULL so that all species have their own dispersion parameter. Is only allowed to include categorical variables. If a formula, data should be included as named rows in y.

control

A list with the following arguments controlling the optimization:

reltol:: convergence criteria for log-likelihood, defaults to 1e-8.
reltol.c:: convergence criteria for equality constraints in ordination with predictors, defaults to 1e-8.
TMB:: logical, if TRUE model will be fitted using Template Model Builder (TMB). TMB is always used if method = "LA". Defaults to TRUE.
optimizer:: if TMB=TRUE, log-likelihood can be optimized using "optim" (default) or "nlminb". For ordination with predictors (num.RR>0 or num.lv.c>0) this can additionally be one of alabama(default), nloptr(agl) or nloptr(sqp).
max.iter:: maximum number of iterations when TMB = FALSE or for optimizer = "nlminb" when TMB = TRUE, defaults to 6000.
maxit:: maximum number of iterations for optimizer, defaults to 6000.
trace:: logical, if TRUE in each iteration step information on current step will be printed. Defaults to FALSE. Only with TMB = FALSE.
optim.method:: optimization method to be used if optimizer is "optim","alabama", or "nloptr", but the latter two are only available in combination with at least two latent variables (i.e., num.RR+num.lv.c>1). Defaults to "BFGS", but to "L-BFGS-B" for Tweedie family due the limited-memory use. For optimizer='alabama' this can be any "optim" method, or "nlminb". If optimizer = 'nloptr(agl)' this can be one of: "NLOPT_LD_CCSAQ", "NLOPT_LD_SLSQP", "NLOPT_LD_TNEWTON_PRECOND" (default), "NLOPT_LD_TNEWTON", "NLOPT_LD_MMA".
nn.colMat:: number of nearest neighbours for calculating inverse of "colMat", defaults to 10. If set to the number of columns in the response data, a standard inverse is used instead.

control.va

A list with the following arguments controlling the variational approximation method:

Lambda.struc:: covariance structure of VA distributions for latent variables when method = "VA", "unstructured" or "diagonal".
Ab.struct:: covariance structure of VA distributions for random slopes when method = "VA", ordered in terms of complexity: "diagonal", "MNdiagonal" (only with colMat), "blockdiagonal" (default without colMat), "MNunstructured" (default, only with colMat), "diagonalCL1" ,"CL1" (only with colMat), "CL2" (only with colMat),"diagonalCL2" (only with colMat), or "unstructured" (only with colMat).
Ab.struct.rank:: number of columns for the cholesky of the variational covariance matrix to use, defaults to 1. Only applicable with "MNunstructured", "diagonalCL1", "CL1","diagonalCL2", and "unstructured".
Ar.struc:: covariance structure of VA distributions for random row effects when method = "VA", "unstructured" or "diagonal". Defaults to "diagonal".
diag.iter:: non-negative integer which can sometimes be used to speed up the updating of variational (covariance) parameters in VA method. Can sometimes improve the accuracy. If TMB = TRUE either 0 or 1. Defaults to 1.
Ab.diag.iter:: As above, but for variational covariance of random slopes.
Lambda.start:: starting values for variances in VA distributions for latent variables, random row effects and random slopes in variational approximation method. Defaults to 0.3.
NN:: Number of nearest neighbors for NN variational covariance. Defaults to ...

control.start

A list with the following arguments controlling the starting values:

starting.val:: starting values can be generated by fitting model without latent variables, and applying factorial analysis to residuals to get starting values for latent variables and their coefficients (starting.val = "res"). Another options are to use zeros as a starting values (starting.val = "zero") or initialize starting values for latent variables with (n x num.lv) matrix. Defaults to "res", which is recommended.
n.init:: number of initial runs. Uses multiple runs and picks up the one giving highest log-likelihood value. Defaults to 1.
n.init.max:: maximum number of refits try try for n.init without improvement, defaults to 10.
start.fit:: object of class 'gllvm' which can be given as starting parameters for count data (poisson, NB, or ZIP).
start.lvs:: initialize starting values for latent variables with (n x num.lv) matrix. Defaults to NULL.
jitter.var:: jitter variance for starting values of latent variables. Defaults to 0, meaning no jittering.
jitter.var.br:: jitter variance for starting values of random slopes. Defaults to 0, meaning no jittering.
randomX.start:: starting value method for the random slopes. Options are "zero" and "res". Defaults to "res".
start.struc:: starting value method for the quadratic term. Options are "LV" (default) and "all".
quad.start:: starting values for quadratic coefficients. Defaults to 0.01.
MaternKappa:: Starting value for smoothness parameter kappa of Matern covariance function. Defaults to 3/2.
scalmax:: Sets starting value for the scale parameter for the coordinates. Defaults to 10, when the starting value for scale parameter scales the distances of coordinates between 0-10.
rangeP:: Sets starting value for the range parameter for the correlation structure.

setMap

UNDER DEVELOPMENT, DO NOT USE! list of a set of parameters to be fixed. Parameters to be fixed need to be defined with factors. Other arguments may overwrite these definitions.

...

Not used.

Value

An object of class "gllvm" includes the following components:

call

function call

y

(n x m) matrix of responses.

X

matrix or data.frame of environmental covariates.

X.design

design matrix of environmental covariates.

lv.X

design matrix or data.frame of environmental covariates for latent variables.

lv.X.design

design matrix or data.frame of environmental covariates for latent variables.

TR

Trait matrix

formula

Formula for predictors

lv.formula

Formula of latent variables in constrained and concurrent ordination

randomX

Formula for species specific random effects in fourth corner model

randomB

Boolean flag for random slopes in constrained and concurrent ordination

num.lv

Number of unconstrained latent variables

num.lv.c

Number of latent variables in concurrent ordination

num.RR

Number of latent variables in constrained ordination

Ntrials

Number of trials in a binomial model

method

Method used for integration

family

Response distribution

row.eff

Type of row effect used

n.init

Number of model runs for best fit

disp.group

Groups for dispersion parameters

sd

List of standard errors

lvs

Latent variables

params

List of parameters

theta: latent variables' loadings relative to the diagonal entries of loading matrix
sigma.lv: diagonal entries of latent variables' loading matrix
LvXcoef: Predictor coefficients (or predictions for random slopes) related to latent variables, i.e. canonical coefficients
beta0: column specific intercepts
Xcoef: coefficients related to environmental covariates X
B: coefficients in fourth corner model, and RE means
Br: column random effects
sigmaB: scale parameters for column-specific random effects
rho.sp: (positive) correlation parameter for influence strength of "colMat"
row.params: row-specific intercepts
sigma: scale parameters for row-specific random effects
phi: dispersion parameters $\phi$ for negative binomial or Tweedie family, probability of zero inflation for ZIP family, standard deviation for gaussian family or shape parameter for gamma/beta family
inv.phi: dispersion parameters $1/\phi$ for negative binomial

Power

power parameter $\nu$ for Tweedie family

sd

list of standard errors of parameters

prediction.errors

list of prediction covariances for latent variables and variances for random row effects when method "LA" is used

A, Ar, Ab_lv, spArs

covariance matrices for variational densities of latent variables, random row effects, random slopes, and colum effects respectively

seed

Seed used for calculating starting values

TMBfn

TMB objective and derivative functions

logL

log likelihood

convergence

convergence code of optimizer

quadratic

flag for quadratic model

Hess

List holding matrices of second derivatives

beta0com

Flag for common intercept

cstruc

Correlation structure for row effects

cstruclv

Correlation structure for LVs

dist

Matrix of coordinates or time points used for row effects

distLV

Matrix of coordinates or time points used for LVs

col.eff

list of components for column random effects

Ab.struct: variational covariance structure of fitted model
Ab.struct.rank: fitted rank of variational covariance matrix
col.eff: flag indicating if column random effects are included
spdr: design matrix
colMat.rho.struct: character vector for signal parameter

terms: Terms object for main predictors
start: starting values for model
optim.method: Optimization method when using 'optim', 'alabama', or 'nloptr'

Details

Fits generalized linear latent variable models as in Hui et al. (2015 and 2017) and Niku et al. (2017). Method can be used with two types of latent variable models depending on covariates. If only site related environmental covariates are used, the expectation of response $Y_{ij}$ is determined by

$$g(\mu_{ij}) = \eta_{ij} = \alpha_i + \beta_{0j} + x_i'\beta_j + u_i'\theta_j,$$

where $g(.)$ is a known link function, $u_i$ are $d$-variate latent variables ($d$<<$m$), $\alpha_i$ is an optional row effect at site $i$, and it can be fixed or random effect (also other structures are possible, see below), $\beta_{0j}$ is an intercept term for species $j$, $\beta_j$ and $\theta_j$ are column specific coefficients related to covariates and the latent variables, respectively.

Quadratic model

Alternatively, a more complex version of the model can be fitted with quadratic = TRUE, where species are modeled as a quadratic function of the latent variables: $$g(\mu_{ij}) = \eta_{ij} = \alpha_i + \beta_{0j} + x_i'\beta_j + u_i'\theta_j - u_i' D_j u_i$$. Here, D_j is a diagonal matrix of positive only quadratic coefficients, so that the model generates concave shapes only. This implementation follows the ecological theoretical model where species are generally recognized to exhibit non-linear response curves. For a model with quadratic responses, quadratic coefficients are assumed to be the same for all species: $$D_j = D$$. This model requires less information per species and can be expected to be more applicable to most datasets. The quadratic coefficients D can be used to calculate the length of ecological gradients. For quadratic responses, it can be useful to provide the latent variables estimated with a GLLVM with linear responses, or estimated with (Detrended) Correspondence Analysis. The latent variables can then be passed to the start.lvs argument inside the control.start list, which in many cases gives good results.

Ordination with predictors

For GLLVMs with both linear and quadratic response model, a series of predictors $x_{lv}$ can be included to explain the latent variables:

$$g(\mu_{ij}) = \alpha_i + \beta_{0j} + x_i'\beta_j + (B' x_{lv,i} + \epsilon_i)' \gamma_j - (B' x_{lv,i} + \epsilon_i)' D_j (B' x_{lv,i} + \epsilon_i) ,$$ where $z_i = B' x_{lv,i} + \epsilon_i$ are latent variables informed by the predictors, but not constrained compared to unconstrained ordination as in methods such as CCA or RDA. Omitting the predictors results in an unconstrained ordination, and omitting $\epsilon_i$ in the usual constrained ordination, which can also be fitted.

Fourth corner model

An alternative model is the fourth corner model (Brown et al., 2014, Warton et al., 2015) which will be fitted if also trait covariates are included. The expectation of response $Y_{ij}$ is

$$g(\mu_{ij}) = \alpha_i + \beta_{0j} + x_i'(\beta_x + b_j) + TR_j'\beta_t + vec(B)*kronecker(TR_j,X_i) + u_i'\theta_j - u_i'D_ju_i$$

where g(.), $u_i$, $\beta_{0j}$ and $\theta_j$ are defined as above. Vectors $\beta_x$ and $\beta_t$ are the main effects or coefficients related to environmental and trait covariates, respectively, matrix $B$ includes interaction terms. Vectors $b_j$ are optional species-specific random slopes for environmental covariates. The interaction/fourth corner terms are optional as well as are the main effects of trait covariates.

Structured row effects

In addition to the site-specific random effects, $\alpha_i$, it is also possible to set arbitrary structure/design for the row effects. That is, assume that observations / rows $i=1,...,n$ in the data matrix are from groups $t=1,...,T$, so that each row $i$ belongs to one of the groups, denote $G(i) \in \{1,...,T\}$. Each group $t$ has a number of observations $n_t$, so that $\sum_{t=1}^{T} n_t =n$. Now we can set random intercept for each group $t$, (see argument 'row.eff'):

$$g(\mu_{ij}) = \eta_{ij} = \alpha_{G(i)} + \beta_{0j} + x_i'\beta_j + u_i'\theta_j,$$

There is also a possibility to set correlation structure for the random intercepts between groups, so that $(\alpha_{1},...,\alpha_{T})^\top \sim N(0, \Sigma_r)$. That might be the case, for example, when the groups are spatially or temporally dependent. Another option is to set row specific random intercepts $\alpha_i$, but to set the correlation structure for the observations within groups, (see argument 'corWithin'). That is, we can set $corr(\alpha_{i},\alpha_{i'}) = C(i,i') \neq 0$ according to some correlation function $C$, when $G(i)=G(i')$. This model is restricted to the case, where each group has equal number of observations (rows), that is $n_t=n_{t'}$ for all $t,t' \in \{1,...,T\}$.

The correlation structures available in the package are

corAR1: autoregressive process of order 1.
corExp: exponentially decaying, see argument 'dist'.
corCS: compound symmetry.

Starting values

The method is sensitive for the choices of initial values of the latent variables. Therefore it is recommendable to use multiple runs and pick up the one giving the highest log-likelihood value (see argument 'n.init'). However, sometimes this is computationally too demanding, and default option starting.val = "res" is recommended. For more details on different starting value methods, see Niku et al., (2018).

Models are implemented using TMB (Kristensen et al., 2015) applied to variational approximation (Hui et al., 2017), extended variational approximation (Korhonen et al., 2021) and Laplace approximation (Niku et al., 2017).

With ordinal family response classes must start from 0 or 1.

Distributions

Mean and variance for distributions are defined as follows.

For count data family = poisson():: Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}$, or
family = "negative.binomial":: Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}+\mu_{ij}^2\phi_j$, or
family = "ZIP":: Expectation $E[Y_{ij}] = (1-p)\mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}(1-p_j)(1+\mu_{ij}p)$.
family = "ZINB":: Expectation $E[Y_{ij}] = (1-p)\mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}(1-p_j)(1+\mu_{ij}(\phi_j+p_j))$.
For binary data family = binomial():: Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}(1-\mu_{ij})$.
For percent cover data $0 < Y_{ij} < 1$ family = "beta":: Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}(1-\mu_{ij})//(1+\phi_j)$.
For positive continuous data family = "gamma":: Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}^2/\phi_j$, where $\phi_j$ is species specific shape parameter.
For non-negative continuous data family = "exponential":: Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}^2$.
For non-negative continuous or biomass data family = "tweedie": Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \phi_j*\mu_{ij}^\nu$, where $\nu$ is a power parameter of Tweedie distribution. See details Dunn and Smyth (2005).
For ordinal data family = "ordinal":: Cumulative probit model, see Hui et.al. (2016).
For normal distributed data family = gaussian():: Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(y_{ij}) = \phi_j^2.$

Note

If function gives warning: 'In f(x, order = 0) : value out of range in 'lgamma”, optimizer have visited an area where gradients become too big. It is automatically fixed by trying another step in the optimization process, and can be ignored if errors do not occur.

References

Brown, A. M., Warton, D. I., Andrew, N. R., Binns, M., Cassis, G., and Gibb, H. (2014). The fourth-corner solution - using predictive models to understand how species traits interact with the environment. Methods in Ecology and Evolution, 5:344-352.

Dunn, P. K. and Smyth, G. K. (2005). Series evaluation of tweedie exponential dispersion model densities. Statistics and Computing, 15:267-280.

Hui, F. K. C., Taskinen, S., Pledger, S., Foster, S. D., and Warton, D. I. (2015). Model-based approaches to unconstrained ordination. Methods in Ecology and Evolution, 6:399-411.

Hui, F. K. C., Warton, D., Ormerod, J., Haapaniemi, V., and Taskinen, S. (2017). Variational approximations for generalized linear latent variable models. Journal of Computational and Graphical Statistics. Journal of Computational and Graphical Statistics, 26:35-43.

Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 1-21.

Korhonen, P., Hui, F. K. C., Niku, J., and Taskinen, S. (2021). Fast, universal estimation of latent variable models using extended variational approximations. Stat Comput 33, 26 (2023).

Niku, J., Warton, D. I., Hui, F. K. C., and Taskinen, S. (2017). Generalized linear latent variable models for multivariate count and biomass data in ecology. Journal of Agricultural, Biological, and Environmental Statistics, 22:498-522.

Niku, J., Brooks, W., Herliansyah, R., Hui, F. K. C., Taskinen, S., and Warton, D. I. (2018). Efficient estimation of generalized linear latent variable models. PLoS One, 14(5):1-20.

Warton, D. I., Guillaume Blanchet, F., O'Hara, R. B., Ovaskainen, O., Taskinen, S., Walker, S. C. and Hui, F. K. C. (2015). So many variables: Joint modeling in community ecology. Trends in Ecology & Evolution, 30:766-779.

Author

Jenni Niku <jenni.m.e.niku@jyu.fi>, Wesley Brooks, Riki Herliansyah, Francis K.C. Hui, Pekka Korhonen, Sara Taskinen, Bert van der Veen, David I. Warton

Examples

# Extract subset of the microbial data to be used as an example
data(microbialdata)
X <- microbialdata$Xenv
y <- microbialdata$Y[, order(colMeans(microbialdata$Y > 0), 
                     decreasing = TRUE)[21:40]]
fit <- gllvm(y, X, formula = ~ pH + Phosp, family = poisson())
fit$logL
#> [1] -4018.808
ordiplot(fit)

coefplot(fit)


# \donttest{
# Inclusion of structured random row effect
sDesign<-data.frame(Site = microbialdata$Xenv$Site)
fit <- gllvm(y, X, formula = ~ pH + Phosp, family = poisson(), 
            studyDesign=sDesign, row.eff=~(1|Site))

## Load a dataset from the mvabund package
library(mvabund)
#> 
#> Attaching package: 'mvabund'
#> The following object is masked from 'package:gllvm':
#> 
#>     coefplot
data(antTraits, package = "mvabund")
y <- as.matrix(antTraits$abund)
X <- as.matrix(antTraits$env)
TR <- antTraits$traits
# Fit model with environmental covariates Bare.ground and Shrub.cover
fit <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
            family = poisson())
ordiplot(fit)
coefplot.gllvm(fit)



## Example 1: Fit model with two unconstrained latent variables
# Using variational approximation:
fitv0 <- gllvm(y, family = "negative.binomial", method = "VA")
ordiplot(fitv0)
plot(fitv0, mfrow = c(2,2))


summary(fitv0)
#> 
#> Call:
#> gllvm(y = y, family = "negative.binomial", method = "VA")
#> 
#> Family:  negative.binomial 
#> 
#> AIC:  4056.17 AICc:  4106.324 BIC:  4889.878 LL:  -1865.1 df:  163 
#> 
#> Informed LVs:  0 
#> Constrained LVs:  0 
#> Unconstrained LVs:  2 
#> 
#> Formula:  ~ 1 
#> LV formula:  ~ 0 
confint(fitv0)
#>                                                2.5 %      97.5 %
#> sigma.lv.LV1                            -0.101054520  1.85257413
#> sigma.lv.LV2                             1.079578932  4.48472729
#> theta.LV1.1                              1.000000000  1.00000000
#> theta.LV1.2                             -3.369836924  3.61832887
#> theta.LV1.3                             -1.020676845  2.93833281
#> theta.LV1.4                             -0.677509762  1.54846484
#> theta.LV1.5                             -0.460199165  2.15216832
#> theta.LV1.6                             -2.595239166  1.44035981
#> theta.LV1.7                             -0.921964395  2.06652878
#> theta.LV1.8                             -6.308865233  1.21172423
#> theta.LV1.9                             -1.587002683  5.80962347
#> theta.LV1.10                            -0.558480393  0.96006157
#> theta.LV1.11                            -0.994894799  1.37917755
#> theta.LV1.12                            -0.699284590  2.72152491
#> theta.LV1.13                            -1.196190724  0.15991813
#> theta.LV1.14                            -1.105138881  4.00864981
#> theta.LV1.15                            -1.108958700  0.73146299
#> theta.LV1.16                            -0.653206772  1.48667904
#> theta.LV1.17                            -1.781840019  0.71364295
#> theta.LV1.18                            -0.928084351  1.19484660
#> theta.LV1.19                            -1.532606979  0.26859879
#> theta.LV1.20                            -2.287651301  0.40331968
#> theta.LV1.21                            -3.196743471  1.07149186
#> theta.LV1.22                            -0.365873699  1.80424897
#> theta.LV1.23                            -3.327191292  1.19747175
#> theta.LV1.24                            -2.421145103  0.63384256
#> theta.LV1.25                            -1.105168385  1.40435943
#> theta.LV1.26                            -1.357941543  4.40671446
#> theta.LV1.27                            -2.050006131  0.21916166
#> theta.LV1.28                            -2.560698153  0.76018611
#> theta.LV1.29                            -3.324423376  1.97129068
#> theta.LV1.30                            -4.788160345  1.56070785
#> theta.LV1.31                            -1.012929893  0.15267236
#> theta.LV1.32                            -1.691735472  1.09594412
#> theta.LV1.33                            -0.210983552  1.09985766
#> theta.LV1.34                            -1.667026576  1.01258435
#> theta.LV1.35                            -0.596124303  1.33635731
#> theta.LV1.36                            -0.985066088  0.19064344
#> theta.LV1.37                            -2.768954659  0.48519243
#> theta.LV1.38                            -1.876374696  0.67387483
#> theta.LV1.39                            -0.521953234  2.41005968
#> theta.LV1.40                            -0.374550987  1.19293425
#> theta.LV1.41                            -2.474005601  0.60926507
#> theta.LV2.1                              0.000000000  0.00000000
#> theta.LV2.2                              1.000000000  1.00000000
#> theta.LV2.3                             -0.245410669  0.60976361
#> theta.LV2.4                             -0.058090401  0.36365924
#> theta.LV2.5                             -0.652256739  0.04113870
#> theta.LV2.6                             -0.107726475  0.67653905
#> theta.LV2.7                             -0.151866782  0.42717997
#> theta.LV2.8                             -1.416507224  0.76089128
#> theta.LV2.9                             -0.322498960  1.41262112
#> theta.LV2.10                            -0.392962572 -0.01054992
#> theta.LV2.11                            -0.634745126 -0.10053729
#> theta.LV2.12                            -0.504044002  0.43415965
#> theta.LV2.13                            -0.161777296  0.20510906
#> theta.LV2.14                            -0.221878503  0.87848770
#> theta.LV2.15                            -0.471025812 -0.09694277
#> theta.LV2.16                            -0.556198882  0.01509922
#> theta.LV2.17                            -0.631661565  0.04542530
#> theta.LV2.18                            -0.484010869  0.17681714
#> theta.LV2.19                            -0.363823413  0.10587534
#> theta.LV2.20                            -0.512549961  0.19530946
#> theta.LV2.21                            -0.932254289  0.21005739
#> theta.LV2.22                            -0.116267576  0.37719262
#> theta.LV2.23                            -1.113872135 -0.04380816
#> theta.LV2.24                            -0.696595194  0.13935504
#> theta.LV2.25                            -0.721465153  0.18277040
#> theta.LV2.26                            -0.423107729  0.70806454
#> theta.LV2.27                            -0.372919415  0.22470612
#> theta.LV2.28                            -0.756153818  0.04814946
#> theta.LV2.29                            -1.545063896  0.10741661
#> theta.LV2.30                            -0.307876348  0.93128757
#> theta.LV2.31                            -0.211090731  0.07106664
#> theta.LV2.32                            -0.014392424  0.58052603
#> theta.LV2.33                            -0.254229514  0.10720520
#> theta.LV2.34                            -0.289241622  0.68859090
#> theta.LV2.35                            -0.263582652  0.21016971
#> theta.LV2.36                            -0.241946551  0.03954228
#> theta.LV2.37                            -0.621462849  0.17289429
#> theta.LV2.38                            -0.300903459  0.46399115
#> theta.LV2.39                            -0.299675048  0.35746054
#> Intercept.Amblyopone.australis          -0.362754556  0.04893208
#> Intercept.Aphaenogaster.longiceps       -0.709492423  0.06384867
#> Intercept.Camponotus.cinereus.amperei   -1.855330522 -0.03201067
#> Intercept.Camponotus.claripes           -6.253146607 -0.31989909
#> Intercept.Camponotus.consobrinus        -3.698764645 -0.83566248
#> Intercept.Camponotus.nigriceps          -0.036845879  1.06008055
#> Intercept.Camponotus.nigroaeneus         0.642091321  1.79310593
#> Intercept.Cardiocondyla.nuda.atalanta   -1.679590984  0.47959541
#> Intercept.Crematogaster.sp..A           -1.317854139  0.23396693
#> Intercept.Heteroponera.sp..A            -5.327668272 -0.01451641
#> Intercept.Iridomyrmex.bicknelli         -4.746412379  0.10285262
#> Intercept.Iridomyrmex.dromus             0.802271623  1.67654445
#> Intercept.Iridomyrmex.mjobergi           0.772758491  1.80434274
#> Intercept.Iridomyrmex.purpureus         -1.859361368  0.26212098
#> Intercept.Iridomyrmex.rufoniger          0.981774616  1.68335474
#> Intercept.Iridomyrmex.suchieri          -1.288003487  1.11273980
#> Intercept.Iridomyrmex.suchieroides       2.022082601  2.70289699
#> Intercept.Melophorus.sp..E              -0.135076464  1.04253411
#> Intercept.Melophorus.sp..F              -1.135392642  0.24781672
#> Intercept.Melophorus.sp..H              -2.009053745 -0.37473742
#> Intercept.Meranoplus.sp..A               0.082735499  0.98236440
#> Intercept.Monomorium.leae               -1.272862023  0.16011200
#> Intercept.Monomorium.rothsteini         -1.453993812  0.64859767
#> Intercept.Monomorium.sydneyense          0.841382811  1.84727288
#> Intercept.Myrmecia.pilosula.complex     -0.730680844  1.20941960
#> Intercept.Notoncus.capitatus            -1.320448289  0.40725609
#> Intercept.Notoncus.ectatommoides        -2.053404168 -0.20207715
#> Intercept.Nylanderia.sp..A              -2.473060704  0.53476753
#> Intercept.Ochetellus.glaber              0.239164567  1.22238755
#> Intercept.Paraparatrechina.sp..B         0.030408492  1.47001253
#> Intercept.Pheidole.sp..A                -3.864443141 -0.21642213
#> Intercept.Pheidole.sp..B                -3.311803558  0.27488106
#> Intercept.Pheidole.sp..E                 2.092502680  2.54373402
#> Intercept.Pheidole.sp..J                -2.030383570 -0.15338453
#> Intercept.Polyrhachis.sp..A              1.226637708  1.97956238
#> Intercept.Rhytidoponera.metallica.sp..A -2.978459803 -0.56718703
#> Intercept.Rhytidoponera.sp..B           -1.984440328 -0.52864828
#> Intercept.Solenopsis.sp..A               1.772740348  2.27348555
#> Intercept.Stigmacros.sp..A              -0.397038273  1.09413892
#> Intercept.Tapinoma.sp..A                -3.426967711 -0.97579204
#> Intercept.Tetramorium.sp..A             -1.091880540  0.51275727
#> inv.phi.Amblyopone.australis             0.282620501  1.23747031
#> inv.phi.Aphaenogaster.longiceps         -0.631528704  0.77866261
#> inv.phi.Camponotus.cinereus.amperei     -0.270487810  1.34890505
#> inv.phi.Camponotus.claripes             -0.211424882  1.00850776
#> inv.phi.Camponotus.consobrinus          -1.082732652  2.58821337
#> inv.phi.Camponotus.nigriceps             0.057469939  1.51923385
#> inv.phi.Camponotus.nigroaeneus           0.053228671  3.83235955
#> inv.phi.Cardiocondyla.nuda.atalanta     -0.061328003  0.51823372
#> inv.phi.Crematogaster.sp..A             -0.106602377  1.07141357
#> inv.phi.Heteroponera.sp..A              -0.070044146  0.42295717
#> inv.phi.Iridomyrmex.bicknelli           -0.079418827  0.52848291
#> inv.phi.Iridomyrmex.dromus               0.225181438  2.46845474
#> inv.phi.Iridomyrmex.mjobergi            -0.118035741  5.03073663
#> inv.phi.Iridomyrmex.purpureus           -0.066779230  0.56608215
#> inv.phi.Iridomyrmex.rufoniger           -0.169734753  4.83704229
#> inv.phi.Iridomyrmex.suchieri             0.009085673  0.42275887
#> inv.phi.Iridomyrmex.suchieroides        -0.083718394 19.14377216
#> inv.phi.Melophorus.sp..E                -0.154535948  2.35711091
#> inv.phi.Melophorus.sp..F                -0.884541981  4.74688233
#> inv.phi.Melophorus.sp..H                -0.832200307  2.32031093
#> inv.phi.Meranoplus.sp..A                -0.512532219  5.30099140
#> inv.phi.Monomorium.leae                 -0.424201849  3.55560989
#> inv.phi.Monomorium.rothsteini           -0.018459814  0.63757293
#> inv.phi.Monomorium.sydneyense            0.248471646  1.69159906
#> inv.phi.Myrmecia.pilosula.complex        0.014780948  2.74702080
#> inv.phi.Notoncus.capitatus              -0.099518667  1.27882059
#> inv.phi.Notoncus.ectatommoides          -0.365816423  1.37942419
#> inv.phi.Nylanderia.sp..A                -0.041815620  0.38604661
#> inv.phi.Ochetellus.glaber               -0.084158478  4.06880429
#> inv.phi.Paraparatrechina.sp..B           0.107368740  1.41088340
#> inv.phi.Pheidole.sp..A                  -0.188703853  1.06111908
#> inv.phi.Pheidole.sp..B                  -0.063973551  0.30179019
#> inv.phi.Pheidole.sp..E                  -1.415632292 20.55783945
#> inv.phi.Pheidole.sp..J                  -0.348520249  1.33690820
#> inv.phi.Polyrhachis.sp..A                0.320733128  2.69246122
#> inv.phi.Rhytidoponera.metallica.sp..A   -0.276747974  0.79866464
#> inv.phi.Rhytidoponera.sp..B             -7.676106111 15.06596340
#> inv.phi.Solenopsis.sp..A                -0.858343873 15.09378595
#> inv.phi.Stigmacros.sp..A                 0.003907447  1.35251755
#> inv.phi.Tapinoma.sp..A                  -1.919597041  4.17068867
#> inv.phi.Tetramorium.sp..A               -0.028744157  1.05076124
#> inv.phi.Tapinoma.sp..A                   0.106513400  2.27872438
#> inv.phi.Tetramorium.sp..A               -0.300594294  3.63546926

## Example 1a: Fit concurrent ordination model with two latent variables and with 
# quadratic response model
# We scale and centre the  predictors to improve convergence
fity1 <- gllvm(y, X = scale(X), family = "negative.binomial", 
              num.lv.c=2, method="VA")
ordiplot(fity1, biplot = TRUE)

#'## Example 1b: Fit constrained ordination model with two latent variables and with 
# random canonical coefficients
fity2 <- gllvm(y, X = scale(X), family = "negative.binomial", 
              num.RR=2, randomB="LV", method="VA")
              
# Using Laplace approximation: (this line may take about 30 sec to run)
fitl0 <- gllvm(y, family = "negative.binomial", method = "LA")
ordiplot(fitl0)

# Poisson family:
fit.p <- gllvm(y, family = poisson(), method = "LA")
ordiplot(fit.p)

# Use poisson model as a starting parameters for ZIP-model, this line 
# may take few minutes to run
fit.z <- gllvm(y, family = "ZIP", method = "LA", 
              control.start = list(start.fit = fit.p))
ordiplot(fit.z)


## Example 2: gllvm with environmental variables
# Fit model with two latent variables and all environmental covariates,
fitvX <- gllvm(formula = y ~ X, family = "negative.binomial")
ordiplot(fitvX, biplot = TRUE)
coefplot.gllvm(fitvX)


# Fit model with environmental covariates Bare.ground and Shrub.cover
fitvX2 <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
 family = "negative.binomial")
ordiplot(fitvX2)
coefplot.gllvm(fitvX2)


# Use 5 initial runs and pick the best one
fitvX_5 <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
 family = "negative.binomial", control.start=list(n.init = 5, jitter.var = 0.1))
ordiplot(fitvX_5)
coefplot.gllvm(fitvX_5)



## Example 3: Data in long format
# Reshape data to long format:
datalong <- reshape(data.frame(cbind(y,X)), direction = "long",
                   varying = colnames(y), v.names = "y")
head(datalong)
#>     Bare.ground Canopy.cover Shrub.cover Volume.lying.CWD Feral.mammal.dung
#> 1.1        9.00          3.5        0.50       0.03253833              0.20
#> 2.1        6.00          0.0        0.00       0.01202944              0.65
#> 3.1        9.25          0.0        8.10       0.00804195              0.00
#> 4.1       23.25          0.0        0.25       0.00000000              0.20
#> 5.1        8.75          0.0        3.35       0.00993944              0.35
#> 6.1       32.25          0.0        0.00       0.03142221              1.00
#>     time y id
#> 1.1    1 0  1
#> 2.1    1 0  2
#> 3.1    1 0  3
#> 4.1    1 4  4
#> 5.1    1 2  5
#> 6.1    1 0  6
fitvLong <- gllvm(data = datalong, formula = y ~ Bare.ground + Shrub.cover,
               family = "negative.binomial")

## Example 4: Fourth corner model
# Fit fourth corner model with two latent variables
fitF1 <- gllvm(y = y, X = X, TR = TR, family = "negative.binomial")
coefplot.gllvm(fitF1)

# Fourth corner can be plotted also with next lines
#fourth = fitF1$fourth.corner
#library(lattice)
#a = max( abs(fourth) )
#colort = colorRampPalette(c("blue","white","red"))
#plot.4th = levelplot(t(as.matrix(fourth)), xlab = "Environmental Variables",
#              ylab = "Species traits", col.regions = colort(100),
#              at = seq( -a, a, length = 100), scales = list( x = list(rot = 45)))
#print(plot.4th)

# Specify model using formula
fitF2 <- gllvm(y = y, X = X, TR = TR,
 formula = ~ Bare.ground + Canopy.cover * (Pilosity + Webers.length),
 family = "negative.binomial")
ordiplot(fitF2)

coefplot.gllvm(fitF2)


## Include species specific random slopes to the fourth corner model
fitF3 <- gllvm(y = y, X = X, TR = TR,
 formula = ~ Bare.ground + Canopy.cover * (Pilosity + Webers.length),
 family = "negative.binomial", randomX = ~ Bare.ground + Canopy.cover, 
 control.start = list(n.init = 3))
ordiplot(fitF3)

coefplot.gllvm(fitF3)



## Example 5: Fit Tweedie model
# Load coral data
data(tikus)
ycoral <- tikus$abund
# Let's consider only years 1981 and 1983
ycoral <- ycoral[((tikus$x$time == 81) + (tikus$x$time == 83)) > 0, ]
# Exclude species which have observed at less than 4 sites
ycoral <- ycoral[-17, (colSums(ycoral > 0) > 4)]
# Fit Tweedie model for coral data (this line may take few minutes to run)
fit.twe <- gllvm(y = ycoral, family = "tweedie", method = "LA")
ordiplot(fit.twe)


## Example 6: Random row effects
fitRand <- gllvm(y, family = "negative.binomial", row.eff = "random")
ordiplot(fitRand, biplot = TRUE)

# }