Network-based latent Dirichlet subtype analysis (ver. 20190920)

nebula(
  data,
  modtype,
  E,
  H,
  modeta,
  nu,
  alpha,
  lam,
  alpha_sigma = 1,
  beta_sigma = 1,
  alpha_p = 1,
  beta_p = 1,
  mu0 = 0,
  sig0 = 20,
  pr0 = 0.5,
  binit = NULL
)

Arguments

data

list of M data matrices, where the matrix for modality m is n samples by p_m features

modtype

length M vector of feature types, one per modality. Currently supports continuous (= 0) and binary (= 1)

E

e by 4 matrix in which each row (m1, j1, m2, j2) represents one edge: variable j1 of modality m1 is connected to variable j2 of modality m2
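The shapes of data, modtype and E described above can be sketched with toy inputs. This is only an illustration: the object names (expr, mut), the sizes, and the values are hypothetical, and it assumes that modality and variable indices in E are 1-based, as is usual in R.

```r
# Hypothetical toy inputs: 6 samples, a continuous modality with 4 features
# and a binary modality with 3 features. Names and values are illustrative.
set.seed(1)
n <- 6
expr <- matrix(rnorm(n * 4), nrow = n, ncol = 4)           # modality 1, continuous
mut  <- matrix(rbinom(n * 3, 1, 0.3), nrow = n, ncol = 3)  # modality 2, binary

data    <- list(expr, mut)  # list of M = 2 matrices, each n by p_m
modtype <- c(0, 1)          # 0 = continuous, 1 = binary

# Each row is one edge (m1, j1, m2, j2); 1-based indexing is an assumption.
E <- rbind(
  c(1, 1, 1, 2),  # variables 1 and 2 of modality 1
  c(1, 3, 2, 1),  # variable 3 of modality 1 and variable 1 of modality 2
  c(2, 2, 2, 3)   # variables 2 and 3 of modality 2
)

# Basic shape checks before passing the inputs to nebula().
p <- vapply(data, ncol, integer(1))
stopifnot(
  length(data) == length(modtype),
  all(vapply(data, nrow, integer(1)) == n),
  ncol(E) == 4,
  all(E[, 2] <= p[E[, 1]]),
  all(E[, 4] <= p[E[, 3]])
)
```

Checking that every edge refers to an existing variable is cheap and catches most indexing mistakes before a fit is attempted.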

H

the number of clusters to be fit

modeta

length M vector of sparsity parameters for M modalities

nu

smoothness parameter for gamma's

alpha

concentration parameter for the Dirichlet process

lam

shrinkage parameter for means of selected continuous features

alpha_sigma

shape parameter of the prior on the residual variance (sigma^2) (default = 1, i.e. noninformative)

beta_sigma

rate parameter of the prior on the residual variance (sigma^2) (default = 1, i.e. noninformative)

alpha_p

first shape parameter of the prior on the 'active' probabilities (p_hj) of binary features (default = 1, i.e. noninformative)

beta_p

second shape parameter of the prior on the 'active' probabilities (p_hj) of binary features (default = 1, i.e. noninformative)

mu0

mean of the non-selected continuous features (default is 0)

sig0

variance of the non-selected continuous features (default is 20)

pr0

'active' probability of the non-selected binary features (default is 0.5)

binit

n by H matrix of initial values for B, where exp(B_ih) is proportional to Pr(z_i = h). If NULL (default), B is filled with random numbers.
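Because exp(B_ih) is proportional to Pr(z_i = h), one way to build binit is to take the log of a matrix of starting probabilities. The sketch below is an assumption about how a user might do this, not the package's own initialization; z0, p0 and b_to_prob are hypothetical names, and the 0.8 / 0.1 weights are arbitrary.

```r
# Hypothetical sizes: n = 5 samples, H = 3 clusters.
n <- 5
H <- 3

# Suppose a rough initial hard assignment is available (e.g. from k-means).
z0 <- c(1, 1, 2, 3, 2)

# Put most of the mass on the initial cluster, then take logs to get B.
p0 <- matrix(0.1, nrow = n, ncol = H)
p0[cbind(seq_len(n), z0)] <- 0.8
binit <- log(p0)

# Recover Pr(z_i = h) implied by B: normalize exp(B_ih) within each row.
b_to_prob <- function(B) {
  W <- exp(B - apply(B, 1, max))  # subtract row maxima for numerical stability
  W / rowSums(W)
}
pr <- b_to_prob(binit)
```

Since only proportionality matters, adding a constant to every entry of a row of binit leaves the implied probabilities unchanged.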

Value

A list containing clustering assignments, variable selection, and posterior probabilities

  • clustering: cluster assignment of each subject.

  • defvar: list of M matrices; each matrix is p_m by H and indicates whether variable j in modality m is a defining variable for cluster h.

  • clus_pr: n by H matrix containing the probability that subject i belongs to cluster h.

  • defvar_pr: list of M matrices; each matrix is p_m by H and contains the probability that variable j in modality m is a defining variable for cluster h.

  • def_m: list of M matrices; each matrix is p_m by H and contains the mean of variable j as a defining variable for cluster h (continuous variables only).

  • def_lpr: list of M matrices; each matrix is p_m by H and contains the log probability of variable j being 'active' as a defining variable for cluster h (binary variables only).

  • iter: the number of iterations until the algorithm converges.

  • param: a list of the input parameters used in the clustering solution.
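The returned probabilities can be summarized in a few lines. The sketch below uses a mock list with the clus_pr and defvar_pr fields described above rather than a real nebula() fit; all values are made up, and the 0.5 cutoff is an arbitrary choice for illustration.

```r
# Mock result with two of the documented fields; values are illustrative only.
res <- list(
  clus_pr = rbind(c(0.9, 0.1), c(0.2, 0.8), c(0.6, 0.4)),  # n = 3, H = 2
  defvar_pr = list(
    rbind(c(0.95, 0.10), c(0.30, 0.70)),  # modality 1: p_1 = 2 variables
    rbind(c(0.05, 0.60))                  # modality 2: p_2 = 1 variable
  )
)

# Most probable cluster for each subject.
hard <- max.col(res$clus_pr, ties.method = "first")

# (variable, cluster) pairs whose defining-variable probability exceeds
# the cutoff; 0.5 is an assumption, not a package default.
selected <- lapply(res$defvar_pr, function(pr) which(pr > 0.5, arr.ind = TRUE))
```

Each element of selected is a two-column matrix whose rows give the variable index and the cluster index for one modality.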

Author

Changgee Chang

Examples

if (FALSE) {
res <- nebula(
  data = colon$modal,
  modtype = c(0, 1),
  E = colon$network,
  H = 3,
  modeta = c(1, 0.2),
  nu = 1,
  alpha = 1,
  lam = 1,
  alpha_sigma = 10,
  beta_sigma = 10,
  alpha_p = 1,
  beta_p = 1
)
}