nebula

Network-based latent Dirichlet subtype analysis (ver. 20190920)

nebula(
  data,
  modtype,
  E,
  H,
  modeta,
  nu,
  alpha,
  lam,
  alpha_sigma = 1,
  beta_sigma = 1,
  alpha_p = 1,
  beta_p = 1,
  mu0 = 0,
  sig0 = 20,
  pr0 = 0.5,
  binit = NULL
)

Arguments

data	list of M data matrices; the m-th matrix is n samples by p_m features for modality m
modtype	length M vector of feature types for the M modalities. Currently supports continuous (= 0) and binary (= 1)
E	e by 4 matrix in which each row (m1, j1, m2, j2) represents an edge: variable j1 of modality m1 is connected to variable j2 of modality m2
H	the number of clusters to be fit
modeta	length M vector of sparsity parameters for M modalities
nu	smoothness parameter for gamma's
alpha	concentration parameter for the Dirichlet process
lam	shrinkage parameter for means of selected continuous features
alpha_sigma	shape parameter of the prior of the residual variance (sigma^2) (default = 1, i.e. noninformative)
beta_sigma	rate parameter of the prior of the residual variance (sigma^2) (default = 1, i.e. noninformative)
alpha_p	first shape parameter of the prior of the 'active' probabilities (p_hj) of binary features (default = 1, i.e. noninformative)
beta_p	second shape parameter of the prior of the 'active' probabilities (p_hj) of binary features (default = 1, i.e. noninformative)
mu0	mean of the non-selected continuous features (default is 0)
sig0	variance of the non-selected continuous features (default is 20)
pr0	'active' probability of the non-selected binary features (default is 0.5)
binit	n by H initial matrix of B, where exp(B_ih) is proportional to Pr(z_i = h). If NULL (default), B is initialized with random numbers.
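
The input layout described by these arguments can be sketched with simulated data. This is a minimal illustration and is not taken from the package: the dimensions, the random values, and the helper names (n, p, cont, bin) are assumptions made for the example, and the edge indices are assumed to be 1-based, as is usual in R.

```r
set.seed(1)

n <- 20            # number of samples, shared across modalities (assumed)
p <- c(5, 4)       # assumed feature counts p_1 and p_2 for M = 2 modalities

# Modality 1 holds continuous features; modality 2 holds binary features.
cont <- matrix(rnorm(n * p[1]), nrow = n, ncol = p[1])
bin  <- matrix(rbinom(n * p[2], size = 1, prob = 0.5), nrow = n, ncol = p[2])

data    <- list(cont, bin)
modtype <- c(0, 1) # 0 = continuous, 1 = binary

# Edge matrix: each row is (m1, j1, m2, j2).
E <- rbind(
  c(1, 1, 1, 2),   # features 1 and 2 of modality 1
  c(1, 3, 2, 1),   # feature 3 of modality 1 and feature 1 of modality 2
  c(2, 2, 2, 4)    # features 2 and 4 of modality 2
)

H <- 3
# Optional initial B: n by H, with exp(B[i, h]) proportional to Pr(z_i = h).
binit <- matrix(rnorm(n * H), nrow = n, ncol = H)

# Basic consistency checks on the inputs before they are passed to nebula().
stopifnot(
  length(data) == length(modtype),
  all(vapply(data, nrow, integer(1)) == n),
  ncol(E) == 4,
  all(E[, c(1, 3)] %in% seq_along(data)),
  all(E[, 2] <= p[E[, 1]]),
  all(E[, 4] <= p[E[, 3]]),
  all(dim(binit) == c(n, H))
)
```

The checks only verify that the shapes agree with the argument descriptions; they say nothing about how nebula() itself validates its inputs.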

Value

A list containing clustering assignments, variable selection, and posterior probabilities

clustering	cluster assignment
defvar	list of M matrices; each matrix is p_m by H and indicates whether variable j in modality m is a defining variable for cluster h.
clus_pr	n by H matrix containing the probability that subject i belongs to cluster h.
defvar_pr	list of M matrices; each matrix is p_m by H and contains the probability that variable j in modality m is a defining variable for cluster h.
def_m	list of M matrices; each matrix is p_m by H and contains the mean of variable j as a defining variable for cluster h (continuous variables only).
def_lpr	list of M matrices; each matrix is p_m by H and contains the log probability of variable j being 'active' as a defining variable for cluster h (binary variables only).
iter	the number of iterations until the algorithm converges.
param	a list of the input parameters used in the clustering solution.
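
As a sketch of how the probability components might be summarized, the following uses a hand-made stand-in for the result list rather than actual nebula() output. The numbers, the 0.5 cutoff, and the idea that a hard assignment is the row-wise maximum of clus_pr are illustrative assumptions and are not confirmed by this page.

```r
# Mock result: n = 4 subjects, H = 2 clusters, one modality with p_1 = 3.
# All values below are invented for illustration.
res <- list(
  clus_pr = matrix(c(0.9, 0.1,
                     0.2, 0.8,
                     0.6, 0.4,
                     0.3, 0.7), ncol = 2, byrow = TRUE),
  defvar_pr = list(matrix(c(0.95, 0.10,
                            0.40, 0.85,
                            0.05, 0.02), ncol = 2, byrow = TRUE))
)

# Most probable cluster for each subject (row-wise arg max).
hard <- max.col(res$clus_pr, ties.method = "first")

# (variable, cluster) pairs whose probability exceeds the assumed 0.5 cutoff,
# computed separately for each modality.
selected <- lapply(res$defvar_pr, function(pr) which(pr > 0.5, arr.ind = TRUE))

# Number of subjects per cluster under the hard assignment.
cluster_sizes <- table(hard)
```

Each element of selected is a two-column matrix of (row, col) indices, that is, (variable j, cluster h) pairs for that modality.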

Author

Changgee Chang

Examples


if (FALSE) {
res <- nebula(
  data = colon$modal,
  modtype = c(0, 1),
  E = colon$network,
  H = 3,
  modeta = c(1, 0.2),
  nu = 1,
  alpha = 1,
  lam = 1,
  alpha_sigma = 10,
  beta_sigma = 10,
  alpha_p = 1,
  beta_p = 1
)
}