HARMONIES is a Bayesian frequentist hybrid method that infers the network based on microbiome count data. HARMONIES consists of two major steps:
Please refer to our paper for more details of HARMONIES: “HARMONIES: A Hybrid Approach for Microbiome Networks Inference via Exploiting Sparsity”, Jiang S, et al., 2019
The HARMONIES algorithm is implemented in R
and Rcpp
. Users who are not famaliar with basic R programming are suggested to use our web server (update soon) to run HARMONIES with a user-friendly GUI.
First we install the package HARMONIES
through Github
library(devtools)
install_github("shuangj00/HARMONIES", subdir = "pkg" )
library(HARMONIES)
R (version 3.5.0 or later)
R Packages
This section provides a quick tutorial of implementing HARMONIES on a simulated dataset. The package HARMONIES
includes functions to generate zero-inflated count matrix and to visualize the underlying network structure of the simulated data.
Assume we have \(p = 30\) taxa. First we get an adjcant matrix and the true underlying partial correlation matrix.
graph.info = get.adjmatrix(p = 30, edge.prob = 0.1)
network.information = simu.network(adj.matrix = graph.info$adj.matrix)
Next we visualize the network structure
partical.correlation = network.information$Pcorr
colnames(partical.correlation) = NULL
rownames(partical.correlation) = NULL
visualize.networkD3(pcor.mat = partical.correlation)
Notice that a green edge represents a positive partial correlation while a red edge represents a negative partial correlation.
Given this underlying network structure (the partial correlation and the corresponding variance-covariance matrices), we simulate the overdispersed count data with zero inflation from a Dirichlet Multinomial (DM) model.
covar.mat = network.information$Sigma
count.info = get.countmat(n = 200, Sigma = covar.mat)
count.matrix = count.info$Y
Here, count.info
contains the following information:
Y
: the observed count matrix (with zero-inflation)Y.full
: the true count matrix (without the added zeros)Mu.mat
: the underlying multivariate normal data that used as a parameter in the DM data generationSigma
: the variance-covariance matrix of the multivariate normal dataGiven the observed count data, the main function HARMONIES
consists the following steps: 1. Fit the ZINB-DPP model to obtain the normalized abundance. 2. Infer a sparse network using graphical lasso based on the normalized abundances and select the optimal tuning parameter in grahical lasso by Stability Approach to Regularization Selection (StARS).
The input of HARMONIES
includes the following:
count.matrix
The \(n-\)by\(-p\) (i.e. sample-by-taxon) count matrixphenotype
A numerical vector indicating the phenotype of the \(n\) samples. If only have one sample group, use a vector of zeros. If have two sample groups, use a vector of zeros and ones. The current method can handle two phenotyeps at mostN.mcmc
Number of MCMC iterations with first half discarded as burn-in. Recommonded value is at leaset \(10,000\) (default)b
Shape hyper parameter of the variance paraemeter in the ZINB-DPP model (\(b = 1\) by default)h
Scale hyper parameter of the variance paraemeter in the ZINB-DPP model (\(h = 10\) by default)sparsity.cutoff
A threshold between \(0-1\). Taxa with proportions of observed zeros larger than the threshold will be dropped for the network inference (\(=0.5\) by default)beta.stars
The stability parameter used in StARS method (\(=0.05\) by default)n.rep
Number of subsamples used in StARS method (\(=20\) by default)bayes.fdr
Bayesian false discovery rate controled in the normalization step (\(=0.05\) by default; recommended range is \(0-0.1\))seed
Random seedIn this toy example, we only have one phenotype, and therefore the phenotype
is a vector of \(0\) with length \(n=200\).
inference.results = HARMONIES(count.matrix = count.matrix,
phenotype = rep(0, 200),
N.mcmc = 10000,
beta.stars = 0.05)
## Loading required package: qgraph
## Registered S3 methods overwritten by 'huge':
## method from
## plot.sim BDgraph
## print.sim BDgraph
## Loading required package: huge
## Loading required package: pulsar
## Loading required package: Rcpp
## Start fitting the ZINB-DPP model. This may take a while...
## 0% has been done
## 10% has been done
## 20% has been done
## 30% has been done
## 40% has been done
## 50% has been done
## 60% has been done
## 70% has been done
## 80% has been done
## 90% has been done
## Estimating the sparse precision matrix...
The output of HARMONIES
includes
partial.corr
: the estimated partial correlationedge.estimation
: the edge information of the edge(s) in the resulted network (source, target, the corresponding partial correlation value)node.estimation
the node information in the resulted network with the following:
We now visualize the estimated newtork structure
visualize.networkD3(pcor.mat = inference.results$partial.corr)
By comparing it with the actually one, we can see all the relatively strong signals are successfully captured by HARMONIES.
In the real data studyof colorectal cancer (CRC), we implemented HARMONIES to handle two phenotype groups.
The real data we study here is published by (Feng et al. 2015). The count matrix consists \(n = 144\) samples with \(p=187\) genus. The \(144\) subjects forms two groups, CRC patients (\(43\) labeled as \(1\)) and nonCRC controls (\(101\) labeled as \(0\)).
HARMONIES achieves a simultaneous modeling of subjects from two groups. This process benenfits our analysis in the following two aspects:
When having two phenotypes, HARMONIES provides the following addtional information for the nodes in the resulted networks:
One can use these additional information as input for other visualization tools, such as Cytoscape, for further visualization and analysis. For example, the following figures are generated by Cytoscape, where the node size is determined by the abundance information, and the nodes (taxa) are named by their taxonomic ranks.
Using Cytoscape to Make Network Plot
Shuang Jiang shuangj@smu.edu, Department of Statistical Science, Southern Methodist University, Dallas, TX 75275
Feng, Qiang, Suisha Liang, Huijue Jia, Andreas Stadlmayr, Longqing Tang, Zhou Lan, Dongya Zhang, et al. 2015. “Gut Microbiome Development Along the Colorectal Adenoma–Carcinoma Sequence.” Nature Communications 6. Nature Publishing Group: 6528.