Nam Nguyen: Thesis Proposal: Interpretable Multiview Learning for Understanding Functional Multi-omics

Event Description

Nam Nguyen

4-5pm, Dec 17 2020

https://stonybrook.zoom.us/j/94214254415?pwd=K1VoQml4cFdlVW51VW41dWtid2tJdz09



The molecular mechanisms and functions in complex biological systems
currently remain elusive. Recent high-throughput techniques, such as
next-generation sequencing, have generated a wide variety of
multiomics datasets that enable the identification of biological
functions and mechanisms via multiple facets. However, integrating
these large-scale multiomics data and discovering functional insights
remain challenging tasks. To address these challenges,
machine learning has been broadly applied to analyze multiomics data. In
particular, multiview learning is more effective than previous
integrative methods at capturing data heterogeneity and revealing
cross-talk patterns. Although it has been applied in various contexts,
such as computer vision and speech recognition, multiview learning has
not yet been widely applied to biological data--specifically,
multiomics data. Therefore, we have developed a framework called
multiview empirical risk minimization (MV-ERM) for unifying multiview
learning methods (Nguyen, et al., PLoS Computational Biology, 2020).
MV-ERM can be applied to multiomics data, including genomics,
transcriptomics, and epigenomics, with the aim of discovering
functional and mechanistic interpretations across omics.
Based on MV-ERM, we have developed the following methods:
ManiNetCluster, Varmole and ECMarker.



(1) ManiNetCluster (Nguyen, et al., BMC Genomics, 2019) is a manifold
learning method which simultaneously aligns and clusters gene networks
(e.g., co-expression) to systematically reveal the links of genomic
function between different phenotypes. Specifically, ManiNetCluster
employs manifold alignment to uncover and match local and non-linear
structures among networks, and identifies cross-network functional
links. We demonstrated that ManiNetCluster better aligns the
orthologous genes from their developmental expression profiles across
model organisms than state-of-the-art methods. This suggests
potential non-linear interactions among evolutionarily conserved genes
across species in development. Furthermore, we applied ManiNetCluster
to time series transcriptome data measured in the green alga
Chlamydomonas reinhardtii to discover the genomic functions linking
various metabolic processes between the light and dark periods of a
diurnally cycling culture;



(2) Varmole (Nguyen, et al., Bioinformatics, 2020) is an interpretable
deep learning method that simultaneously reveals genomic functions and
mechanisms while predicting phenotype from genotype. In particular,
Varmole embeds multi-omic networks into a deep neural network
architecture and prioritizes variants, genes and regulatory linkages
via biological drop-connect without needing prior feature selections.
In an application to schizophrenia, we demonstrate that Varmole
provides an effective alternative to recent statistical methods that
associate functional omic data (e.g., gene expression) with genotype
and phenotype and that link variants to individual genes in population
studies such as genome-wide association studies;



(3) ECMarker (Jin*, Nguyen*, et al., Bioinformatics, 2020) is an
interpretable and scalable machine learning model that predicts gene
expression biomarkers for disease phenotypes and simultaneously
reveals underlying regulatory mechanisms. In particular, ECMarker
integrates semi-restricted and discriminative restricted Boltzmann
machines into a neural network model for classification that allows
lateral connections at the input gene layer. In an application to the
gene expression data of non-small cell lung cancer (NSCLC) patients,
we found that ECMarker not only achieved a relatively high accuracy
for predicting cancer stages but also identified the biomarker genes
and gene networks implying the regulatory mechanisms in lung cancer
development.
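
As an illustration of the network-constrained connectivity described in
item (2), a layer can be restricted to biologically supported links by
multiplying its weights elementwise with a fixed binary mask. The
following is a minimal NumPy sketch under that assumption, not Varmole's
actual implementation; the function name `masked_dense`, the toy mask,
and the variant/gene labels are hypothetical.

```python
import numpy as np

def masked_dense(x, weights, mask, bias):
    """Forward pass of a dense ReLU layer whose connections are limited by a fixed binary mask."""
    # Entries where mask == 0 are removed, so only prior-supported links carry signal.
    return np.maximum(0.0, x @ (weights * mask) + bias)

# Hypothetical prior network: 3 inputs (e.g., variants) feeding 2 hidden units (e.g., genes).
# Row i, column j is 1 when input i is allowed to connect to hidden unit j.
mask = np.array([[1.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 1.0]])

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))
bias = np.zeros(2)

x = np.array([[1.0, 0.0, 2.0]])
hidden = masked_dense(x, weights, mask, bias)

# Perturb the third input, which has no allowed link to the first hidden unit.
x_perturbed = x.copy()
x_perturbed[0, 2] = 100.0
hidden_perturbed = masked_dense(x_perturbed, weights, mask, bias)
```

Because the mask removes the link from the third input to the first
hidden unit, perturbing that input leaves the first hidden activation
unchanged. The surviving weights can then be inspected to rank the
allowed links.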



Finally, we propose a novel multiview learning method, Malignomics, to
predict phenotypes from heterogeneous multi-omic features. Malignomics
will first align multi-omic features by deep manifold alignment onto a
common latent space, better predicting nonlinear relationships across
omics. This deep alignment aims to preserve both global consistency
and local smoothness across omics and reveal higher-order nonlinear
interactions (i.e., manifolds) among cross-omic features. Second, it
uses these manifold structures to regularize the classifiers for
predicting phenotypes. This manifold regularization highlights
cross-omic feature manifolds and prioritizes the features and
interactions relevant to the phenotypes. The prioritized
multi-omic features will further reveal underlying phenotypic
functions and mechanisms and thus enhance the biological
interpretation of Malignomics. We will apply Malignomics to
multi-omics data in neuropsychiatric disorders, and prioritize gene
regulatory networks linking risk variants, regulatory elements, and
genes for the disorders. We will also compare Malignomics with
state-of-the-art methods and investigate how manifold regularization
can improve the understanding of multi-omic functions and the
prediction of diseases.
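
The idea of regularizing a predictor with manifold structure can be
sketched with a standard technique, Laplacian-regularized least squares.
This is only an illustration under stated assumptions, not the proposed
Malignomics method: the graph here is built directly from toy samples
with Gaussian affinities, whereas the proposal derives its manifolds
from deep alignment, and all function and variable names are
hypothetical.

```python
import numpy as np

def graph_laplacian(points, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W built from Gaussian affinities."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)  # no self-loops
    degree = np.diag(affinity.sum(axis=1))
    return degree - affinity

def laplacian_regularized_least_squares(X, y, laplacian, ridge=1e-2, smooth=1e-1):
    """Minimize ||Xw - y||^2 + ridge * ||w||^2 + smooth * (Xw)^T L (Xw) in closed form."""
    n_features = X.shape[1]
    lhs = X.T @ X + ridge * np.eye(n_features) + smooth * (X.T @ laplacian @ X)
    return np.linalg.solve(lhs, X.T @ y)

# Toy data standing in for samples embedded in a shared latent space (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=30))

laplacian = graph_laplacian(X)
w_plain = laplacian_regularized_least_squares(X, y, laplacian, smooth=0.0)
w_smooth = laplacian_regularized_least_squares(X, y, laplacian, smooth=10.0)

def roughness(w):
    """Graph penalty f^T L f: how much predictions differ between neighboring samples."""
    f = X @ w
    return float(f @ laplacian @ f)
```

Comparing `roughness(w_plain)` with `roughness(w_smooth)` shows the
effect of the penalty: a larger `smooth` weight yields predictions that
vary less between neighboring samples on the graph.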
