Qualitative data can be challenging to analyze and interpret effectively. In this workshop, SBU Libraries' Data Literacies Lead, Ahmad Pratama, will show you how to extract meaningful insights from textual data, including understanding sentiment trends. Learn to explore qualitative data with Python using word clouds, basic natural language processing (NLP) techniques, and lexicon-based sentiment analysis with VADER.
RSVP via link: https://t.e2ma.net/click/t70ivh/5wwlu4oe/hy5q96
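
As a rough illustration of what "lexicon-based" means here, the sketch below scores text by averaging the valence of known words. It is not VADER: the lexicon, its valence values, and the names TOY_LEXICON and lexicon_score are hypothetical, and VADER itself uses a much larger human-rated lexicon plus rules for negation, punctuation, and intensifiers.

```python
import re

# Hypothetical toy lexicon for illustration only (word -> valence).
# These values are made up and are not taken from VADER.
TOY_LEXICON = {
    "good": 1.9, "great": 3.1, "helpful": 1.8,
    "bad": -2.5, "confusing": -1.3, "terrible": -2.1,
}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Average the valence of lexicon words found in the text (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


reviews = [
    "The workshop was great and helpful.",
    "The slides were confusing and the audio was bad.",
    "It ran from four to five.",
]
scores = [lexicon_score(r) for r in reviews]
for review, score in zip(reviews, scores):
    print(f"{score:+.2f}  {review}")
```

A positive score suggests positive wording, a negative score negative wording, and zero means no lexicon word was found.
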
Nam Nguyen

4-5pm, Dec 17 2020

https://stonybrook.zoom.us/j/94214254415?pwd=K1VoQml4cFdlVW51VW41dWtid2tJdz09



The molecular mechanisms and functions in complex biological systems
currently remain elusive. Recent high-throughput techniques, such as
next-generation sequencing, have generated a wide variety of
multiomics datasets that enable the identification of biological
functions and mechanisms via multiple facets. However, integrating
these large-scale multiomics data and discovering functional insights
remain challenging tasks. To address these challenges,
machine learning has been broadly applied to analyze multiomics data. In
particular, multiview learning is more effective than previous
integrative methods for learning data's heterogeneity and revealing
cross-talk patterns. Although it has been applied to various contexts,
such as computer vision and speech recognition, multiview learning has
not yet been widely applied to biological data--specifically,
multiomics data. Therefore, we have developed a framework called
multiview empirical risk minimization (MV-ERM) for unifying multiview
learning methods (Nguyen, et al., PLoS Computational Biology, 2020).
MV-ERM enables applications to multiomics data, including genomics,
transcriptomics, and epigenomics, with the aim of discovering
functional and mechanistic interpretations across omics.
Based on MV-ERM, we have developed the following methods:
ManiNetCluster, Varmole and ECMarker.



(1) ManiNetCluster (Nguyen, et al., BMC Genomics, 2019) is a manifold
learning method which simultaneously aligns and clusters gene networks
(e.g., co-expression) to systematically reveal the links of genomic
function between different phenotypes. Specifically, ManiNetCluster
employs manifold alignment to uncover and match local and non-linear
structures among networks, and identifies cross-network functional
links. We demonstrated that ManiNetCluster better aligns the
orthologous genes from their developmental expression profiles across
model organisms than state-of-the-art methods. This indicates the
potential non-linear interactions of evolutionarily conserved genes
across species in development. Furthermore, we applied ManiNetCluster
to time series transcriptome data measured in the green alga
Chlamydomonas reinhardtii to discover the genomic functions linking
various metabolic processes between the light and dark periods of a
diurnally cycling culture;



(2) Varmole (Nguyen, et al., Bioinformatics, 2020) is an interpretable
deep learning method that simultaneously reveals genomic functions and
mechanisms while predicting phenotype from genotype. In particular,
Varmole embeds multi-omic networks into a deep neural network
architecture and prioritizes variants, genes and regulatory linkages
via biological drop-connect without needing prior feature selections.
With an application to schizophrenia, we demonstrate that Varmole
provides an effective alternative to recent statistical methods that
associate functional omic data (e.g., gene expression) with genotype
and phenotype and that link variants to individual genes in population
studies such as genome-wide association studies;



(3) ECMarker (Jin*, Nguyen*, et al., Bioinformatics, 2020) is an
interpretable and scalable machine learning model that predicts gene
expression biomarkers for disease phenotypes and simultaneously
reveals underlying regulatory mechanisms. In particular, ECMarker is
built on the integration of semi-restricted and discriminative restricted
Boltzmann machines, a neural network model for classification allowing
lateral connections at the input gene layer. With application to the
gene expression data of non-small cell lung cancer (NSCLC) patients,
we found that ECMarker not only achieved a relatively high accuracy
for predicting cancer stages but also identified the biomarker genes
and gene networks implying the regulatory mechanisms in lung cancer
development.



Finally, we propose a novel multiview learning method, Malignomics, to
predict phenotypes from heterogeneous multi-omic features. Malignomics
will first align multi-omic features by deep manifold alignment onto a
common latent space, better predicting nonlinear relationships across
omics. This deep alignment aims to preserve both global consistency
and local smoothness across omics and reveal higher-order nonlinear
interactions (i.e., manifolds) among cross-omic features. Second, it
uses these manifold structures to regularize the classifiers for
predicting phenotypes. This manifold-regularization allows
highlighting cross-omic feature manifolds and prioritizing the
features and interactions for the phenotypes. The prioritized
multi-omic features will further reveal underlying phenotypic
functions and mechanisms and thus enhance the biological
interpretation of Malignomics. We will apply Malignomics to
multi-omics data in neuropsychiatric disorders, and prioritize gene
regulatory networks linking risk variants, regulatory elements, and
genes for the disorders. We will also compare Malignomics with
state-of-the-art methods, and investigate how the manifold regularization
can potentially improve the understanding of multi-omics functions and
the prediction of diseases.
The overall purpose of this seminar is to bring together people with interests in Computer Vision theory and techniques and to examine current research issues. This course will be appropriate for people who already took a Computer Vision graduate course or already had research experience in Computer Vision. To enroll in this course, you must either: (1) be in the PhD program or (2) receive permission from the instructors.

Each seminar will consist of several short talks (around 10 minutes each) by different speakers. Students can register for 1 credit for CSE 656. Registered students must attend and present a minimum of 2 or 3 talks. Everyone else is welcome to attend. Fill in https://forms.gle/pCVXovgfMfQwGqG38 to subscribe to our mailing list for further announcements.

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes three short talks on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will also be available via Zoom, and the second half hour will be for in-person networking only.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

Tuesday, January 7, 2025, 12:00 pm -- CDS, Bldg. 725, Training Room

Speakers

Maria Zawadowicz, EBNN--ML for Atmospheric Aerosol Research

Mohammad Atif, CDS--An Extensible Digital Twin Framework

Guang Zhao, CDS--Pareto Prompt Optimization

Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1615289117?pwd=Hqkbj9itxWrFnkhZ8rQXHPInO2gxdF.1

Meeting ID: 161 528 9117
Passcode: 991382

Please join us for the next CSE 600 Seminar this Friday, October 11th, at 2:30pm in New Computer Science 120, given by Assistant Professor Mohammad Javad Amiri.

Abstract: Today's distributed transaction processing systems must deal with untrustworthy environments where multiple mutually distrustful entities communicate with each other and maintain data on untrusted infrastructure. Byzantine Fault-Tolerant (BFT) protocols have recently been used extensively by distributed transaction processing systems to establish consensus on the order of transactions. However, the proliferation of different BFT protocols has made it difficult to navigate the BFT landscape, let alone determine the protocol that best meets application needs. Moreover, as novel smart contracts, modern hardware, and new cloud platforms arise, future-proof distributed transaction processing systems need to be designed with full-stack adaptivity in mind. This talk presents our vision for a reinforcement learning (RL)-based distributed transaction processing system that adjusts effectively in real time to changing fault scenarios and workloads.

TITLE: Sampling Using Langevin Diffusions Beyond the Worst-Case by Andrej Risteski of CMU


ABSTRACT: Many tasks involving generative models require sampling from distributions parametrized as p(x) = e^{-f(x)}/Z, where Z is the normalizing constant, for some function f whose values and gradients we can query. This mode of access to f is natural, for instance when sampling from posteriors in latent-variable models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, applications such as these entail working with functions f that are nonconvex.

We exhibit settings of practical relevance in which Langevin diffusion (combined with other tools) provably mixes rapidly: distributions p that are multimodal, as well as distributions p that have a natural manifold structure on their level sets.
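
For readers unfamiliar with the sampler named in the abstract, here is a minimal sketch of the unadjusted Langevin algorithm, a standard time discretization of Langevin diffusion, in one dimension. It covers only the easy convex case f(x) = x^2/2, not the nonconvex settings the talk addresses. The function name langevin_samples, the step size, and the burn-in length are illustrative assumptions.

```python
import math
import random


def langevin_samples(grad_f, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm in one dimension.

    Iterates x <- x - step * grad_f(x) + sqrt(2 * step) * N(0, 1),
    which approximately targets p(x) proportional to exp(-f(x)).
    Only gradient queries of f are needed; Z is never computed.
    """
    x = x0
    samples = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        x = x - step * grad_f(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Convex example: f(x) = x^2 / 2, so grad f(x) = x and p is the standard normal.
rng = random.Random(0)
samples = langevin_samples(lambda x: x, x0=3.0, step=0.01, n_steps=200_000, rng=rng)
kept = samples[20_000:]  # discard burn-in (length chosen arbitrarily)
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")
```

The empirical mean and variance should land near 0 and 1, with a small bias from the finite step size.
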
Abstract: Jailbreak attacks circumvent LLMs' built-in safeguards by concealing harmful queries within adversarial prompts. While most existing defenses attempt to mitigate the effects of adversarial prompts, they often prove inadequate as adversarial prompts can take arbitrary, adaptive forms. This paper introduces RobustKV, a novel jailbreak defense that takes a fundamentally different approach by selectively removing critical tokens of harmful queries from key-value (KV) caches. Intuitively, for an adversarial prompt to be effective, its tokens must achieve sufficient 'importance' (measured by attention scores), which consequently lowers the importance of tokens in the concealed harmful query. Therefore, by carefully evicting the KVs of low-ranked tokens, RobustKV minimizes the harmful query's presence in the KV cache, thus preventing the LLM from generating informative responses. Extensive evaluation using benchmark datasets and models demonstrates that RobustKV effectively counters state-of-the-art jailbreak attacks while maintaining the LLM's performance on benign queries. Notably, RobustKV creates an interesting effectiveness-evasiveness dilemma for the adversary, leading to its robustness against adaptive attacks.
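
The eviction idea can be sketched in a few lines: rank token positions by an importance score and drop the lowest-ranked share. This is not the RobustKV implementation. The abstract does not say how attention scores are aggregated or how many tokens are evicted, so the function name evict_low_ranked, the evict_fraction parameter, and the toy scores are all hypothetical.

```python
def evict_low_ranked(importance, evict_fraction):
    """Return the sorted indices of tokens whose KV entries are kept.

    importance: one score per prompt token (assumed here to be some
        aggregate of the attention the token receives).
    evict_fraction: share of tokens to evict, lowest-ranked first.
    """
    if not 0.0 <= evict_fraction < 1.0:
        raise ValueError("evict_fraction must be in [0, 1)")
    n_evict = int(len(importance) * evict_fraction)
    # Rank token positions from least to most important.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i])
    evicted = set(ranked[:n_evict])
    return [i for i in range(len(importance)) if i not in evicted]


# Toy scores: positions 3 and 4 stand in for a concealed query whose tokens
# received little attention relative to the surrounding adversarial prompt.
scores = [0.30, 0.25, 0.20, 0.02, 0.03, 0.20]
kept = evict_low_ranked(scores, evict_fraction=0.34)
print(kept)
```

In this toy case the two lowest-scoring positions are removed and the rest of the prompt's entries remain in the cache.
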

Speaker: Tanqiu Jiang

Where: NCS 220 and Zoom (https://stonybrook.zoom.us/j/6406956411)