New York Scientific Data Summit (NYSDS) is a premier annual conference that brings together researchers and thought leaders from academia, national labs and industry to exchange ideas and foster collaboration in data-driven science and technology. Co-hosted by Brookhaven National Laboratory and the Institute for Advanced Computational Science (IACS) at Stony Brook University, NYSDS 2025 will take place on September 11-12, 2025, at the SUNY Global Center in New York City.

NYSDS 2025 will spotlight artificial intelligence (AI), machine learning (ML) and robotics, fields now at a pivotal point with transformative impacts on science and technology. From accelerating computationally demanding simulations to discerning signals in noisy data, AI/ML has become an integral part of scientific workflows. Despite many advances, challenges remain in ensuring that AI/ML applications are reliable, explainable and trustworthy.

Robotics, a growing field that couples AI with physically actuated mechanical bodies, has seen increased interest in areas spanning science, technology and manufacturing. The need for real-time decision-making and control, along with the intricate morphology of robots, makes robotics an intriguing application of AI, advanced computing and optimization.


NYSDS 2025 is open to the public. To be eligible to attend, all participants must register online by August 30, 2025. For questions or help with registration, please contact the Summit Coordinator.

Register here.

Abstract:

Large language models (LLMs) can often suggest colloquial plans from verbal descriptions of tasks, yet they cannot reliably produce executable and verifiable plans for formally specified environments. In this talk, I will discuss a strand of efforts to have LLMs generate accurate and explainable plans in textual simulations. Instead of directly generating the plan or actions, LLMs are prompted to generate Planning Domain Definition Language (PDDL) that specifies the environment (domain file) and the task (problem file), which can then be solved deterministically with an off-the-shelf planner. In a three-phase study, my collaborators and I first observed that it is possible, but very challenging, for LLMs to generate long-form code such as PDDL domain and problem files from textual specifications. Next, we devised methodologies for LLMs to iteratively generate and refine problem files while exploring a partially observed, simulated, textual environment. Finally, we showed that domain files are even more difficult to generate correctly, even for well-established planning tasks such as BlocksWorld. I will conclude by discussing ongoing efforts to improve this structured-generation ability and promising frontiers to explore.
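To make the domain/problem split concrete, here is a minimal Python sketch under stated assumptions. It is not PDDL syntax and not the speaker's pipeline: the hypothetical `ground_actions` plays the role of a domain file (BlocksWorld actions with preconditions, add effects and delete effects, in STRIPS style), the `init`/`goal` sets play the role of a problem file, and a plain breadth-first search (`bfs_plan`, also hypothetical) stands in for an off-the-shelf planner.

```python
from collections import deque

# Hypothetical STRIPS-style stand-in for a PDDL domain file: each grounded
# action is (name, preconditions, add effects, delete effects).
def ground_actions(blocks):
    actions = []
    for b in blocks:
        for x in blocks:
            if x == b:
                continue
            # Unstack b from x and put it on the table.
            actions.append((
                f"move-to-table {b} {x}",
                frozenset({("on", b, x), ("clear", b)}),
                frozenset({("ontable", b), ("clear", x)}),
                frozenset({("on", b, x)}),
            ))
            # Pick b up from the table and stack it on x.
            actions.append((
                f"move-from-table {b} {x}",
                frozenset({("ontable", b), ("clear", b), ("clear", x)}),
                frozenset({("on", b, x)}),
                frozenset({("ontable", b), ("clear", x)}),
            ))
            for y in blocks:
                if y in (b, x):
                    continue
                # Move b directly from x onto y.
                actions.append((
                    f"move {b} {x} {y}",
                    frozenset({("on", b, x), ("clear", b), ("clear", y)}),
                    frozenset({("on", b, y), ("clear", x)}),
                    frozenset({("on", b, x), ("clear", y)}),
                ))
    return actions


# Breadth-first search standing in for an off-the-shelf planner: it returns a
# shortest sequence of action names reaching a state that contains the goal.
def bfs_plan(init, goal, actions):
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None  # goal unreachable


if __name__ == "__main__":
    # Hypothetical problem file: C sits on A; A and B are on the table.
    init = {("on", "C", "A"), ("ontable", "A"), ("ontable", "B"),
            ("clear", "C"), ("clear", "B")}
    goal = frozenset({("on", "A", "B"), ("on", "B", "C")})
    print(bfs_plan(init, goal, ground_actions(["A", "B", "C"])))
```

Running it prints the three-step plan that puts C on the table, stacks B on C, then stacks A on B. Once the action schemas are right, solving is mechanical; the abstract's point is that getting those schemas right is the hard part for LLMs, and here they are hand-written.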

Bio:
Li Harry Zhang is an assistant professor at Drexel University, focusing on Natural Language Processing (NLP) and artificial intelligence (AI). He received his PhD from the University of Pennsylvania, advised by Prof. Chris Callison-Burch. Before that, he received his Bachelor's degree from the University of Michigan, mentored by Prof. Rada Mihalcea and Prof. Dragomir Radev. His current research uses large language models (LLMs) to reason and plan via symbolic and structured representations. He has published more than 20 peer-reviewed papers at NLP and AI conferences such as ACL, EMNLP, and AACL, which have been cited more than 1,000 times. He regularly serves as Area Chair, Session Chair, and reviewer at those venues. A musician, producer, and content creator with over 50,000 subscribers, he is also passionate about research on AI music and creativity.

Virtual Job Fair for New Stony Brook Graduates & Experienced Alumni

Using a platform called Career Fair Plus, participants will be able to schedule 10-minute video meetings with participating employers of interest to them. Recent graduates and alumni can register, and learn more about how the fair will be run, on Handshake.

The overall purpose of this seminar is to bring together people with interests in Computer Vision theory and techniques and to examine current research issues. This course is appropriate for people who have already taken a graduate Computer Vision course or have research experience in Computer Vision. To enroll in this course, you must either (1) be in the PhD program or (2) receive permission from the instructors.

Each seminar will consist of multiple short talks (around 10 minutes each) by different people. Students can register for 1 credit for CSE 656. Registered students must attend and present a minimum of 2 or 3 talks. Everyone else is welcome to attend. Fill in https://forms.gle/pCVXovgfMfQwGqG38 to subscribe to our mailing list for further announcements.

IT/Computer Science Job and Internship Fair
West Campus, SAC (Student Activities Center), Ballrooms A & B, 100 Nicolls Road, Stony Brook, NY 11794

The Career Center invites Alumni Employers and Job Seekers to the IT/Computer Science Job and Internship Fair this spring.

Job Seekers: A job fair is an opportunity to present yourself professionally, in person, to potential employers while showcasing your communication skills.

Alumni Employers: Held in both the fall and spring semesters, this event is ideal for employers looking to fill internship, co-op, part-time, and full-time positions in information technology (e.g., Software Engineering, Network Administration, Web Development). Register here to recruit top SBU talent.

Nam Nguyen

4-5pm, Dec 17 2020

https://stonybrook.zoom.us/j/94214254415?pwd=K1VoQml4cFdlVW51VW41dWtid2tJdz09



The molecular mechanisms and functions of complex biological systems
remain elusive. Recent high-throughput techniques, such as
next-generation sequencing, have generated a wide variety of
multiomics datasets that make it possible to identify biological
functions and mechanisms from multiple facets. However, integrating
these large-scale multiomics data and discovering functional insights
from them remain challenging. To address these challenges,
machine learning has been broadly applied to analyze multiomics. In
particular, multiview learning is more effective than previous
integrative methods at learning data heterogeneity and revealing
cross-talk patterns. Although it has been applied in various contexts,
such as computer vision and speech recognition, multiview learning has
not yet been widely applied to biological data, specifically
multiomics data. We have therefore developed a framework called
multiview empirical risk minimization (MV-ERM) that unifies multiview
learning methods (Nguyen, et al., PLoS Computational Biology, 2020).
MV-ERM enables applications for understanding multiomics data,
including genomics, transcriptomics, and epigenomics, with the aim of
discovering functional and mechanistic interpretations across omics.
Based on MV-ERM, we have developed the following methods:
ManiNetCluster, Varmole and ECMarker.



(1) ManiNetCluster (Nguyen, et al., BMC Genomics, 2019) is a manifold
learning method which simultaneously aligns and clusters gene networks
(e.g., co-expression) to systematically reveal the links of genomic
function between different phenotypes. Specifically, ManiNetCluster
employs manifold alignment to uncover and match local and non-linear
structures among networks, and identifies cross-network functional
links. We demonstrated that ManiNetCluster aligns orthologous genes
from their developmental expression profiles across model organisms
better than state-of-the-art methods. This suggests potential
non-linear interactions of evolutionarily conserved genes across
species during development. Furthermore, we applied ManiNetCluster
to time series transcriptome data measured in the green alga
Chlamydomonas reinhardtii to discover the genomic functions linking
various metabolic processes between the light and dark periods of a
diurnally cycling culture;



(2) Varmole (Nguyen, et al., Bioinformatics, 2020) is an interpretable
deep learning method that simultaneously reveals genomic functions and
mechanisms while predicting phenotype from genotype. In particular,
Varmole embeds multi-omic networks into a deep neural network
architecture and prioritizes variants, genes and regulatory linkages
via biological drop-connect without needing prior feature selections.
With an application to schizophrenia, we demonstrate that Varmole
provides an effective alternative to recent statistical methods that
associate functional omic data (e.g., gene expression) with genotype
and phenotype and that link variants to individual genes in population
studies such as genome-wide association studies;



(3) ECMarker (Jin*, Nguyen*, et al., Bioinformatics, 2020) is an
interpretable and scalable machine learning model that predicts gene
expression biomarkers for disease phenotypes and simultaneously
reveals underlying regulatory mechanisms. In particular, ECMarker is
built on the integration of semi-restricted and discriminative
restricted Boltzmann machines, a neural network model for
classification that allows lateral connections at the input gene
layer. With application to the
gene expression data of non-small cell lung cancer (NSCLC) patients,
we found that ECMarker not only achieved a relatively high accuracy
for predicting cancer stages but also identified the biomarker genes
and gene networks implying the regulatory mechanisms in lung cancer
development.
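Item (2) says Varmole restricts its network via "biological drop-connect" rather than prior feature selection. One plausible reading, assumed here rather than taken from the paper, is a dense layer whose weights are multiplied elementwise by a fixed 0/1 mask derived from a prior network, so a variant can only feed a gene the network links it to. The function `masked_linear` and all numbers below are hypothetical.

```python
# Hypothetical sketch of a prior-masked ("drop-connect"-style) dense layer.
# A weight between input i and output j contributes only if mask[j][i] == 1,
# i.e. only if the assumed prior network links them.
def masked_linear(x, weights, mask, bias):
    """Compute out[j] = sum_i weights[j][i] * mask[j][i] * x[i] + bias[j].

    x: input activations (e.g. variant genotypes), length n_in
    weights, mask: n_out x n_in nested lists; mask entries are 0 or 1
    bias: length n_out
    """
    out = []
    for w_row, m_row, b in zip(weights, mask, bias):
        out.append(sum(w * m * xi for w, m, xi in zip(w_row, m_row, x)) + b)
    return out


if __name__ == "__main__":
    # Made-up prior: gene 0 is linked to variants 0 and 1; gene 1 only to variant 2.
    mask = [[1, 1, 0],
            [0, 0, 1]]
    weights = [[0.5, -1.0, 2.0],
               [3.0, 0.25, 4.0]]
    bias = [0.0, 0.1]
    print(masked_linear([1.0, 2.0, 3.0], weights, mask, bias))
    # Changing variants 0 and 1 leaves gene 1's output unchanged.
    print(masked_linear([100.0, -50.0, 3.0], weights, mask, bias))
```

Because masked weights are multiplied by zero, they also receive zero gradient during training, so the learned connectivity never leaves the prior network and the surviving weights can be read as scores for specific variant-gene links. Whether Varmole uses exactly this formulation is an assumption.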



Finally, we propose a novel multiview learning method, Malignomics, to
predict phenotypes from heterogeneous multi-omic features. Malignomics
will first align multi-omic features onto a common latent space by deep
manifold alignment to better predict nonlinear relationships across
omics. This deep alignment aims to preserve both global consistency
and local smoothness across omics and to reveal higher-order nonlinear
interactions (i.e., manifolds) among cross-omic features. Second, it
will use these manifold structures to regularize the classifiers that
predict phenotypes. This manifold regularization highlights
cross-omic feature manifolds and prioritizes the features and
interactions relevant to the phenotypes. The prioritized
multi-omic features will further reveal underlying phenotypic
functions and mechanisms and thus enhance the biological
interpretability of Malignomics. We will apply Malignomics to
multi-omics data in neuropsychiatric disorders and prioritize gene
regulatory networks linking risk variants, regulatory elements, and
genes for these disorders. We will also compare Malignomics with
state-of-the-art methods and investigate how manifold regularization
can improve the understanding of multi-omics functions and the
prediction of diseases.
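The abstract does not give Malignomics' objective, so the sketch below swaps in a standard graph-Laplacian manifold regularizer to illustrate what "using manifold structures to regularize the classifiers" can mean. Assumptions: `W` is a hypothetical symmetric affinity matrix among cross-omic features (for instance, derived from an aligned latent space), each row of `F` holds one feature's classifier weights, and `lam` is a hypothetical trade-off parameter. The penalty grows when strongly linked features receive dissimilar weights.

```python
# Generic graph-Laplacian manifold regularizer, swapped in for illustration;
# Malignomics' exact objective is not specified in the abstract.

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]


def smoothness_penalty(W, F):
    """sum_ij W[i][j] * ||F[i] - F[j]||^2: large when strongly linked rows of F differ."""
    n = len(W)
    return sum(W[i][j] * sum((a - b) ** 2 for a, b in zip(F[i], F[j]))
               for i in range(n) for j in range(n))


def trace_form(W, F):
    """2 * trace(F^T L F); equals smoothness_penalty(W, F) when W is symmetric."""
    L = laplacian(W)
    n, k = len(F), len(F[0])
    return 2 * sum(F[i][c] * L[i][j] * F[j][c]
                   for c in range(k) for i in range(n) for j in range(n))


def regularized_loss(data_loss, W, F, lam):
    """Classifier loss plus lam times the manifold smoothness penalty."""
    return data_loss + lam * smoothness_penalty(W, F)


if __name__ == "__main__":
    # Made-up affinities among three cross-omic features and one weight per feature.
    W = [[0, 1, 0],
         [1, 0, 2],
         [0, 2, 0]]
    F = [[1.0], [3.0], [0.0]]
    print(smoothness_penalty(W, F), trace_form(W, F))  # both 44.0
```

The pairwise and trace forms compute the same quantity; the trace form is how such regularizers are usually written, while the pairwise form shows directly why linked features are pulled toward similar weights.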

Abstract: Humans reason about everyday situations by making commonsense-based inferences, derived both from explicitly stated information and from implicit, unstated knowledge. In this thesis, I investigate whether NLP models have different aspects of causal knowledge about events and how to improve their understanding of narratives and plans.
Answering questions about why people perform actions in a narrative can test whether NLP systems contain, and can effectively apply, causal knowledge about events. I introduce TellMeWhy, a dataset concerning why characters in short narratives perform the actions described. An evaluation of then state-of-the-art (SOTA) fine-tuned models shows that they perform far worse than humans. To improve models, it is important to understand which aspects of causal knowledge they need and how best to use external sources to inject this knowledge. In KnowWhy, I analyze different ways of injecting knowledge into models. This is difficult because we do not know a priori what type of knowledge a question will require, so a ranking model is needed to pick the most important inference. Results show that this retrieved knowledge helps models of all sizes, improving their understanding of narratives.
Next, I study whether models can reason about causal aspects of plans, focusing on whether they understand the causal dependencies reflected in the temporal order of a plan's steps. I introduce CAT-Bench and find that SOTA models perform poorly and that their answers are inconsistent across questions about the same step pairs. In their current state, these models cannot yet be reliably used for complex user-facing tasks. I then measure contemporary models' ability to perform user-facing, user-centric plan customization. I introduce the use of semi-symbolic edits in large language model (LLM) based agents and test several multi-LLM-agent architectures for plan customization. While LLMs still cannot understand complex customization hints, my results suggest that LLM-based architectures may be worth exploring further for other customization applications. Finally, I distill complex reasoning capabilities into small language models (SLMs) using synthetic data that reflects a decomposition-then-editing process for plan customization. I demonstrate that explicitly teaching this latent causal reasoning significantly improves the quality of SLM-generated customizations. Overall, my work improves how well NLP models understand complex reasoning about events in different contexts.

Speaker: Yash Kumar Lal

Location: NCS 220 or Zoom https://stonybrook.zoom.us/j/95849648243?pwd=dgPpZtDpgwQrK9z1SaPpNbBifaorzk.1

Abstract: Modern decision-making increasingly relies on complex data, imperfect models, and limited domain expertise, yet decisions must still be made with confidence and accountability. This talk presents a research perspective on visual analytics as a bridge between data, models, and human judgment. Through three case studies spanning public-health risk analysis, multivariate scientific visualization, and causal model auditing with large language models, I will show how interactive visualization can reveal structure in high-dimensional data, support reasoning under uncertainty, and help humans critically assess both statistical and AI-generated explanations. Together, these examples illustrate how visual analytics enables users not only to explore data but also to form, challenge, and refine the beliefs that underpin scientific and societal decisions.

Bio: Klaus Mueller received his Ph.D. in Computer Science from The Ohio State University in 1998. He is a Professor in the Department of Computer Science at Stony Brook University and a Senior Scientist at the Computational Science Initiative at Brookhaven National Laboratory. He currently serves as the Acting Chair of the Department of Technology and Society at Stony Brook. From 2012 to 2015, he was the Founding Chair of the Computer Science Department at SUNY Korea, where he also served as Vice President for Academic Affairs and Finance for two years.
His research interests span visual analytics, explainable AI, machine learning and data science, human-centered responsible AI, fairness, belief modeling and personalized communication, virtual and augmented reality, and computational and medical imaging. Dr. Mueller received the U.S. National Science Foundation Early Career Award in 2001, the SUNY Chancellor's Award for Excellence in Scholarship and Creative Activity in 2011, and the Meritorious Service Certificate and Golden Core Award of the IEEE Computer Society in 2016. In 2018, he was inducted into the U.S. National Academy of Inventors.
To date, he has authored more than 300 peer-reviewed journal and conference papers, which have been cited over 15,000 times. He is a frequent speaker at international conferences, has organized or participated in 18 tutorials, chaired the IEEE Visualization Conference in 2009, served as elected Chair of the IEEE Technical Committee on Visualization and Computer Graphics (VGTC) from 2012 to 2015, and was Editor-in-Chief of IEEE Transactions on Visualization and Computer Graphics from 2019 to 2022. He is a Fellow of the IEEE.

Location: NCS 120

Abstract: As intelligent systems become more integrated into human environments, fostering trustworthy human-AI collaboration presents a pressing challenge. In this talk, I examine the interplay between an agent's performance and social dynamics in shaping trust in human-AI interactions. My approach combines testbed development, behavioral prototyping, and user study design to create controlled experimental setups that capture real-world interaction complexities, such as ambiguity, multi-agent dynamics, and conflicting goals.

I illustrate this with a recent VR study on multi-user interaction with an autonomous vehicle (AV). Moving beyond dyadic interactions, the study probes human perspectives from the roles of a pedestrian, driver, and AV passenger, all interacting with the AV simultaneously at an ambiguous all-way stop sign intersection. We compare interactions with efficient and prosocial AV behavior strategies, revealing diverging trust perceptions and preferences across user roles. These insights inform a broader research trajectory focused on balancing performance with social considerations in designing trustworthy human-AI collaborations.

Bio: JiHyun Jeong is a postdoctoral researcher at Cornell University working on human-computer interaction and human-robot interaction. Her research develops prototypes and methods to explore the performance and social factors that influence collaboration and trust between humans and artificial agents. She holds a Ph.D. and an MPS in Information Science from Cornell University and a BSc in Computer Science and Engineering from Korea University. She received a Best Paper Honorable Mention at DIS.

Zoom: https://stonybrook.zoom.us/j/98738234619?pwd=djJFQXBWbkpmblZDT25zNlVMYWpCQT09

Meeting ID: 987 3823 4619
Passcode: 474618

The annual Conference on Neural Information Processing Systems (NeurIPS) is a multi-track, interdisciplinary meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Alongside the conference are a professional exposition focusing on machine learning in practice, a series of tutorials, and topical workshops that provide a less formal setting for the exchange of ideas.

For more information and registration, visit the official website.