International Love Data Week is a global event dedicated to celebrating data in all its forms. This year, Stony Brook University is excited to celebrate Love Data Week with a series of 30-minute webinars that aim to promote proficiency with data, showcase innovative data projects, and foster a community of data enthusiasts across campus. The series is hosted by the Division of Educational & Institutional Effectiveness and facilitated by the Office of Educational Effectiveness. We invite all SBU faculty, staff, and students to join in the festivities, learn from colleagues in our campus community, and fall in love with the power of data!



Abstract: Datalog is a powerful language for expressing recursive computations through rules: Horn clauses in first-order logic. Although effective at expressing queries over existential properties, Datalog and many of its popular implementations struggle with queries that involve more complex aggregates, requiring users to apply verbose, non-composable, and/or inefficient workarounds. Recent work on lattice-based Datalogs addresses many of these concerns for aggregates that can be encoded as lattices (e.g., min or max), but more general aggregates like count remain problematic. In this talk, I will argue that this is not a fundamental limitation of Datalog, but rather a consequence of its model of truth: both Datalog semantics and evaluation rules make heavy use of the fact that insertion is both monotone and idempotent. Once a fact is known to be true, it cannot be retracted, nor can further discoveries of the same fact alter its truth. Monotonicity is critical for forward progress under Datalog's "open world" model, as it allows us to safely assert the truth of a body. Meanwhile, idempotence makes it easier to reason about evaluation, as we need only guarantee that each head atom will be derived at least once. Unfortunately, more general aggregates like sum() are neither idempotent nor monotone. I will introduce Hedgelog, a strict generalization of Datalog that uses general monoids as a basis for truth. I will show that this generalization remains compatible with Datalog's open world model, how it enables cleaner and more composable Datalog programs, and how the underlying monoid relations open the door to interesting data-structure-level optimizations.
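
To make the idempotence and monotonicity argument in the abstract concrete, here is a minimal Python sketch. It is not Hedgelog and not any Datalog engine; the `fold` helper and the sample contributions are hypothetical, chosen only to show why rediscovering a derivation is harmless for set insertion and min but changes a sum.

```python
from functools import reduce
import operator


def fold(combine, identity, contributions):
    """Combine a sequence of derived contributions with a monoid operation."""
    return reduce(combine, contributions, identity)


# Hypothetical contributions derived for one head atom.
once = [3, 5, 2]
# At-least-once evaluation may rediscover the derivation that produced 5.
with_duplicate = once + [5]

# Set insertion (plain Datalog truth) and min are idempotent:
# folding in a duplicate leaves the result unchanged.
facts_once = fold(operator.or_, frozenset(), [frozenset([c]) for c in once])
facts_dup = fold(operator.or_, frozenset(), [frozenset([c]) for c in with_duplicate])
min_once = fold(min, float("inf"), once)
min_dup = fold(min, float("inf"), with_duplicate)

# Sum is not idempotent: the duplicate derivation changes the answer.
sum_once = fold(operator.add, 0, once)
sum_dup = fold(operator.add, 0, with_duplicate)

# Sum is not monotone either: a later negative contribution lowers
# a value that may already have been reported.
sum_after_negative = fold(operator.add, 0, once + [-4])

print(facts_once == facts_dup, min_once == min_dup)
print(sum_once, sum_dup, sum_after_negative)
```

An evaluator that only guarantees at-least-once derivation would therefore be safe for the first two folds, but would need exactly-once bookkeeping (or a different notion of truth) for the sum.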

Bio: Oliver Kennedy is an associate professor at the University at Buffalo. He earned his PhD from Cornell University in 2011 and now leads the Online Data Interactions (ODIn) lab, which operates at the intersection of databases and programming languages. Oliver is the recipient of an NSF CAREER award, an IEEE Region 1 Technological Innovation Award, UB's Exceptional Scholar Award, and several UB SEAS teaching awards. Oliver is also one of the founding board members of Breadcrumb Analytics. Several of Oliver's papers have been invited to "Best of" compilations from SIGMOD and VLDB. The ODIn lab is currently exploring (i) how we can leverage database techniques like incremental view maintenance to make compilers faster, (ii) how to make it easier for data scientists to track how sources of uncertainty, ambiguity, and/or bias affect analyses, and (iii) how to streamline the interfaces --- both human and software --- between different tools for data science, like Python, SQL, and spreadsheets.

Location: NCS 120
University Libraries Present: Qualitative data can be challenging to analyze and interpret effectively. In this workshop, SBU Libraries' Data Literacies Lead, Ahmad Pratama, will show you how to extract meaningful insights from textual data, including understanding sentiment trends. Learn to explore qualitative data with Python using word clouds, basic natural language processing (NLP) techniques, and lexicon-based sentiment analysis with VADER.
https://stonybrook.zoom.us/meeting/register/k0r6mPYCRayk2AOGmyd0qw#/registration
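
As a rough illustration of the lexicon-based sentiment analysis mentioned in the workshop description, the sketch below scores text by summing per-word valences. It is not VADER and not the workshop's material: the tiny `LEXICON`, the `NEGATIONS` set, and the one-word negation rule are all hypothetical simplifications.

```python
import re

# Hypothetical toy lexicon; real lexicons such as VADER's are far larger
# and use empirically rated valences.
LEXICON = {
    "love": 3.0,
    "great": 3.0,
    "helpful": 2.0,
    "confusing": -2.0,
    "slow": -1.5,
    "terrible": -3.0,
}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum word valences, flipping the sign of a word directly preceded by a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            valence = -valence
        total += valence
    return total


comments = [
    "The workshop was great and helpful",
    "The slides were not helpful",
    "The room was on the second floor",
]
for comment in comments:
    print(f"{sentiment_score(comment):+.1f}  {comment}")
```

A positive total suggests positive sentiment, a negative total suggests negative sentiment, and zero means no lexicon word was found.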
The Center of Excellence in Wireless and Information Technology (CEWIT) will host the 16th International Conference on Emerging Technologies for a Smarter World (CEWIT2020) virtually on November 5, 2020. The conference will center on four major fields that are permeating our business and personal lives: Machine Learning, Artificial Intelligence, Blockchain, and Computational Medicine. For more information, visit https://www.cewit.org/.
Abstract: Artificial Intelligence for Science (AI4Sci) has become a transformative approach in modeling and understanding complex physical systems, encompassing different scales such as atomistic systems and continuum systems. In atomistic systems, AI has shown potential in accelerating simulations, optimizing molecular dynamics, and predicting material and molecular properties through data-driven approaches, enhancing computational efficiency while preserving accuracy. For continuum systems, AI provides powerful tools for solving partial differential equations (PDEs) and learning physical patterns from data, capturing intricate dynamics that govern physical and engineering processes. This work explores AI methods--particularly equivariance for neural networks and neural operators--bridging atomistic and continuum representations. We analyze the implications of incorporating symmetries to improve model robustness and learning efficiency, providing a cohesive AI-driven framework for advancing scientific discovery. The findings aim to underscore the role of AI in enhancing accuracy, applicability, scalability, interpretability, and generalization across scales, from molecular simulations to physical modeling, opening pathways for next-generation applications in computational science.

Biography: Wenhan Gao is a third-year Ph.D. student in Applied Mathematics at Stony Brook University, where he works under the supervision of Professor Yi Liu. Wenhan's research focuses on equivariant neural networks, graph neural networks, and AI for partial differential equations. Wenhan's work seeks to leverage the power of symmetries to aid AI models, particularly in fields such as computer vision (image and video generation), physical simulation (modeling climate change), and computational chemistry (drug discovery). He has published papers on the aforementioned topics in leading venues like NeurIPS, Transactions on Machine Learning Research (TMLR), and Journal of Computational Physics (JCP). He also has several preprints under review in leading venues like ICLR and CVPR. In addition to his research, Wenhan has served as a reviewer for top-tier conferences, including ICLR, NeurIPS, ICML, and KDD, and as a lecturer for undergraduate and graduate courses at Stony Brook University. Wenhan was awarded the NeurIPS Travel Award and Excellence in Teaching for Fall 2023.
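
For readers unfamiliar with the equivariance discussed in the abstract above, a map f is equivariant to a transformation g when f(g(x)) = g(f(x)). The sketch below checks this for a textbook case, circular convolution under cyclic shifts, and contrasts it with a position-dependent map that is not equivariant. It is an illustration only, not the speaker's models; the function names and sample values are hypothetical.

```python
def cyclic_shift(signal, k):
    """Shift a sequence k positions to the right, wrapping around."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def circular_conv(signal, kernel):
    """Circular convolution: the same kernel weights are applied at every position."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def position_weight(signal):
    """A map that depends on absolute position, so it is not shift-equivariant."""
    return [i * value for i, value in enumerate(signal)]


signal = [1, 4, 2, 0, 3]
kernel = [1, -1, 2]

# Equivariant: shifting then convolving equals convolving then shifting.
shift_then_conv = circular_conv(cyclic_shift(signal, 2), kernel)
conv_then_shift = cyclic_shift(circular_conv(signal, kernel), 2)
print(shift_then_conv == conv_then_shift)

# Not equivariant: the two orders disagree for the position-dependent map.
shift_then_weight = position_weight(cyclic_shift(signal, 2))
weight_then_shift = cyclic_shift(position_weight(signal), 2)
print(shift_then_weight == weight_then_shift)
```

Weight sharing is what makes the convolution commute with shifts; equivariant networks build the same kind of constraint in for other symmetry groups.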
The coach who led Team USA to four Math Olympiad gold medals shares his blueprint for staying irreplaceable in an AI-driven world.

As artificial intelligence transforms our world, what skills will remain uniquely human? How can we prepare for careers in an automated future?

Join Carnegie Mellon mathematics professor Po-Shen Loh for insights on navigating the AI revolution by embracing our humanity.

Dr. Loh brings a distinctive perspective shaped by his dual expertise: serving as national coach of the USA Mathematical Olympiad team (which has won four gold medals under his leadership) and developing innovative solutions for real-world challenges from pandemic response to educational technology.

Through his nationwide speaking tour that reached 250 audiences across 100 cities, he has refined a practical framework for thriving alongside AI.

In this presentation, Dr. Loh will explore how creative problem-solving, judgment, and communication become more valuable as automation grows -- and how students and professionals can build those strengths now.

The session includes real-world examples, guidance for education and careers, and a Q&A.

Speaker: Po-Shen Loh is a social entrepreneur and inventor, working across the spectrum of mathematics, education, and healthcare.

A math professor at Carnegie Mellon University, he also served a decade-long term as the national coach of the USA International Mathematical Olympiad (IMO) team, taking the team to gold on numerous occasions.

He has pioneered numerous innovations and has been featured in or co-created YouTube videos with more than 25 million views.

Location: Wang Center Theater

The series is offered by Stony Brook University's Institute for Creative Problem Solving in collaboration with the National Museum of Mathematics (MoMath) and Brookhaven National Laboratory.

The event is free but space is limited. Please register to reserve your space.

CSE 600 Talk: Securing Software-Defined Networking Infrastructure by Dr. Guofei Gu

ABSTRACT: Today's network and computing infrastructure rests on inadequate foundations. An emerging, promising new foundation for computing is software-defined infrastructure (SDI), which offers a range of technologies including: compute, storage and network virtualization; novel separation of concerns at the systems level; and new approaches to system and device management. As a representative example of SDI, software-defined networking (SDN) is a new networking paradigm that decouples the control logic from the closed and proprietary implementations of traditional network data plane infrastructure. SDN is now becoming the networking foundation for data-center/cloud, future Internet and 5G infrastructures.

We believe that SDN is an impactful technology to drive a variety of innovations in network management and security. It is now clear that security will be a top concern, as well as a new killer app, for SDN. In this talk, I will discuss some new opportunities, as well as challenges, in this new direction and illustrate them with our recent research results. I will discuss how SDN can enhance network security. I will also discuss some unique new security problems inside SDN and introduce some of our work to enhance the security of SDN. Finally, I will share my vision on programmable system security in a software-defined world.

BIO: Dr. Guofei Gu is a professor in the Department of Computer Science & Engineering at Texas A&M University (TAMU). Before coming to Texas A&M, he received his PhD degree in Computer Science from the College of Computing, Georgia Institute of Technology. His research interests are in network and systems security. Dr. Gu is a recipient of the 2010 NSF CAREER Award, 2013 AFOSR Young Investigator Award, 2010 IEEE S&P Best Student Paper Award, 2015 ICDCS Best Paper Award, Texas A&M Dean of Engineering Excellence Award, Presidential Impact Fellow, Charles H. Barclay Jr. '45 Faculty Fellow and the Google Faculty Research Award. He is an active member of the security research community and has pioneered several new research directions such as botnet detection/defense and SDN security. Dr. Gu has served on the program committees of top-tier security conferences such as IEEE S&P, ACM CCS, USENIX Security and NDSS. He is an ACM Distinguished Member, an Associate Editor for IEEE Transactions on Information Forensics and Security (T-IFS), and the Steering Committee co-chair for SecureComm. He is currently directing the SUCCESS Lab at TAMU.
Abstract: Large language models (LLMs) may exhibit unintended or undesirable behaviors. Recent works have concentrated on aligning LLMs to mitigate harmful outputs. Despite these efforts, some anomalies indicate that even a well-conducted alignment process can be easily circumvented, whether intentionally or accidentally. Does alignment fine-tuning have robust effects on models, or are its impacts merely superficial? In this work, we make the first exploration of this phenomenon from both theoretical and empirical perspectives. Empirically, we demonstrate the elasticity of post-alignment models, i.e., the tendency to revert to the behavior distribution formed during the pre-training phase upon further fine-tuning. Leveraging compression theory, we formally deduce that fine-tuning disproportionately undermines alignment relative to pre-training, potentially by orders of magnitude. We validate the presence of elasticity through experiments on models of varying types and scales. Specifically, we find that model performance declines rapidly before reverting to the pre-training distribution, after which the rate of decline drops significantly. Furthermore, we reveal that elasticity positively correlates with increased model size and the expansion of pre-training data. Our findings underscore the need to address the inherent elasticity of LLMs to mitigate their resistance to alignment.

Speaker: Huajian Zhang

Location: CS2311
AI Institute Seminar

Title: A Geometric Understanding of Deep Learning

Abstract: This work introduces an optimal transportation (OT) view of generative adversarial networks (GANs). Natural datasets have intrinsic patterns, which can be summarized as the manifold distribution principle: the distribution of a class of data is close to a low-dimensional manifold. GANs mainly accomplish two tasks: manifold learning and probability distribution transformation. The latter can be carried out using the classical OT method. From the OT perspective, the generator computes the OT map, while the discriminator computes the Wasserstein distance between the generated data distribution and the real data distribution; both can be reduced to a convex geometric optimization process. Furthermore, OT theory reveals the intrinsic collaborative--instead of competitive--relation between the generator and the discriminator, and the fundamental reason for mode collapse. We also propose a novel generative model, which uses an autoencoder (AE) for manifold learning and an OT map for probability distribution transformation. This AE-OT model improves theoretical rigor and transparency, as well as computational stability and efficiency; in particular, it eliminates mode collapse. The experimental results validate our hypothesis and demonstrate the advantages of our proposed model.
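
To give a feel for the two OT objects named in the abstract, the OT map and the Wasserstein distance, the sketch below works through the simplest case: two equal-sized sets of one-dimensional samples, where the optimal map for a convex cost pairs the sorted samples in order. This is only a toy instance, not the AE-OT model or the convex geometric optimization the talk describes; the function names and sample values are hypothetical.

```python
def ot_map_1d(source, target):
    """Pair sorted source samples with sorted target samples (monotone rearrangement)."""
    if len(source) != len(target):
        raise ValueError("this sketch assumes equally many source and target samples")
    return list(zip(sorted(source), sorted(target)))


def wasserstein1_1d(source, target):
    """Wasserstein-1 distance between two equal-sized 1D empirical distributions."""
    pairs = ot_map_1d(source, target)
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


# Hypothetical 1D samples standing in for generated and real data.
generated = [0.0, 1.0, 3.0, 4.0]
real = [1.0, 6.0, 5.0, 2.0]

for x, y in ot_map_1d(generated, real):
    print(f"move {x} -> {y}")
print("W1 =", wasserstein1_1d(generated, real))
```

In this setting computing the map and computing the distance are the same sorting step, which loosely mirrors the abstract's point that the two roles are collaborative rather than competitive.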