Le Hou Dissertation Defense: Deep Learning for Digital Histopathology across Multiple Scales

ABSTRACT: Histopathology is the study of tissue changes caused by diseases such as cancer. It plays a crucial role in disease diagnosis, survival analysis and development of new treatments. Using computer vision techniques, I focus on multiple tasks for automated analysis of digital histopathology images, which are challenging because histopathology images are heterogeneous and complex, due to the large variation of hundreds of cancer types in gigapixel resolution. In this thesis, I show how histopathology image analysis tasks can be viewed at three scales: Whole Slide Image (WSI)-level, patch-level and cellular-level, and present my contributions at each level.

WSI-level analysis such as classifying WSIs into cancer types is challenging, because conventional classification methods such as off-the-shelf deep learning models cannot be applied directly to gigapixel WSIs due to computational limitations. I contribute a patch-based deep learning method that classifies gigapixel WSIs into cancer types and subtypes with close-to-human performance. This method is useful for computer-aided diagnosis. At patch-level, I contribute a novel method for histopathology image patch classification. On the task of identifying Tumor Infiltrating Lymphocyte (TIL) regions, the prediction result of this method correlates with the survival rate of patients. At cellular-level, I contribute novel methods for nucleus classification and roundness regression, which are interpretable features for histopathology studies. With these methods, I generated a large-scale dataset of segmented nuclei, in WSIs from a large publicly available digital histopathology image dataset, to help advance histopathology research.

Zoom Link: https://stonybrook.zoom.us/j/98533029054?pwd=5FXO6lWGTJssCADEYkYbA7sjaacPRX.1

Meeting ID: 985 3302 9054

Passcode: 436997

Abstract:

Semantic segmentation, the task of assigning a semantic label to each pixel in an image, is a fundamental problem in the field of Computer Vision, with crucial applications in domains like autonomous driving, drone imagery and medical image analysis. Despite advancements in deep learning architectures, state-of-the-art models still heavily depend on large-scale pixel-level annotations, which are costly and time-consuming to acquire. To address this issue, Semi-Supervised Segmentation (SSS) has emerged as a promising solution, leveraging a small set of labeled images alongside a larger corpus of unlabeled data to reduce the annotation burden. In this proposal, I aim to investigate the challenges of SSS and propose approaches to address them. Existing SSS methods rely on a teacher-student framework to generate pseudo-labels for unlabeled images, which are then used for model training. However, this approach presents two major challenges: pixel-level consistency fails to effectively capture contextual information, and pseudo-labels are noisy, especially in the early stages of training. To address the challenge of noisy pseudo-labels, existing methods rely on confidence-based thresholding to identify reliable pseudo-labels. However, during early training phases, when the model is poorly calibrated, this approach can select high-confidence but noisy pseudo-labels. To address this, we propose a novel approach that reduces reliance on model confidence to select reliable pseudo-labels. Our method employs an ensemble of a segmentation model and an object detection model to select more reliable pseudo-labels, which are then used to weight pseudo-labels using rank statistics, reducing the influence of noisy labels in training.
Next, to address both the challenge of capturing contextual information and that of noisy pseudo-labels, I introduce a novel Multi-scale Patch-based Multi-label Classifier (MPMC), which incorporates patch-level contextual information and reduces the impact of noisy pixel pseudo-labels by using the predictions of the patch-level multi-label classifier to detect noisy labels, enhancing overall segmentation performance. While my work so far has focused on effectively utilizing unlabeled data to improve segmentation performance, as part of my future work, I will explore the use of textual information, such as category descriptions, for segmentation tasks. In limited labeled data scenarios, it is more challenging to align visual features with textual features from large language models (LLMs).
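
For readers unfamiliar with the confidence-based thresholding baseline mentioned in the abstract above, the following is a minimal sketch of the general idea only, not the speaker's method. The function names, the toy logits, and the threshold value are all illustrative assumptions: each pixel's teacher logits are turned into probabilities, the most likely class becomes the pseudo-label, and the pixel is kept only when its top probability reaches the threshold.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(pixel_logits, threshold=0.95):
    """Return one (label, keep) pair per pixel.

    label: index of the most probable class (the pseudo-label).
    keep:  True when the top probability is at least `threshold`.
    Both the function name and the default threshold are hypothetical.
    """
    results = []
    for logits in pixel_logits:
        probs = softmax(logits)
        confidence = max(probs)
        label = probs.index(confidence)
        results.append((label, confidence >= threshold))
    return results


# Toy teacher outputs for three pixels and three classes (made-up values).
teacher_logits = [
    [6.0, 0.5, 0.2],  # strongly favors class 0
    [1.0, 1.2, 0.9],  # ambiguous between classes
    [0.1, 0.3, 5.5],  # strongly favors class 2
]
selected = select_pseudo_labels(teacher_logits, threshold=0.9)
```

In this toy input the first and third pixels pass the threshold and the ambiguous second pixel is discarded. The weakness the abstract points out is visible in the design: the filter trusts the model's own confidence, so a poorly calibrated model can pass confident but wrong labels.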

Climate Uncertainty, Decision Making, and AI for Earth System Predictability (Dr. Nathan Urban, Brookhaven National Laboratory)

Bio: Nathan Urban is the group leader of the Optimal Experimental Design & Uncertainty Quantification group in the Applied Mathematics Department at Brookhaven National Laboratory's Computing & Data Sciences directorate (CDS). He holds a Ph.D. in theoretical condensed matter physics from Penn State, and has previously held research positions at Los Alamos National Laboratory, Princeton, and Penn State. His research interests include Bayesian inference and spatiotemporal statistics, probabilistic prediction and forecasting, multi-model / model-form / model structural uncertainty quantification, reduced order modeling, scientific machine learning and hybrid physical-data driven modeling, in-situ/streaming data analysis at scale, information fusion, decision making under uncertainty and optimal experimental design, and integrated multiscale computational frameworks for decision support.

Location: IACS Seminar Room

Lunch will be provided
Abstract: Recent progress in large language and vision models demonstrates how far we can go by scaling with vast internet-scale data. In contrast, physical AI, agents that perceive and act in the real world, still lags far behind. Today, both academia and industry primarily pursue generalizable physical AI by scaling up: collecting large-scale action-video datasets or training world models that enable interaction through learned environments. However, this paradigm is inherently inefficient and will soon reach a data ceiling. In this talk, I argue for a shift from scaling up to scaling out. I introduce reality world simulators, a new paradigm that converts real-world videos into diverse, interactive simulation environments. Instead of relying on more data collection, this approach expands data through structured reconstruction and recomposition, enabling both higher data efficiency and physically grounded interaction. I will present a three-pronged approach: 1) Scaling out via Digital Twins: reconstructing controllable, interactive environments from monocular videos to support diverse agent exploration. 2) Scaling out via Digital Cousins: disentangling scene structure into compositional elements to generate large-scale variations of real-world environments. 3) Scaling out via Embodied Humans: incorporating realistic human dynamics to improve safety and social compliance in robot learning. Finally, I will outline a roadmap toward building generalizable and safe physical AI systems for open-world deployment.

Bio: Dr. Wayne Wu is a postdoctoral researcher at UCLA Computer Science, working closely with Bolei Zhou, and collaborating with Trevor Darrell (UC Berkeley EECS) and Jiaqi Ma (UCLA CEE). He received his Ph.D. in Computer Science and Technology from Tsinghua University in June 2022 and was previously a visiting Ph.D. student at Nanyang Technological University. He also spent seven years in industry, where he led the research and development of products that reached more than 10 million end users worldwide. His research lies at the intersection of computer vision, robotics, and computer graphics. He focuses on developing infrastructure and methods to scale physical AI, enabling robots to work reliably and safely in the open world. He has published over 50 papers at top-tier venues including CVPR, ICCV, ICLR, NeurIPS, and ICRA, with over 9,500 citations and 10,000 GitHub stars. His work has received a CVPR Best Paper Candidate and multiple Oral, Spotlight, and Highlight presentations. He was also honored with the 2025 UCLA Chancellor's Award for Postdoctoral Research, recognizing the best postdocs at UCLA, and he was the only awardee from the School of Engineering. He serves as an Area Chair at CVPR 2026.

Location: NCS 120
Do Natural Language Understanding Systems Learn to Understand or to Find Shortcuts? (Naoya Inoue, http://naoya-i.github.io/)

ABSTRACT: Recent studies have suggested that natural language understanding (NLU) systems learn to exploit superficial, task-unrelated cues (a.k.a. annotation artifacts) in current datasets. This prevents the community from reliably measuring the progress of NLU systems. In this talk, I will discuss two recent studies from our research team: (i) an analysis of annotation artifacts in commonsense causal reasoning and (ii) the creation of a benchmark for evaluating NLU systems' internal reasoning.
---------------------------------------------------------------------------------------------------------------------------------------------
Learning graph-structured sparse models (Baojian Zhou, https://baojianzhou.github.io/)

ABSTRACT: Learning graph-structured sparse models has recently received significant attention thanks to their broad applicability to many important real-world problems. However, such models, although more effective and more interpretable than their counterparts, are difficult to learn due to optimization challenges. In this talk, we will discuss how to learn graph-structured sparse models under stochastic and online learning settings. Some interesting related problems will also be discussed.


Abstract: In high-dimensional data spaces, vast empty regions often exist where no known data points are present. These empty spaces are not merely gaps but hold untapped potential for discovering novel configurations, optimizing parameters, and improving decision-making processes. However, traditional exploration techniques struggle to identify and leverage these regions due to the curse of dimensionality. To address this, we introduce the Empty Space Search Algorithm (ESA), a scalable, physics-inspired method that systematically identifies and explores these uncharted voids. ESA operates by modeling the data space as a dynamic system, using a repulsion-attraction mechanism to locate optimal empty space configurations (ESCs) without requiring exhaustive search. Building upon ESA, we present GapMiner, a visual analytics system that integrates human-in-the-loop AI to iteratively refine and validate ESCs. GapMiner combines parallel coordinate visualization, interactive optimization, and deep learning-based predictive modeling to enhance the efficiency of empty space exploration. This methodology has broad applications, including accelerating convergence in evolutionary algorithms through a more diverse initial population, optimizing adversarial learning strategies, and discovering novel parameter configurations in reinforcement learning. Our approach demonstrates that empty space is not just an absence of data but a frontier for new possibilities in high-dimensional problem-solving.
Bio: Xinyu Zhang received his B.E. in Computer Science from Shandong University, Taishan College, in 2019. He is currently a final-year Ph.D. candidate in the Department of Computer Science at Stony Brook University, advised by Prof. Klaus Mueller. His research focuses on multivariate data analysis, scientific visualization, and reinforcement learning. He has published multiple papers in top-tier journals and conferences, including IEEE TVCG and NeurIPS.
*This seminar will be held in person (food provided on a first-come, first-served basis) and online (Zoom link below)!
Topic: IACS Student Seminar
Speaker: Xinyu Zhang
Time: Feb 26, 2025 12:00 PM Eastern Time (US and Canada)
Join Zoom Meeting
https://stonybrook.zoom.us/j/91848218975?pwd=lfITFa61GaXZ2Wsa1B1OnbLQMmXvOE.1

Meeting ID: 918 4821 8975
Passcode: 027337
Title: Sustainable NLP

Time: Friday 4/29, 2:40 PM

Location: NCS 120

Abstract:


Natural language processing (NLP) technology has supercharged many real-world applications ranging from intelligent personal assistants (like Alexa, Siri, and Google Assistant) to commercial search engines such as Google and Bing. But current NLP applications use extremely large neural models, making them (i) expensive to deploy on servers, requiring large amounts of compute resources and power, and (ii) impossible to run on mobile devices, making on-device, privacy-preserving applications impractical.

In the first part of the talk, I will describe systems optimizations we have developed that significantly reduce the compute and memory requirements of NLP models. These optimizations can be applied broadly and result in over 10x reduction in latency when deployed on mobile devices. In the second part of the talk, I will describe our recent work on predicting energy consumption of NLP models. Existing energy prediction approaches are not accurate, making it difficult for developers and practitioners to reason about their models in terms of power. We use a multi-level regression approach that produces highly accurate and interpretable energy predictions.



Bio:
Aruna Balasubramanian is an Associate Professor at Stony Brook University. She received her Ph.D. from the University of Massachusetts Amherst, where her dissertation won the UMass outstanding dissertation award and was the SIGCOMM dissertation award runner-up. She works in the area of networked systems. Her current work consists of two threads: (1) significantly improving Quality of Experience of Internet applications, and (2) improving the usability, accessibility, and privacy of mobile systems. She is the recipient of the SIGMobile Rockstar award, a Ubicomp best paper award, a Computing Innovation Fellowship, a VMWare Early Career award, several Google research awards, an
Spring 2025, Mondays 3:30 to 4:50 pm, NCS 220.

The seminar will be jointly taught by Prof. Chao Chen (chao.chen.1@stonybrook.edu) and Prof. Dimitris Samaras (samaras@cs.stonybrook.edu).

The overall purpose of this seminar is to bring together people with interests in Computer Vision theory and techniques and to examine current research issues. This course is appropriate for people who have already taken a graduate Computer Vision course or already have research experience in Computer Vision.

To enroll in this course, you must either: (1) be in the Ph.D. program or (2) receive permission from the instructors.

Each seminar will consist of multiple short talks (around 15 minutes) by multiple students. Students can register for 1 credit for CSE656. Registered students must attend and present a minimum of 2 talks. Registered students must attend in person. Up to 3 absences will be excused. Everyone else is welcome to attend.

Join here. Meeting ID: 927 2069 8658. Passcode: 130934.