Predicting Subjective Attributes in Visual Data - Zijun Wei

ABSTRACT: Recent progress in deep neural networks has revolutionized many computer vision tasks such as image classification, detection, and segmentation. However, in addition to excelling at tasks that predict well-defined objective information, human-centered artificial intelligence systems should also be able to model subjective attributes, as defined by human perceptual behavior, that go beyond the pure physical content of visual data. Example subjective tasks are the prediction of spatial or temporal regions that are interesting to humans (e.g., regions that attract attention or are visually pleasing) and the recognition of subjective attributes (e.g., visually elicited sentiments). Better models for these tasks will improve the human-computer interaction experience in various applications. This thesis investigates several approaches to the challenges of predicting those subjective attributes in visual data over a diverse set of tasks. I first present a novel framework for real-time automatic photo composition. The framework consists of a cost-effective data collection workflow, an efficient model training pipeline, and a lightweight module to account for personalized preferences. Then I develop a novel and general algorithm to detect interesting segments in sequential data, which can be naturally applied to video summarization tasks. Furthermore, I propose methods that learn to represent sentiments elicited by images, in an unsupervised manner, using linguistic features extracted from large-scale Web data. To conclude this thesis, I introduce a human-vision-inspired image classification algorithm that also predicts spatial visual attention even though no attention data was used to train it.

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes one short talk on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will be available via Zoom, and the second half hour will be for in-person networking only.

Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.

HPCortex - a new, general-purpose machine learning library for HPC

Abstract: I will introduce HPCortex, a lightweight, C++, MPI-native machine-learning library for heterogeneous HPC systems. It implements many common architecture patterns including transformers, graph neural networks, and convolutional networks, and delivers performance portability across NVIDIA, AMD, and Intel GPUs while depending only on MPI and standard compiler/BLAS stacks. I will illustrate its capabilities via a surrogate model for the RHIC AGS Booster digital twin, a simple GNN for a coupled spring system, and a compact language model, then outline the roadmap.

Biography: Christopher is a research scientist and head of the Scientific Computing Applications Group in the Computational Science Department at Brookhaven National Laboratory. Previously he was an assistant staff scientist in the Physics Dept. at Columbia University, and held physics postdoctoral research positions at both Brookhaven and Columbia. He earned his Ph.D. in Theoretical Physics from the University of Edinburgh, UK.
His scientific background is in lattice QCD and high-performance computing, but since joining Brookhaven in 2020 his research interests have expanded to include machine learning, applied mathematics, and performance analysis, with a particular emphasis on building tools to support scientific research on HPC systems.

Location: CDS, Bldg. 725, Training Room

Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1604143373?pwd=hHT2yaIjahBIQ6tieURFqs8Pwex9gU.1

Meeting ID: 160 414 3373
Passcode: 277410

Face Editing with Machine Learning presented by Zhixin Shu

ABSTRACT: The face is the most informative feature of humans and has been a long-standing research topic in Computer Vision and Graphics. Images of faces are also ubiquitous in photography and social media, and people have devoted significant resources to capturing and editing face images. Face editing can be broadly viewed as the encoding, manipulation, and decoding of representations of face images. The challenge is to manipulate an image in a controllable way while generating results that are both desirable and as realistic as possible. This thesis explores different machine-learning-based face-editing approaches. I discuss the role of machine learning in achieving desirable edits by learning both the physical aspects and the statistical manifold of human faces. In my work on eye editing, I discuss the importance of understanding multiple physical elements of a face image, such as shape, illumination, and pose. In a deep-learning-based approach, I introduce image formation domain knowledge into the construction and training of a neural network. This network provides transparent access to the disentangled representations of the aforementioned physical properties. With this network, we can achieve various face-editing tasks in the form of representation manipulation. After that, I introduce Deforming Autoencoders, a network that learns to disentangle shape and appearance in an unsupervised manner. This disentanglement benefits the learning of other factors of variation, such as illumination and facial expression. In an extension of Deforming Autoencoders, we incorporate non-rigid structure-from-motion to learn a 3D morphable model for faces that requires only an image set for training. Finally, I describe an image-to-image network for 3D face reconstruction, which also utilizes structure-from-motion in deep learning. Trained with real face images, this network not only reconstructs 3D faces more accurately than prior art but also generalizes better to real-life test cases.

You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes three short talks on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will be available via Zoom, and the second half hour will be for in-person networking only.

Tuesday, January 7, 2025, 12:00 pm -- CDS, Bldg. 725, Training Room


Speakers

Sanket Jantre
Tao Zhang
Xi Yu


Join ZoomGov Meeting: https://bnl.zoomgov.com/j/1615289117?pwd=Hqkbj9itxWrFnkhZ8rQXHPInO2gxdF.1

Meeting ID: 161 528 9117
Passcode: 991382



Abstract:
Large language models (LLMs) have transformed the way humans write code, bringing unprecedented automation to software development. In this talk, I will first provide an overview of my research on enhancing LLMs' code intelligence, optimizing each step of the development pipeline towards more complex software engineering tasks. I will then delve into my key contributions, focusing on how to equip LLMs with a deeper, more comprehensive understanding of software programs. Finally, I will discuss the future of AI-driven software engineering, envisioning a new era of automation that is more reliable, intelligent, and cost-efficient.

Bio:
Yangruibo (Robin) Ding is a Ph.D. candidate in the Department of Computer Science at Columbia University. His research is at the intersection of Software Engineering and Machine Learning, focusing on developing large language models (LLMs) for code. He trains LLMs to generate, analyze, and refine software programs and constructs benchmarks to systematically evaluate LLMs on software engineering tasks. He also studies how to improve LLMs' reasoning capability to tackle complex programming tasks, such as debugging and patching. His interdisciplinary research has been published at top-tier conferences in software engineering, programming languages, natural language processing, and machine learning. He has won an ACM SIGSOFT Distinguished Paper Award and an IEEE TSE Best Paper Runner-up, and has received an IBM Ph.D. Fellowship.

Location:
NCS 120

TITLE: Towards a Theory of Encoder/Decoder Architectures by Andrej Risteski of CMU

ABSTRACT: A common choice of architecture in representation learning (i.e., learning a good embedding of the data) is an encoder/decoder architecture, which tries to map a part of the input into a good latent representation (via an encoder) and to predict the remaining part of the input (via a decoder). Two common examples are universal machine translation, where one tries to learn to translate between any pair in a set of languages via a common latent language, given paired corpora for only some of the pairs, and contextual encoders, where one tries to predict a part of an image given the rest of the image.
 
We will give a framework for analyzing the sample complexity of such architectures: for how many pairs of languages do we need paired corpora? How many image prediction tasks do we have to solve to get a good representation?