Abstract:

Capturing the spatio-temporal (4D) dynamics of humans has been a long-standing research problem in computer vision and graphics. Synthesizing photorealistic human avatars has broad applications, ranging from immersive telepresence in AR/VR and the movie industry to enriching education and healthcare. Earlier approaches relied on hand-engineered models that use a small amount of data from one or more subjects. With the advent of neural networks, training on large datasets improved the visual quality of the output. Currently, the combination of neural networks with graphics techniques has achieved natural-looking human animation. However, most approaches are identity-specific, trained only on a single identity, and use only one modality.

In this thesis, we address the problem of learning neural representations of humans in a holistic way. Given that real-world video data include multiple modalities (audio and video) and multiple identities, we develop multi-modal and multi-identity representations. First, we propose to reconstruct the 4D face geometry of humans by leveraging both audio and video information. In this way, the network produces accurate lip shapes and is robust to cases where either modality is insufficient. Next, we introduce a NeRF-based representation for audio-driven human face animation that achieves high-quality lip synchronization for cinematic content. Since humans communicate with their full body, combining body pose, hand gestures, and facial expressions, we extend our network to capture full-body human motion for multiple identities simultaneously. To better disentangle identity-specific and non-identity-specific information, we subsequently study non-linear interactions between latent factors of variation and propose a specific multiplicative module. In this way, we learn a multi-identity NeRF that robustly animates human faces under novel expressions and achieves a significant decrease in the total training time. Similarly, we propose a multi-identity Gaussian splatting representation for human bodies by constructing a high-order tensor. Assuming a low-rank structure, we learn a tensor decomposition that leads to a significant decrease in the total number of learnable parameters, as well as to robust animation under novel poses. In the future, we propose to jointly synthesize audio and visual outputs from just text input. Given the recent rise of large language models, coupling text with natural-looking avatars can enhance the overall interaction between a human and an AI system.

Speaker: Aggelina Chatziagapi

Where: NCS, Room 220

Zoom link: https://stonybrook.zoom.us/j/98775312249?pwd=uORNAnSdcssrPZdqOsqaMAF5aLcRD9.1
ID: 98775312249
Passcode: 505777

Abstract:
Deep learning models have achieved remarkable success across a wide range of computer vision tasks, including image classification and semantic segmentation. However, this success relies heavily on large amounts of annotated data, which are expensive to obtain. Moreover, performance often degrades when the distribution shifts between training and test data. Domain Adaptation overcomes these issues by transferring knowledge from a label-rich source domain to a related but different target domain. Despite its popularity, domain adaptation is still a challenging task, especially when the distribution shifts are severe and the target domain has few or no labeled data.

In this thesis, I develop four efficient domain adaptation approaches to improve model performance on the target domain. Firstly, inspired by the large-scale pretraining of Vision Transformers, I explore Transformer-based domain adaptation for stronger feature representation and design a safe training mechanism to avoid model collapse when the domain gap is large. Secondly, I observe that source models have low confidence on the target data. To address this, I focus on the penultimate activations of target data and propose an adversarial training strategy to enhance prediction confidence. Thirdly, I study the use of weak supervision from prior knowledge about the target domain label distribution. A novel Knowledge-guided Unsupervised Domain Adaptation paradigm is devised, and a plug-in module is designed to rectify pseudo labels. Lastly, I step into the task of Active Domain Adaptation, where the labels of a small portion of target data can be queried. I propose a novel active selection criterion based on the local context and devise a progressive augmentation module to better utilize queried target data. The robustness of domain adaptation approaches, in addition to accuracy, is critical yet under-explored. To conclude the thesis, I empirically study set prediction in domain adaptation using conformal prediction and conformal training.


Location: New Computer Science Bldg., Room 120
Zoom Link: https://stonybrook.zoom.us/j/92736258273?pwd=ipDdh1CTG6dRYmqa3ltUvooei8OfaT.1
Meeting ID: 927 3625 8273
Passcode: 466399
Hidden Biases. Ethical Issues in NLP, and What to Do about Them presented by Dirk Hovy of Bocconi University

ABSTRACT: Through language, we fundamentally express who we are as humans. This property makes text a fantastic resource for research into the complexity of the human mind, from social sciences to humanities. However, it is exactly that property that also creates some ethical problems. Texts reflect the authors' biases, which get magnified by statistical models. This has unintended consequences for our analysis: If our data is not reflective of the population as a whole, if we do not pay attention to the biases contained, we can easily draw the wrong conclusions, and create disadvantages for our users.

In this talk, I will discuss several types of biases that affect NLP models, their sources, and potential countermeasures: (1) Bias stemming from data, i.e., selection bias (if our texts do not adequately reflect the population we want to study), label bias (if the labels we use are skewed) and semantic bias (the latent stereotypes encoded in embeddings); (2) Biases deriving from the models themselves, i.e., their tendency to amplify any imbalances that are present in the data; (3) Design bias, i.e., the biases arising from our (the researchers') decisions about which topics to analyze, which data sets to use, and what to do with them. For each bias, I will provide examples and discuss the possible ramifications for a wide range of applications, and various ways to address and counteract these biases, ranging from simple labeling considerations to new types of models.

BIO: Dirk Hovy is an associate professor of Computer Science in the department of marketing at Bocconi University. He received his PhD from the University of Southern California in Los Angeles, where he worked as a research assistant at the Information Sciences Institute.

He works in Natural Language Processing (NLP), a subfield of artificial intelligence. His research focuses on computational social science. His interests include integrating sociolinguistic knowledge into NLP models, using large-scale statistics to model the interaction between people's socio-demographic profile and their language use, and ethics for data science and algorithmic fairness.
Virtual Job Fair for New Stony Brook Graduates & Experienced Alumni

Using a platform called Career Fair Plus, participants will be able to schedule 10-minute video meetings with participating employers of interest to them. Recent graduates and alumni can register, and learn more about how the fair will be run, on Handshake.

The Association for Computational Linguistics is the international scientific and professional society for people working on problems involving natural language and computation. Membership includes the ACL quarterly journals, Computational Linguistics and Transactions of the ACL, reduced registration at most ACL-sponsored conferences, discounts on ACL-sponsored publications, and participation in ACL Special Interest Groups.

An annual meeting is held each summer in locations where significant computational linguistics research is carried out.

For more information and registration, visit the official website.

TITLE: Sampling Using Langevin Diffusions Beyond the Worst-Case by Andrej Risteski of CMU


ABSTRACT: Many tasks involving generative models require sampling from distributions parametrized as p(x) = e^{-f(x)}/Z, where Z is the normalizing constant, for some function f whose values and gradients we can query. This mode of access to f is natural, for instance when sampling from posteriors in latent-variable models. Classical results show that a natural random walk, Langevin diffusion, mixes rapidly when f is convex. Unfortunately, even in simple examples, such applications entail working with functions f that are nonconvex.

We exhibit instances where Langevin diffusion (combined with other tools) provably mixes rapidly in settings of practical relevance: distributions p that are multimodal, as well as distributions p that have a natural manifold structure on their level sets.
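
As a rough illustration of the sampler the abstract refers to (not taken from the talk itself), the Python sketch below runs the standard unadjusted discretization of Langevin diffusion, x <- x - step * grad f(x) + sqrt(2 * step) * noise, for a target p(x) proportional to e^{-f(x)}. The function name langevin_samples, the step size, the chain length, the burn-in, and the convex test case f(x) = x^2/2 are all assumptions chosen for illustration.

```python
import math
import random


def langevin_samples(grad_f, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm in one dimension (illustrative sketch).

    Iterates x <- x - step * grad_f(x) + sqrt(2 * step) * N(0, 1),
    a discretization of the Langevin diffusion whose stationary
    distribution is proportional to exp(-f(x)). Only gradient queries
    to f are needed; the normalizing constant Z never appears.
    """
    x = x0
    noise_scale = math.sqrt(2.0 * step)
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_f(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def grad_quadratic(x):
    # Assumed example: f(x) = x^2 / 2 is convex, so p is a standard Gaussian.
    return x


rng = random.Random(0)  # fixed seed so the run is repeatable
chain = langevin_samples(grad_quadratic, x0=3.0, step=0.01,
                         n_steps=200_000, rng=rng)
kept = chain[1_000:]  # discard an initial burn-in segment
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
```

For this convex example the empirical mean and variance of the chain should land close to 0 and 1. A nonconvex f can be tried by swapping in a different grad_f, but rapid mixing is then not guaranteed in general, which is the regime the talk addresses.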

Abstract: Implicit functions have long been a fundamental representation for both 2D and 3D objects in computer graphics, playing a significant role in the field's early development. With the rise of 3D deep learning and the rapid advancement of neural rendering techniques, implicit representations of 3D shapes have regained significant attention in recent years. In this talk, I will present several recent research projects focusing on implicit function-based 3D reconstruction and neural rendering. Furthermore, I will discuss potential future developments in this dynamic and rapidly evolving field.

Biography: Ying He is an Associate Professor at the College of Computing and Data Science, Nanyang Technological University, where he also serves as the Director of the Centre for Augmented and Virtual Reality. His research interests lie in geometric computation and analysis, with applications spanning computer graphics, 3D vision, computer-aided design, multimedia, and wireless sensor networks. Dr. He is an active member of the technical program committees for major conferences on geometric modeling and has served on the editorial boards of IEEE Transactions on Visualization and Computer Graphics, Computer Graphics Forum, and Computational Visual Media. He has also taken on key leadership roles as General/Program Co-Chair for several conferences, including Shape Modeling International (SMI) 2022, Solid and Physical Modeling (SPM) 2022 & 2023, Geometric Modeling and Processing (GMP) 2014 & 2021, and Computational Visual Media (CVM) 2020. For more information, please visit https://personal.ntu.edu.sg/yhe/

Location: NCS 115

Stony Brook University Libraries invites students, faculty, & staff to join a conversation about how AI is transforming the private sector workforce. As AI tools move from experimentation to everyday business use, companies are rethinking roles, skill sets, leadership, and long-term strategy. This discussion-based event will focus on the fast-paced changes and directions at tech companies and their possible impact. This event will be particularly relevant for students preparing for an AI-influenced job market and looking to position themselves for opportunities in a rapidly evolving professional landscape.

The discussion will be led by Tariq Khan, Senior Director of Private Cloud Solutions at Hewlett Packard Enterprise. Tariq is a technology leader and architect with experience across private cloud, hybrid cloud, and data center platforms. He is responsible for shaping the technology architecture and strategic direction of HPE's Private Cloud offerings across on-premises and cloud-integrated environments.

Light refreshments will be served.


Location: Melville Library, NRR, Learning Lab
How Language Makes us Smart (without Big Data) presented by Charles Yang

Abstract: Language provides the glue that combines simpler concepts into complex ones. To study how language guides conceptual development, we need precise accounts of how rules are learned from the child's linguistic experience, which is extremely limited in comparison to the amount of data available to current machine learning methods. In this talk, I discuss a mathematical model of inductive generalization, which enables language learning with a very small amount of data. Such a view of learning has strong implications for the cross-cultural/linguistic variation of development. As a case study, I show that Hong Kong children learning Cantonese, which has a simpler formal counting system, develop an understanding of symbolic numbers a full year ahead of English-learning children in the United States, which is precisely predictable from the learning model. The new conception of learning adds another wrinkle to the eternal question of how language and thought are related to each other.

Bio: Charles Yang studied at the MIT AI lab and now teaches linguistics, computer science and psychology and directs the Program in Cognitive Science at the University of Pennsylvania. He is the author of several books; The Price of Linguistic Productivity (MIT Press, 2016) won the Leonard Bloomfield Award from the Linguistic Society of America. His honors include a Guggenheim fellowship.