Abstract: Capturing the spatio-temporal (4D) dynamics of humans has been a long-standing research problem in computer vision and graphics. Synthesizing photorealistic human avatars has broad applications, ranging from immersive telepresence in AR/VR and the movie industry to enriching the education and healthcare systems. Earlier approaches relied on hand-engineered models that use a small amount of data from one or more subjects. With the advent of neural networks, training on large datasets enhanced the output visual quality. Currently, the combination of neural networks with graphics techniques has achieved natural-looking human animation. However, most approaches are identity-specific, trained only on a single identity, and use only one modality.
In this dissertation, we address the problem of learning neural representations of humans in a holistic way. Given that real-world video data include multiple modalities (e.g., audio and video) and multiple identities, we develop multi-modal and multi-identity representations. First, we propose to reconstruct the 4D face geometry of humans by leveraging both audio and video information. In this way, the network produces accurate lip shapes and is robust to cases where either modality is insufficient. Next, we introduce a NeRF-based representation for audio-driven human face animation that achieves high-quality lip synchronization for cinematic content. Since humans communicate with their full body, combining body pose, hand gestures, and facial expressions, we extend the network to capture full-body human motion for multiple identities simultaneously. To better disentangle identity-specific and non-identity-specific information, we subsequently study non-linear interactions between latent factors of variation and propose a specific multiplicative module. In this way, we learn a multi-identity NeRF that robustly animates human faces under novel expressions and achieves a significant decrease in the total training time. Similarly, we propose a multi-identity Gaussian splatting representation for human bodies by constructing a high-order tensor. Assuming a low-rank structure, we learn a tensor decomposition that leads to a significant decrease in the total number of learnable parameters, as well as to robust animation under novel poses. Last but not least, we propose to jointly synthesize audio and visual outputs from just text input. Given the recent rise of large language models, coupling text with natural-looking avatars can enhance the overall interaction between a human and an AI system.
Location: NCS 220 or Zoom
Time: Jan 26, 2021 03:00 PM Eastern Time (US and Canada)
All are welcome!
Zoom Meeting: https://stonybrook.zoom.us/j/93818552212?pwd=ajZkT2x4a2tiaDJUL1h3VFhLZEgwQT09
Meeting ID: 938 1855 2212
Passcode: 802722
Title: Data-Driven Document Unwarping
Abstract: Capturing document images is a common way to digitize and record physical documents due to the ubiquity of mobile cameras. To make text recognition easier, it is often desirable to digitally flatten a document image when the physical document sheet is folded or curved. However, unwarping a document from a single image in natural scenes is very challenging due to the complexity of document sheet deformation, document texture, and environmental conditions. Previous model-driven approaches struggle with inefficiency and limited generalizability. In this thesis, I investigate several data-driven approaches to tackle the document unwarping problem.
Data acquisition is the central challenge in data-driven methods. I first design an efficient data synthesis pipeline based on 2D image warping and train DocUNet, the pioneering data-driven document unwarping model, on the synthetic data. A benchmark dataset is also created to facilitate comprehensive evaluation and comparison. To improve the unwarping performance by training on more realistic data, I introduce the Doc3D dataset and DewarpNet. Supervised by 3D shape ground truth in Doc3D, DewarpNet is significantly better than DocUNet. Both DocUNet and DewarpNet depend on synthetic data for the ground-truth deformation annotation. To exploit real-world images, I propose PaperEdge, a weakly supervised model trained on in-the-wild document images with easy-to-obtain boundary information. PaperEdge surpasses DewarpNet by utilizing both the synthetic data and weakly annotated real data in the Document In the Wild (DIW) dataset. Finally, I propose to incorporate 3D physical constraints in training DewarpNet and PaperEdge. The constraints regulate the possible deformations of document paper. I also propose to augment the Doc3D and DIW datasets by introducing an online document segmentation model and better hardware.
This is Stony Brook's quantum moment. Join us for a spotlight on the core achievements and research excellence of faculty across the Colleges of Arts and Sciences (CAS), and Engineering and Applied Sciences (CEAS) - and their collaborative advancements in quantum science and technology. Learn about the real-world impact of their enduring work, their leadership in translating foundational science into entrepreneurial opportunities, and their impetus for making connections to next-generation innovation.
Presented by: Catherine Chen, Ph.D., Research Development Associate
Welcome remarks: President Andrea Goldsmith
Panel moderators: Dean David Wrobel, CAS, and Dean Andrew Singer, CEAS
Presentations and panel featuring our faculty:
Jennifer Cano, CAS, Physics and Astronomy
P. Scott Carney, CEAS, Mechanical Engineering
Hyeongrak Chuck Choi, CEAS, Electrical and Computer Engineering
Eden Figueroa, CAS, Physics and Astronomy
Himanshu Gupta, CEAS, Computer Science
Angela Kelly, CAS, Physics and Astronomy
Location: Theatre at the Charles B. Wang Center, Stony Brook University
Reserve your tickets by March 26!
You are cordially invited to attend the biweekly Brookhaven AI Mixer (BAM). BAM includes three short talks on AI research happening at BNL, followed by an open mixer over coffee and snacks for everyone to network and discuss all things AI. The first half hour will consist of presentations that will be available via Zoom, and the second half hour will be for in-person-only networking.
Join us every other Tuesday at noon in CDSD's Training Room (building 725, 2nd floor) to learn about interesting AI methods and applications, engage with potential collaborators, prepare for pending FASST funding calls, and build a community of AI for Science at BNL.
Tuesday, December 10, 2024, 12:00 pm -- CDS, Bldg. 725, Training Room
Speakers
Esther Tsai, CFN
Yugang Zhang, CFN
Sanket Jantre, CDS
Join Zoom Meeting
https://bnl.zoomgov.com/j/1611764217?pwd=asNaXHDwGLnMr9hDv3L6zAcsQaN5FX.1
Meeting ID: 161 176 4217
Passcode: 855752
Abstract: Modern technologies enable enhanced integrity and privacy guarantees not just for data, but also for computation. This is perhaps most emphatically demonstrated by the steady rise of zero-knowledge proofs, which are short certificates that attest to the correctness of computations (e.g., an age verification check) without revealing any secret inputs (e.g., the birth date on a digital ID). This subtly powerful technology enables anonymous credentials, privacy-preserving machine learning, anonymous blockchains, and much more--making the question of efficient zero-knowledge proofs fundamental to modern secure systems. Echoing Moore's law for computing, zero-knowledge proofs have improved on this front by ten orders of magnitude in the last two decades. In this talk, I will discuss our work on overcoming a key bottleneck that has emerged in this development: memory efficiency.
Speaker: Abhiram Kothapalli is a postdoctoral scholar at the University of California, Berkeley, hosted by Sanjam Garg. He is a recent graduate of Carnegie Mellon University, where he earned his Ph.D. in Computer Science, advised by Bryan Parno. Previously, he was at the University of Illinois at Urbana-Champaign, where he earned his B.S. in Computer Science and B.S. in Mathematics. Kothapalli's research develops cryptographic techniques aimed at scaling expressive privacy and integrity guarantees across the internet.
Location: NCS 120
Are you concerned about AI issues with your asynchronous online courses? Is your fully online course vulnerable to AI plagiarism? Do you want to engage your online students using AI? Discover the future of education with our AI-powered solutions designed specifically for online asynchronous courses. This innovative approach uses artificial intelligence to transform the way courses are delivered, making learning more personalized, engaging, and effective.
Register here: https://stonybrook.zoom.us/meeting/register/RD94cHiHRwCj6xNkCZqNEg