Time:
Sep 7, Tue, 11:00am EDT
Place:
NCS 220 or on Zoom (info below)
Title: Data-Driven Document Unwarping
Abstract:
Capturing document images is a common way to digitize and record physical documents due to the ubiquity of mobile cameras. To make text recognition easier, it is often desirable to digitally flatten a document image when the physical document sheet is folded or curved. However, unwarping a document from a single image in natural scenes is very challenging due to the complexity of document sheet deformation, document texture, and environmental conditions. Previous model-driven approaches struggle with inefficiency and limited generalizability. In this thesis, I investigate several data-driven approaches to tackle the document unwarping problem.
Data acquisition is the central challenge in data-driven methods. I first design an efficient data synthesis pipeline based on 2D image warping and train DocUNet, the pioneering data-driven document unwarping model, on the synthetic data. A benchmark dataset is also created to facilitate comprehensive evaluation and comparison. To improve the unwarping performance by training on more realistic data, I introduce the Doc3D dataset and DewarpNet. Supervised by 3D shape ground truth in Doc3D, DewarpNet is significantly better than DocUNet. Both DocUNet and DewarpNet depend on synthetic data for the ground truth deformation annotation. To exploit real-world images, I propose PaperEdge, a weakly supervised model trained with in-the-wild document images with easy-to-obtain boundary information. PaperEdge surpasses DewarpNet by utilizing both the synthetic data and weakly annotated real data in the Document In the Wild (DIW) dataset. Finally, I propose directly predicting the $uv$-parameterized 3D mesh of the document with 3D constraints and using accessible 3D representations like depth maps as training targets. Predicting the 3D mesh of the document solves the unwarping task and also benefits VR/AR applications.
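As a rough illustration of how synthetic training pairs can be produced by 2D image warping, the sketch below warps a toy flat page with a smooth displacement field and keeps that field as the deformation label. It is not the DocUNet pipeline: the striped page, the sinusoidal deformation, and the names `make_page`, `displacement`, `bilinear`, and `synthesize` are all assumptions made for this example.

```python
import math

def make_page(h, w):
    """Toy flat 'document': horizontal stripes standing in for text lines."""
    return [[1.0 if (y // 2) % 2 == 0 else 0.0 for _ in range(w)] for y in range(h)]

def displacement(y, x, h, w, amp):
    """Hypothetical smooth deformation: a sinusoidal vertical curl across the page."""
    dy = amp * math.sin(2.0 * math.pi * x / w)
    dx = 0.0
    return dy, dx

def bilinear(img, y, x):
    """Sample img at fractional (y, x); out-of-range coordinates are clamped."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def synthesize(flat, amp=1.5):
    """Return (warped image, per-pixel displacement label) for one flat page."""
    h, w = len(flat), len(flat[0])
    warped, flow = [], []
    for y in range(h):
        row, flow_row = [], []
        for x in range(w):
            dy, dx = displacement(y, x, h, w, amp)
            # Each warped pixel reads the flat page at its displaced location.
            row.append(bilinear(flat, y + dy, x + dx))
            flow_row.append((dy, dx))
        warped.append(row)
        flow.append(flow_row)
    return warped, flow

if __name__ == "__main__":
    warped, flow = synthesize(make_page(8, 8))
    print(len(warped), len(warped[0]), flow[0][2])
```

Because the displacement field is generated rather than measured, every warped image comes with an exact per-pixel label at no annotation cost, which is the general appeal of synthetic data for this task.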
Join Zoom Meeting
https://stonybrook.zoom.us/j/
Meeting ID: 964 4059 2912
Passcode: 793149
One tap mobile
+16468769923,,96440592912# US (New York)
+13017158592,,96440592912# US (Washington DC)
Dial by your location
+1 646 876 9923 US (New York)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
+1 408 638 0968 US (San Jose)
+1 669 900 6833 US (San Jose)
Find your local number: https://stonybrook.
Bio: Neil Gaikwad is an Assistant Professor of Data Science and Computer Science at UNC Chapel Hill. Additionally, he serves on the Faculty Advisory Council of the UNC Parr Center for Ethics and is a Fellow at the MIT Dalai Lama Center for Ethics and Transformative Values. Neil holds a Ph.D. in Society-Centered AI from MIT and is an alumnus of Carnegie Mellon University's School of Computer Science. Neil's scholarship, published in prominent AI and HCI conferences, has been recognized with several prestigious honors, including the Facebook Research Fellowship, UIST Best Paper Honorable Mention, MIT Engineering Fellowship, Human Rights & Technology Fellowship, Graduate Teaching Award, and the Karl Taylor Compton Prize, MIT's highest student honor. He has been recognized as a Rising Star by both Stanford University and the University of Chicago. Translating research into real-world impact, Neil is a dedicated educator and mentor who has taught over 500 students throughout his career. He has guided more than 30 students to publish influential papers on AI fairness, secure prestigious fellowships, and contribute to shaping AI policy through public interest research. Neil is also the founder of the AI Policy Global Initiative, which has successfully brought together academia, industry, government, and communities to address critical challenges in AI governance and develop collaborative approaches to responsible AI.
Location: Old Computer Science, room 1310
Abstract: The advent of ChatGPT has redrawn the boundary of pedagogical discourse, where the dyadic configuration of teacher-student has, for many, become triadic -- one that includes AI as a relevant third party, not to be missed or dismissed. Within applied linguistics, AI-focused research has predominantly targeted the teaching and learning of writing (Fang & Han, 2025). The work on AI and speaking, on the other hand, has largely involved perception studies documenting its positive impact on learners' willingness to communicate (Goh & Aryadoust, 2025). In this talk, I explore the role of AI in the teaching and learning of speaking, and in particular, the development of interactional competence. Based on a corpus of learner-AI interactions, I demonstrate the ways in which ChatGPT excels and fails at acting as a useful conversation partner, with a view towards furthering our ongoing deliberation on the affordances and constraints of AI in language education.
Speaker: Hansun Zhang Waring (Teachers College, Columbia University)
Hansun Zhang Waring is Professor of Linguistics and Education at Columbia University and founder of The Language and Social Interaction Working Group (LANSI). As an applied linguist and a conversation analyst, Hansun is interested in all things interaction -- (second language) pedagogical interaction, communication with the public, parent-child interaction, and human-AI interaction (HAI). Her work has appeared in leading journals in applied linguistics and discourse analysis as well as numerous book volumes, some of which she (co-)authored or co-edited. She is on the editorial boards of Chinese Language and Discourse (CLD), Classroom Discourse (CD), and International Review of Applied Linguistics (IRAL).
Location: Wang Center, Lecture Hall #1
If you need special accommodation, please contact chikako.nakamura@stonybrook.edu.
In this talk, we'll explore an alternative paradigm for imaging: physically based neural representations for 3D scenes and 3D sensing systems. We will discuss how recent advances in large-scale learned representations can be used to jointly optimize both 3D scene models and the design of sensing systems for 3D capture, with the goal of enabling task-specific perception systems.
Unlike modern AI models trained on internet-scale datasets, these specialized 3D representations typically operate in data-sparse regimes and therefore require a different kind of prior. We'll examine how grounding these learned representations in the physics of light transport can improve our understanding of scene structure and inform imaging system design even with limited data. By connecting physical insights with learned representations, we'll highlight new possibilities for robust, efficient, and adaptive perception in challenging environments.
Speaker: Nikhil Behari is a graduate student in the Camera Culture group at the MIT Media Lab, advised by Professor Ramesh Raskar. His research interests include computational imaging, 3D scene understanding, and multi-agent decision-making under uncertainty, with a focus on automating imaging system design for 3D perception in human and planetary health. His research is supported by the NASA Space Technology Graduate Research Fellowship. He received his bachelor's in Computer Science and Statistics from Harvard University in 2022.
Abstract: This talk shows how machine learning can address challenges in astrophysics. We specifically focus on black hole simulations and supernova observations. First, we present a super-resolution technique for black hole simulations that avoids the need for high-resolution labels by leveraging the Hamiltonian and momentum constraints from general relativity. This method reduces constraint violations by one to two orders of magnitude. Next, we introduce Maven, a multimodal foundation model for supernova science. Using contrastive learning to align photometric and spectroscopic data, Maven achieves state-of-the-art results in classification and redshift estimation by pre-training on synthetic data and fine-tuning on real observations.
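For readers unfamiliar with contrastive alignment of two modalities, the sketch below shows one common formulation: a symmetric InfoNCE loss over paired embeddings. The abstract does not say which loss Maven uses, so this form, the temperature value, the toy 2D embeddings, and the names `normalize`, `info_nce`, and `symmetric_contrastive_loss` are all assumptions for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy where the i-th candidate is the positive for the i-th anchor."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(p * q for p, q in zip(a, c)) / temperature for c in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(anchors)

def symmetric_contrastive_loss(photo, spec, temperature=0.1):
    """Average the photometry-to-spectrum and spectrum-to-photometry losses."""
    p = [normalize(v) for v in photo]
    s = [normalize(v) for v in spec]
    return 0.5 * (info_nce(p, s, temperature) + info_nce(s, p, temperature))

if __name__ == "__main__":
    # Hypothetical 2D embeddings for two objects observed in both modalities.
    photo = [[1.0, 0.0], [0.0, 1.0]]
    spec_aligned = [[0.9, 0.1], [0.1, 0.9]]
    spec_swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(symmetric_contrastive_loss(photo, spec_aligned))
    print(symmetric_contrastive_loss(photo, spec_swapped))
```

The loss is small when each object's two embeddings point the same way and large when they are mismatched, so minimizing it pulls matching pairs together in a shared embedding space.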
Bio: Thomas Helfer is a computational physicist specializing in deep learning and physics. Currently based at the Institute for Advanced Computational Science at Stony Brook University, Thomas was previously a postdoctoral fellow at Johns Hopkins and did his PhD with Eugene Lim at King's College in London. In his work, he looks to bridge topics; in his PhD, he bridged theoretical particle physics and gravitational waves. Now, in his postdoctoral work, he aims to find novel applications of deep learning in astrophysics.
*please note: this seminar will be held in a hybrid format*
Location: IACS Seminar Room OR Join Zoom Meeting
https://stonybrook.zoom.us/j/
Meeting ID: 986 1763 0652
Passcode: 882994
Join us to share your thoughts about teaching, learning, and AI!
The landscape of higher education is rapidly evolving with the integration of Artificial Intelligence (AI). Through the Institute on AI, Pedagogy, and the Curriculum with AAC&U, we are exploring ways that we can better address AI in teaching and learning. We want to hear your experiences, your concerns, and your ideas.
This is an open discussion for all faculty and staff to share their perspectives on the opportunities and challenges AI presents in our academic environment.
We'll be exploring critical questions like:
In the age of AI, what are the opportunities you see for enriching the classroom and curriculum? How can it enhance student learning or your professional practice?
What are the most significant challenges and concerns that AI raises for you regarding academics, student integrity, or your workload?
What resources (tools, training, technical support, policy guidance, etc.) do you need to feel confident and successful in the age of AI?
Dates/Times:
Tuesday, 2/3 at 2pm
Friday, 2/6 at 9:30am
Please register in advance for the Zoom link.
Can't Make It? Share Your Feedback!
We understand schedules are tight. If you cannot attend the live discussion, you can still contribute: join our AI Zoom Room to share your thoughts via video recording, or email rose.tirotta-esposito@stonybrook.edu with your comments and ideas.
Videos will not be shared publicly and comments will only be shared in aggregate.
Your input is vital. From pedagogy to assessment, your insights will be critical. We look forward to a thoughtful and productive conversation!
Dr. Rose Tirotta-Esposito (Assistant Provost; Director of CELT)
Dr. Elizabeth Hewitt (Associate Professor in the Department of Technology and Society (DTS) in the College of Engineering and Applied Sciences)
Chris Kretz (Associate Librarian and Head of Academic Engagement at SBU Libraries)
Prof. Rajiv Lajmi (Assistant Professor in the School of Health Professions and Chair of Applied Health Informatics)
Dr. Matthew Salzano (Assistant Professor in the Department of Communication in the School of Communication and Journalism)
The Natural Language Processing Reading Group at Stony Brook University meets weekly to discuss recent research papers in NLP and related fields.
Join the Google Group here.
Abstract: Trustworthy AI deployment in high-stakes domains requires systems that are fair, private, robust, and controllable as they scale. Yet these demands are often pursued through ad-hoc approaches, lacking a systematic understanding of the inherent trade-offs between competing objectives. We add fairness regularizers and hope bias decreases. We train on massive datasets and hope the model learns the underlying logic of how concepts combine, rather than memorizing statistical shortcuts. We encrypt data and hope the resulting computational overhead remains manageable. But hope is not a science.
In this talk, I argue that what trustworthy AI lacks is not better heuristics but a deeper science of what these properties fundamentally cost and what is achievable. Before we can fix a system, we must map the terrain: what trade-offs are unavoidable, what regions of performance are unreachable, and how far current methods fall from what is actually achievable. My research builds this map across fairness, privacy, robustness, and controllability, following a common methodology: diagnose where models fail, characterize the fundamental limits any method must obey, and design systems that approach those limits. I will present this framework, its extension to scientific applications where we replace statistical constraints with physical laws to ensure AI systems remain grounded in reality, and a vision for scaling these principles to the rapidly expanding ecosystem of composed and interacting AI systems.
Bio: Dr. Vishnu Boddeti is an Associate Professor in the Department of Computer Science and Engineering at Michigan State University, where he leads the Human Analysis Lab (HAL). His research develops mathematical frameworks for trustworthy AI, spanning fairness, privacy, robustness, and physics-informed learning, with an emphasis on characterizing fundamental limits and building systems that achieve them. His work has been supported by NSF, NIST, DARPA, ONR, Ford, and others, and recognized with a Meta Research Award (2021). His research has been featured on the cover of Nature, recognized as an Editor's Highlight in Nature Communications, and received multiple best paper awards, including the 2024 IEEE-CCF Cloud Computing Best Paper Award and the TMLR Outstanding Certification Finalist (2023). He serves as Senior Area Editor for IEEE Transactions on Information Forensics and Security and received his PhD in ECE from Carnegie Mellon University in 2012.
Location: NCS 120