Time:
Sep 7, Tue, 11:00am EDT

Place:
NCS 220 or on Zoom (info below)

Title: Data-Driven Document Unwarping


Abstract:
Capturing document images is a common way to digitize and record physical documents due to the ubiquity of mobile cameras. To make text recognition easier, it is often desirable to digitally flatten a document image when the physical document sheet is folded or curved. However, unwarping a document from a single image in natural scenes is very challenging due to the complexity of document sheet deformation, document texture, and environmental conditions. Previous model-driven approaches struggle with inefficiency and limited generalizability. In this thesis, I investigate several data-driven approaches to tackle the document unwarping problem.

Data acquisition is the central challenge in data-driven methods. I first design an efficient data synthesis pipeline based on 2D image warping and train DocUNet, the pioneering data-driven document unwarping model, on the synthetic data. A benchmark dataset is also created to facilitate comprehensive evaluation and comparison. To improve the unwarping performance by training on more realistic data, I introduce the Doc3D dataset and DewarpNet. Supervised by 3D shape ground truth in Doc3D, DewarpNet is significantly better than DocUNet. Both DocUNet and DewarpNet depend on synthetic data for the ground truth deformation annotation. To exploit real-world images, I propose PaperEdge, a weakly supervised model trained with in-the-wild document images with easy-to-obtain boundary information. PaperEdge surpasses DewarpNet by utilizing both the synthetic data and weakly annotated real data in the Document In the Wild (DIW) dataset. Finally, I propose directly predicting the $uv$-parameterized 3D mesh of the document with 3D constraints and using accessible 3D representations such as depth maps as training targets. Predicting the 3D mesh of the document solves the unwarping task and also benefits VR/AR applications.

Join Zoom Meeting
https://stonybrook.zoom.us/j/96440592912?pwd=ZU5waTdyUzRFNW5SRHM5ME84TWdFQT09

Meeting ID: 964 4059 2912
Passcode: 793149
One tap mobile
+16468769923,,96440592912# US (New York)
+13017158592,,96440592912# US (Washington DC)

Dial by your location
        +1 646 876 9923 US (New York)
        +1 301 715 8592 US (Washington DC)
        +1 312 626 6799 US (Chicago)
        +1 253 215 8782 US (Tacoma)
        +1 346 248 7799 US (Houston)
        +1 408 638 0968 US (San Jose)
        +1 669 900 6833 US (San Jose)
Meeting ID: 964 4059 2912
Find your local number: https://stonybrook.zoom.us/u/adxTt9ZbuJ
Abstract: As computing and society become increasingly inseparable, we confront a fundamental design challenge: creating AI systems where human-machine interactions authentically embody our diverse values while thoughtfully evolving our social relationships. The recursive nature of these interactions--where human behavior shapes technology design and technological affordances influence human behavior--presents both profound risks and transformative opportunities as we reimagine our collective digital future. What interaction patterns emerge when algorithmic systems become active participants in societal decision-making? How can we design human-AI collaboration that ensures algorithmic systems align with diverse community values while serving the public interest? Through Public Interest AI, we explore a Pluralistic Design Language that creates interaction models for value-sensitive algorithmic ecosystems, strengthening AI-society alignment in both technology design and policy development. Through collaborative interaction with communities, we create systems that augment human capabilities while embedding ethical principles into the sociotechnical design of AI itself--ultimately redefining possibilities at the intersection of technology, policy, and society. This talk will examine the challenges of designing meaningful human-AI systems within social contexts through real-world applications that combine value-sensitive interaction design, human-inspired computing, and societal development to create technologies that advance our shared commitment to the public good.

Bio: Neil Gaikwad is an Assistant Professor of Data Science and Computer Science at UNC Chapel Hill. Additionally, he serves on the Faculty Advisory Council of the UNC Parr Center for Ethics and is a Fellow at the MIT Dalai Lama Center for Ethics and Transformative Values. Neil holds a Ph.D. in Society-Centered AI from MIT and is an alumnus of Carnegie Mellon University's School of Computer Science. Neil's scholarship, published in prominent AI and HCI conferences, has been recognized with several prestigious honors, including the Facebook Research Fellowship, UIST Best Paper Honorable Mention, MIT Engineering Fellowship, Human Rights & Technology Fellowship, Graduate Teaching Award, and the Karl Taylor Compton Prize, MIT's highest student honor. He has been recognized as a Rising Star by both Stanford University and the University of Chicago. Translating research into real-world impact, Neil is a dedicated educator and mentor who has taught over 500 students throughout his career. He has guided more than 30 students to publish influential papers on AI fairness, secure prestigious fellowships, and contribute to shaping AI policy through public interest research. Neil is also the founder of the AI Policy Global Initiative, which has successfully brought together academia, industry, government, and communities to address critical challenges in AI governance and develop collaborative approaches to responsible AI.

Location: Old Computer Science, room 1310

Abstract: The advent of ChatGPT has redrawn the boundary of pedagogical discourse, where the dyadic configuration of teacher-student has, for many, become triadic -- one that includes AI as a relevant third party, not to be missed or dismissed. Within applied linguistics, AI-focused research has predominantly targeted the teaching and learning of writing (Fang & Han, 2025). The work on AI and speaking, on the other hand, has largely involved perception studies documenting its positive impact on learners' willingness to communicate (Goh & Aryadoust, 2025). In this talk, I explore the role of AI in the teaching and learning of speaking, and in particular, the development of interactional competence. Based on a corpus of learner-AI interactions, I demonstrate the ways in which ChatGPT excels and fails at acting as a useful conversation partner, with a view towards furthering our ongoing deliberation on the affordances and constraints of AI in language education.

Speaker: Hansun Zhang Waring (Teachers College, Columbia University)

Hansun Zhang Waring is Professor of Linguistics and Education at Columbia University and founder of The Language and Social Interaction Working Group (LANSI). As an applied linguist and a conversation analyst, Hansun is interested in all things interaction -- (second language) pedagogical interaction, communication with the public, parent-child interaction, and human-AI interaction (HAI). Her work has appeared in leading journals in applied linguistics and discourse analysis as well as numerous book volumes, some of which she (co-)authored or co-edited. She is on the editorial boards of Chinese Language and Discourse (CLD), Classroom Discourse (CD), and International Review of Applied Linguistics (IRAL).

Location: Wang Center, Lecture Hall #1

If you need special accommodation, please contact chikako.nakamura@stonybrook.edu.

Abstract: Autonomous systems, whether on Earth or in space, rely on 3D perception to understand and interact with the world around them. Yet traditional techniques for 3D understanding often depend on human-designed features, fixed sensors, and conventional imaging modalities. This constrained approach can limit every stage of perception, from sensing to interpretation to decision-making.
In this talk, we'll explore an alternative paradigm for imaging: physically based neural representations for 3D scenes and 3D sensing systems. We will discuss how recent advances in large-scale learned representations can be used to jointly optimize both 3D scene models and the design of sensing systems for 3D capture, with the goal of enabling task-specific perception systems.
Unlike modern AI models trained on internet-scale datasets, these specialized 3D representations typically operate in data-sparse regimes and therefore require a different kind of prior. We'll examine how grounding these learned representations in the physics of light transport can improve our understanding of scene structure and inform imaging system design even with limited data. By connecting physical insights with learned representations, we'll highlight new possibilities for robust, efficient, and adaptive perception in challenging environments.

Speaker: Nikhil Behari is a graduate student in the Camera Culture group at the MIT Media Lab, advised by Professor Ramesh Raskar. His research interests include computational imaging, 3D scene understanding, and multi-agent decision-making under uncertainty, with a focus on automating imaging system design for 3D perception in human and planetary health. His research is supported by the NASA Space Technology Graduate Research Fellowship. He received his bachelor's in Computer Science and Statistics from Harvard University in 2022.

Abstract: This talk shows how machine learning can address challenges in astrophysics. We specifically focus on black hole simulations and supernova observations. First, we present a super-resolution technique for black hole simulations that avoids the need for high-resolution labels by leveraging the Hamiltonian and momentum constraints from general relativity. This method reduces constraint violations by one to two orders of magnitude. Next, we introduce Maven, a multimodal foundation model for supernova science. Using contrastive learning to align photometric and spectroscopic data, Maven achieves state-of-the-art results in classification and redshift estimation by pre-training on synthetic data and fine-tuning on real observations.

Bio: Thomas Helfer is a computational physicist specializing in deep learning and physics. Currently based at the Institute for Advanced Computational Science at Stony Brook University, Thomas was previously a postdoctoral fellow at Johns Hopkins and did his PhD with Eugene Lim at King's College in London. In his work, he looks to bridge topics; in his PhD, he bridged theoretical particle physics and gravitational waves. Now, in his postdoctoral work, he aims to find novel applications of deep learning in astrophysics.

*please note: this seminar will be held in a hybrid format*


Location: IACS Seminar Room OR Join Zoom Meeting
https://stonybrook.zoom.us/j/98617630652?pwd=tb4hplPgb3bTTifPCJTCcsn3P9vX8y.1

Meeting ID: 986 1763 0652
Passcode: 882994

Join us to share your thoughts about teaching, learning, and AI!

The landscape of higher education is rapidly evolving with the integration of Artificial Intelligence (AI). Through the Institute on AI, Pedagogy, and the Curriculum with AAC&U, we are exploring ways that we can better address AI in teaching and learning. We want to hear your experiences, your concerns, and your ideas.

This is an open discussion for all faculty and staff to share their perspectives on the opportunities and challenges AI presents in our academic environment.

We'll be exploring critical questions like:

  • In the age of AI, what are the opportunities you see for enriching the classroom and curriculum? How can it enhance student learning or your professional practice?

  • What are the most significant challenges and concerns that AI raises for you regarding academics, student integrity, or your workload?

  • What resources (tools, training, technical support, policy guidance, etc.) do you need to feel confident and successful in the age of AI?

Dates/Times:

  • Tuesday, 2/3 at 2pm

  • Friday, 2/6 at 9:30am

Please register in advance for the Zoom link.

Can't Make It? Share Your Feedback!

We understand schedules are tight. If you cannot attend the live discussion, you can still contribute: join our AI Zoom Room to record a video, or email rose.tirotta-esposito@stonybrook.edu with your comments and ideas.

Videos will not be shared publicly and comments will only be shared in aggregate.

Your input is vital. From pedagogy to assessment, your insights will be critical. We look forward to a thoughtful and productive conversation!

  • Dr. Rose Tirotta-Esposito (Assistant Provost; Director of CELT)

  • Dr. Elizabeth Hewitt (Associate Professor in the Department of Technology and Society (DTS) in the College of Engineering and Applied Sciences)

  • Chris Kretz (Associate Librarian and Head of Academic Engagement at SBU Libraries)

  • Prof. Rajiv Lajmi (Assistant Professor in the School of Health Professions and Chair of Applied Health Informatics)

  • Dr. Matthew Salzano (Assistant Professor in the Department of Communication in the School of Communication and Journalism)

The overall purpose of this seminar is to bring together people with interests in Computer Vision theory and techniques and to examine current research issues. This course is appropriate for people who have already taken a graduate course in Computer Vision or who already have research experience in Computer Vision. To enroll in this course, you must either: (1) be in the PhD program or (2) receive permission from the instructors. Each seminar will consist of several short talks (around 15 minutes each) given by different students. Students can register for 1 credit for CSE656. Registered students must attend and present a minimum of 2 talks. Everyone else is welcome to attend. Fill in https://forms.gle/q6UG9ygauLp2a8Po8 to subscribe to our mailing list for further announcements.



Abstract: Trustworthy AI deployment in high-stakes domains requires systems that are fair, private, robust, and controllable as they scale. Yet these demands are often pursued through ad-hoc approaches, lacking a systematic understanding of the inherent trade-offs between competing objectives. We add fairness regularizers and hope bias decreases. We train on massive datasets and hope the model learns the underlying logic of how concepts combine, rather than memorizing statistical shortcuts. We encrypt data and hope the resulting computational overhead remains manageable. But hope is not a science.
In this talk, I argue that what trustworthy AI lacks is not better heuristics but a deeper science of what these properties fundamentally cost and what is achievable. Before we can fix a system, we must map the terrain: what trade-offs are unavoidable, what regions of performance are unreachable, and how far current methods fall from what is actually achievable. My research builds this map across fairness, privacy, robustness, and controllability, following a common methodology: diagnose where models fail, characterize the fundamental limits any method must obey, and design systems that approach those limits. I will present this framework, its extension to scientific applications where we replace statistical constraints with physical laws to ensure AI systems remain grounded in reality, and a vision for scaling these principles to the rapidly expanding ecosystem of composed and interacting AI systems.


Bio: Dr. Vishnu Boddeti is an Associate Professor in the Department of Computer Science and Engineering at Michigan State University, where he leads the Human Analysis Lab (HAL). His research develops mathematical frameworks for trustworthy AI, spanning fairness, privacy, robustness, and physics-informed learning, with an emphasis on characterizing fundamental limits and building systems that achieve them. His work has been supported by NSF, NIST, DARPA, ONR, Ford, and others, and recognized with a Meta Research Award (2021). His research has been featured on the cover of Nature, recognized as an Editor's Highlight in Nature Communications, and received multiple best paper awards, including the 2024 IEEE-CCF Cloud Computing Best Paper Award and the TMLR Outstanding Certification Finalist (2023). He serves as Senior Area Editor for IEEE Transactions on Information Forensics and Security and received his PhD in ECE from Carnegie Mellon University in 2012.

Location: NCS 120