Dates: 

Wednesday, March 3, 2021 - 6:00pm to 7:30pm

Location: 

Zoom - contact events@cs.stonybrook.edu for Zoom info.

Event Description: 

Women in Computer Science (WiCS), the Society of Women Engineers (SWE), and the Stony Brook Robotics Team (SBRT) are collaborating to host an event called Inspiring Women in STEM Academia: A Community Dialogue to address the lack of female representation in STEM academia. 
 

All are invited to attend so they may gain a better understanding of the challenges faced by their female colleagues and hear perspectives on how they can offer support in the workplace. Given the disproportionately low number of female professionals in STEM academia, we feel that this event would be especially beneficial for male faculty, who can listen to their female colleagues and amplify their voices.

The event will begin with a discussion panel consisting of Stony Brook professors and faculty who will provide valuable insight into the issue. From there, we will split into smaller discussion groups where student and faculty attendees will be able to voice their opinions, hear about the thoughts and experiences of others, and participate in an engaging discussion with panelists.

The event will be held on March 3rd from 6:00 - 7:30 PM on Zoom.
 

The following Stony Brook faculty will be panelists:

Dr. Aruna Balasubramanian - Computer Science Professor, WiCS Advisor, WPhD Advisor

Dr. Xinwei Mao - Civil Engineering Assistant Professor

Urszula Zalewski - Director of Experiential Learning, Career Center Advisor (Healthcare)

Dr. Heather Lynch - Ecology and Evolution Professor, Lynch Lab for Quantitative Ecology

Karen Kernan - URECA Director, Simons Summer Research Program Director

Dr. Eszter Boros - Chemistry Assistant Professor, Boros Lab

Dr. Maria Nagan - Chemistry Lecturer, Nagan Research Lab


Time: Jan 26, 2021 03:00 PM Eastern Time (US and Canada)

All are welcome!

Zoom Meeting:
https://stonybrook.zoom.us/j/93818552212?pwd=ajZkT2x4a2tiaDJUL1h3VFhLZEgwQT09

Meeting ID: 938 1855 2212
Passcode: 802722

Title: Data-Driven Document Unwarping

Abstract: Capturing document images is a common way to digitize and record physical documents due to the ubiquity of mobile cameras. To make text recognition easier, it is often desirable to digitally flatten a document image when the physical document sheet is folded or curved. However, unwarping a document from a single image in natural scenes is very challenging due to the complexity of document sheet deformation, document texture, and environmental conditions. Previous model-driven approaches struggle with inefficiency and limited generalizability. In this thesis, I investigate several data-driven approaches to tackle the document unwarping problem.

Data acquisition is the central challenge in data-driven methods. I first design an efficient data synthesis pipeline based on 2D image warping and train DocUNet, the pioneering data-driven document unwarping model, on the synthetic data. A benchmark dataset is also created to facilitate comprehensive evaluation and comparison. To improve the unwarping performance by training on more realistic data, I introduce the Doc3D dataset and DewarpNet. Supervised by 3D shape ground truth in Doc3D, DewarpNet performs significantly better than DocUNet. Both DocUNet and DewarpNet depend on synthetic data for ground-truth deformation annotations. To exploit real-world images, I propose PaperEdge, a weakly supervised model trained on in-the-wild document images annotated with easy-to-obtain boundary information. PaperEdge surpasses DewarpNet by utilizing both the synthetic data and weakly annotated real data in the Document In the Wild (DIW) dataset. Finally, I propose to incorporate 3D physical constraints in training DewarpNet and PaperEdge. The constraints regulate the possible deformations of document paper. I also propose to augment the Doc3D and DIW datasets by introducing an online document segmentation model and better hardware.

Abstract: In today's digital era, language functions not only as a medium of information transmission but also as a mechanism of persuasion, framing, and control. The proliferation of online platforms has amplified this dual role: while enabling unprecedented access to knowledge, it has also exacerbated challenges such as misinformation, rhetorical manipulation, and cultural or linguistic disparities in information access. As a result, pragmatic language understanding and information integrity have emerged as central concerns for both computational linguistics and society at large. This research follows how claims are produced, reframed, and contested online through three interconnected threads.

First, it models pragmatic deflection in discourse by investigating whataboutism, a rhetorical device that deflects criticism by redirecting discourse, and introduces novel datasets from Twitter (now X) and YouTube. This work underscores how subtle pragmatic maneuvers can erode discourse integrity without relying on outright falsehoods. Second, it advances retrieval and alignment for information integrity in health and news communication. These systems trace claims and narratives across genres (e.g., social posts and news reports) and languages (Chinese and English), linking social posts with journalistic reporting and aligning Chinese news with English biomedical evidence. By accounting for cultural context, assertions can be linked to reliable evidence and organized for systematic comparison. This work surfaces the risks of missing sources, unverifiable claims, and framing disparities in global health discourse, and demonstrates computational solutions that enhance both the credibility and accessibility of information.

Third, the methodological centerpiece is Class Distillation (ClaD), a geometry-aware training paradigm for distilling a small, well-defined target class from a large, heterogeneous background. ClaD couples a distribution-aware contrastive loss (instantiated here in a Mahalanobis form when its assumptions fit the data) with an interpretable decision algorithm tuned for class separation. Evaluated on sarcasm, metaphor, and sexism detection, ClaD delivers strong efficiency and robustness, matching or surpassing larger models while using fewer computational resources, making these pipelines practical by learning reliably from small, sharply defined classes.

In sum, this research presents an integrated account of language understanding in the digital age. It exposes how integrity falters through pragmatic deflection, cross-genre drift, and cross-lingual misalignment, and translates these insights into systems for evidence retrieval, alignment, and verification. It sheds light on where and how integrity is threatened and delivers methods that leverage pragmatic language use.

Speaker: Chenlu Wang

Location: (Old) Computer Science Building, Room 2311

Abstract: Artificial Intelligence (AI) is no longer a futuristic concept -- it is here, but its development, benefits, and risks remain unevenly distributed across industries, nations, and social groups. In this talk, Jieshu presents her research on the societal dimensions of AI from two perspectives: the forces shaping AI's development (backward-looking) and its current and potential impact on society (forward-looking). She first examines disparities in AI, including women's underrepresentation in AI patents and the geographic concentration of AI innovation, highlighting inequalities in who creates AI and who benefits from it. She then explores AI's societal impact, focusing on workforce transformation and the need for GenAI literacy. She will also discuss AI patents, AI's role in climate change mitigation and adaptation, potential environmental biases in LLMs, and gender-specific patterns in AI portrayals in science fiction.

Bio: Jieshu Wang is a Postdoctoral Research Scholar at Arizona State University (ASU), focusing on the social dimensions of artificial intelligence (AI). With a background in engineering, economics, communication, and science and technology studies, she examines how AI both shapes and is shaped by broader societal forces. Her research employs interdisciplinary methods to explore the social, political, and economic factors influencing AI development, as well as its role in innovation, the economy, the future of work, climate change mitigation, and popular culture. Jieshu holds a Ph.D. in Human and Social Dimensions of Science and Technology from ASU. She is also a science book translator and has translated six books.

Location: Old Computer Science, Room 1310

Virtual Talk: Metadata Matters: Robust Document Classification via Adaptation Methods for Text-driven Public Health by Xiaolei Huang

Zoom link to follow.

Abstract: Document classifiers have been widely applied to health-related problems such as suicide prevention, flu vaccination surveillance, and disease diagnosis. However, document metadata, including time, gender, age, and location, has an enormous impact on the robustness of document classifiers. Language varies across metadata, bringing both challenges and opportunities for building reliable document classifiers. For example, online written language changes over time, and males and females express opinions differently. This talk describes how to use domain adaptation to integrate temporal and user demographic factors into document classifiers. By adapting knowledge of how language varies across metadata, models can learn generalized representations of language through metadata-invariant embeddings. This approach leads to metadata-adapted document classifiers and can also be extended to personalize classification models through user embeddings.

Bio: Xiaolei Huang is a 4th-year PhD candidate in Information Science at the University of Colorado, Boulder. He is currently a visiting scholar at the Johns Hopkins University. His research interests are in Natural Language Processing, Machine Learning and Public Health. Particularly, he focuses on domain adaptation, cross-lingual transfer learning, user modeling and fairness.