Recently, large-scale language data combined with modern machine learning techniques has shown strong value as a means of studying human psychology and behavior. For example, language alone has been shown to be predictive of mental health, personality, and health behaviors. However, many applications of such language-based assessments have important, readily available data beyond language (i.e., extra-linguistic data). For example, when predicting the subjective well-being of a community from tweets, one can also take into account the community's age, education, and demographic attributes. Language may capture some characteristics while extra-linguistic variables capture others. We believe that effectively integrating linguistic and extra-linguistic data can yield benefits beyond using either independently. In this thesis, we develop methods that effectively integrate extra-linguistic data with language data, focused primarily on social scientific applications. The central challenge is reconciling the size and heterogeneity of language data, which is often sparse and noisy, with extra-linguistic variables, which are often low-dimensional and non-sparse. First, we consider structured extra-linguistic variables, such as socioeconomic (income and education rates) and demographic (age, gender, etc.) variables, and propose two integration methods, named residualized controls (RC) and residualized factor adaptation (RFA), for use in county-wise prediction tasks. Using techniques that integrate information at both the model level and the data level, we found consistently strong improvements over naively combining features, for example, improving county-level well-being predictions by over 12%. Next, we consider unstructured extra-linguistic data.
In the first part, we incorporate social network connections and language over time to propose a novel metric for quantifying the stickiness of words: their ability to spread across friendship connections in a social network over time (in other words, to stick in one's vocabulary after seeing friends use them). We identify which language features are more likely to spread through friendships and show that such a metric is useful for predicting who will be friends and what content will spread. In addition, we analyze language content over time by proposing a novel dynamic, content-specific topic modeling technique that helps identify different sub-domains of a thematic scope and can be used to track societal shifts in concerns or views over time.
Join CELT on Tuesday, March 31 for a focused, one-hour overview on how to redesign and future-proof assessments in the age of AI! This session will cover three key areas: leveraging AI as a co-pilot for developing effective exam questions, designing authentic assessments, and exploring how AI can strategically support active learning structures like Team-Based Learning (TBL), Project-Based Learning (PBL), and Scenario-Based Learning (SBL).

Register here.
CSE 600 Talk: Squeezing Software Performance via Eliminating Wasteful Operations presented by Xu Liu

ABSTRACT: Inefficiencies abound in complex, layered software. A variety of inefficiencies show up as wasteful memory operations, such as redundant or useless memory loads and stores. Aliasing, limited optimization scopes, and insensitivity to input and execution contexts act as severe deterrents to static program analysis. Microscopic observation of whole executions at instruction- and operand-level granularity breaks down abstractions and helps recognize redundancies that masquerade in complex programs. In this talk, I will describe various wasteful memory operations, which pervasively exist in modern software packages and expose great potential for optimization. I will discuss the design of a fine-grained instrumentation-based profiling framework that identifies wasteful operations in their contexts, which guides nontrivial performance improvement. Furthermore, I will show our recent improvement to the profiling framework by abandoning instrumentation, which reduces the runtime overhead from 10x to 3% on average. I will show how our approach works for native binaries and various managed languages such as Java, yielding new performance insights for optimization.

BIO: Xu Liu is an assistant professor in the Department of Computer Science at the College of William & Mary. He obtained his PhD from Rice University in 2014 and joined the College of William & Mary in the same year. Prof. Liu works on building performance tools to pinpoint and optimize inefficiencies in HPC code bases. He has developed several open-source profiling tools, which are used worldwide at universities, DOE national laboratories and industrial companies. Prof. Liu has published a number of papers in high-quality venues. His papers received Best Paper Awards at SC'15, PPoPP'18, PPoPP'19 and ASPLOS'17 Highlights, as well as a Distinguished Paper Award at ICSE'19. His recent ASPLOS'18 paper was selected as an ACM SIGPLAN Research Highlight in 2019 and nominated for CACM Research Highlights. Prof. Liu is the recipient of the 2019 IEEE TCHPC Early Career Researchers Award for Excellence in High Performance Computing. Prof. Liu has served on the program committees of conferences such as SC, PPoPP, IPDPS, CGO, HPCA and ASPLOS.
The Empirical Methods in Natural Language Processing (EMNLP) conference is a premier international academic conference in the field of artificial intelligence and natural language processing (NLP). Organized annually by the Association for Computational Linguistics (ACL) special interest group on linguistic data (SIGDAT), it focuses on research that uses empirical methods to solve language processing problems.

For more information and registration, visit the official website.



Joel H. Saltz, MD, PhD
SUNY Distinguished Professor
Cherith Professor and Founding Chair
Department of Biomedical Informatics
Stony Brook University

Apostolos K. Tassiopoulos, MD, FACS
Professor of Surgery and Vice Chair for Quality and Outcomes
Chief of the Division of Vascular and Endovascular Surgery
Director of the Stony Brook Vascular Center
Stony Brook Medicine

Title: Clinical applications of artificial intelligence to improve diagnosis and risk stratification for patients with aortic aneurysms

Time: Wednesday, Feb 17, 2021, 3 pm - 4 pm

Join Zoom Meeting
https://stonybrook.zoom.us/j/95617197636?pwd=KytzZ2pVRG9SZGpKZUtpNXJISj...
Meeting ID: 956 1719 7636 Passcode: 924293

The Antonija Prelec Memorial Committee, in collaboration with Stony Brook University Libraries, is very excited to bring you the 2019 Prelec Memorial Lecture! This year, we are pleased to announce that our speaker is Patricia Flatley Brennan, RN, PhD, Director of the National Library of Medicine.

No registration required. Find more information here.

The next AI Institute seminar speaker will be Chao Chen of Biomedical Informatics, on Monday, November 29, at noon via Zoom:

https://stonybrook.zoom.us/j/96233844681?pwd=aVVsUnIzMWJDMHRqVXcrQU5HMjFVQT09

He will speak on "Detection of Trojan Attacks to Deep Neural Networks - A Topological Perspective"; his abstract and bio are below.


Abstract: Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when attackers stealthily manipulate the model's behavior through Trojaned training samples, i.e., samples with a special trigger injected and labels altered. Identifying a Trojaned model at deployment is challenging due to limited access to the training data. We propose to identify Trojaned neural networks using methods from topological data analysis. In particular, we propose to (1) inspect high-order topological features of the neuron interactions and (2) reverse engineer the injected triggers using a topological loss. These approaches take different angles and reveal insights into the behavior of neural networks when their strong memorization power is exploited maliciously. The work has been accepted to NeurIPS'21. I will also briefly mention other research directions from my group, including incorporating topological information into deep image analysis, topology-inspired graph neural networks, and robust training of neural networks with label noise. These works have been published in ICLR, ICML, NeurIPS, ECCV, ICCV and AAAI in recent years.
Bio: Dr. Chao Chen is an assistant professor of Biomedical Informatics at Stony Brook University. His research interests span topological data analysis (TDA), machine learning and biomedical image analysis. He develops principled learning methods inspired by the theory from TDA, such as persistent homology and discrete Morse theory. These methods address problems in biomedical image analysis, robust machine learning, and graph neural networks from a unique topological view. His research results have been published in major machine learning, computer vision, and medical image analysis conferences. He is serving as an area chair for MICCAI, AAAI, CVPR and NeurIPS.
University Libraries Present: Analyzing quantitative data can feel overwhelming without the right tools. In this workshop, SBU Libraries' Data Literacies Lead, Ahmad Pratama, will show you how to master the basics of exploratory data analysis for quantitative data using Python. This workshop covers several techniques to help you uncover patterns and insights in your datasets.

Online RSVP via link: https://stonybrook.zoom.us/meeting/register/vEPycmDrQoGjFqkmsYHgxw
Kate Armstrong, a Vancouver-based artist, writer, and independent curator, will explore the role of AI in art and creativity through three AI-driven projects: KEKE Terminal, Botto, and Sasha Stiles' AI collaborator Technelegy. She will compare these projects to historical artistic movements and investigate AI's role as an autonomous creative agent, the function of community participation, and the shifting dynamics of authorship.

Location: Humanities Institute Room 1008