Stony Brook Presents AI Research at NeurIPS 2023

Stony Brook researchers collaborated with academic centers and AI labs from around the world to advance machine learning, robotics, and computer vision. Their latest research is being presented at the 37th Conference on Neural Information Processing Systems (NeurIPS), the most cited AI conference in the world.

The conference, which is being held in New Orleans from Sunday, December 10, through Saturday, December 16, is a multi-track interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers, along with less formal settings for the exchange of ideas.

This year’s conference has invited researchers from Stony Brook to share their latest advances, some of which are highlighted below:

1.
Yuting Hu, Jiajie Li, Florian Klemme, Gi-Joon Nam, Hussam Amrouch, Jinjun Xiong

Today, integrated circuits (ICs) underpin the latest technological advancements. However, designing modern ICs is a complex process that involves measuring phenomena such as timing, noise, and power for tens of billions of electronic components. These analyses combine de facto industry tools with mathematical approximation techniques to strike a balance between the accuracy and speed of predictions. This paper focuses on the timing analysis of interconnects (i.e., wires), presenting SyncTREE, the first-ever closed-form solution that leverages graph neural networks (GNNs) to perform circuit timing analysis. Experiments show that, compared to conventional GNN models, SyncTREE achieves the best timing predictions with reference to the industry's golden numerical analysis results on real IC design data.
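
As background, a graph neural network makes per-node predictions by repeatedly aggregating features from neighboring nodes, which fits tree-shaped interconnects naturally. Below is a minimal, untrained sketch of that idea on a toy RC tree; the five-node topology, two-dimensional features, and two rounds of message passing are illustrative assumptions, not SyncTREE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy interconnect tree: node 0 is the driver, edges point downstream.
# (Hypothetical 5-node RC tree, not taken from the paper.)
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
num_nodes = 5

# Per-node features: [resistance, capacitance] (arbitrary toy values)
x = rng.random((num_nodes, 2))

# Symmetric adjacency with self-loops, row-normalized for mean aggregation
adj = np.eye(num_nodes)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
adj /= adj.sum(axis=1, keepdims=True)

# Two rounds of message passing with random (untrained) weights
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 1))
h = np.tanh(adj @ x @ w1)   # aggregate neighbor features, then transform
delay = adj @ h @ w2        # per-node scalar "delay" prediction

print(delay.ravel())  # one predicted timing value per tree node
```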

2.
Kanchana Ranasinghe, Michael Ryoo

Manual human annotation for training AI to recognize actions in videos can be both noisy and expensive, which is why self-supervised learning approaches focusing on objects, their relationships, and their interactions are invaluable. Researchers have attempted to draw solutions from a recent variant of self-supervision that learns from loosely aligned image-caption pairs. However, the counterparts of these methods in the video domain do not achieve the same results. This paper explores self-supervised techniques that adapt these image models to the video domain under entirely self-supervised settings, proposing a novel language-based self-supervised learning objective and presenting a framework termed Language-based Self-Supervision, or LSS, which retains and improves the transferability of image CLIP representations far more effectively than prior approaches.
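
For intuition, a language-based self-supervised objective in this general family aligns pooled video features with caption embeddings contrastively. The sketch below is a generic CLIP-style formulation; the mean pooling, feature dimensions, and temperature are illustrative assumptions rather than the paper's actual LSS objective.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical setup: 4 videos x 8 frames, each frame already encoded
# by a frozen image model (CLIP-like) into 512-d features.
frame_feats = torch.randn(4, 8, 512)
text_feats = torch.randn(4, 512)   # one pseudo-caption embedding per video

# Adapt image features to the video level by temporal mean pooling
video_feats = frame_feats.mean(dim=1)

# Contrastive (InfoNCE-style) objective: each video should match its
# own caption embedding and not the other videos' captions.
v = F.normalize(video_feats, dim=-1)
t = F.normalize(text_feats, dim=-1)
logits = v @ t.T / 0.07            # temperature-scaled similarities
labels = torch.arange(4)
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.T, labels)) / 2
print(loss.item())
```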

3.
Jinghuan Shang, Michael Ryoo

Although Reinforcement Learning (RL) has demonstrated success across challenging tasks and games in both simulated and real environments, the observation spaces for visual RL tasks are typically predefined and cannot be adjusted by the agent itself; consider, for example, a robot limited to a fixed overhead camera view. This lack of active visual perception poses challenges to learning in highly dynamic environments, open-world tasks, and partially observable environments. This paper proposes SUGARL, a framework that models motor and sensory policies separately but learns them jointly using a reward that incentivizes the sensory policy to select observations that are optimal for inferring the agent's own motor action. The sensory policies learned through this method are shown to exhibit effective active vision strategies.
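
One way to read that joint objective: the sensory policy is rewarded when the observation it chose lets an inverse-dynamics model recover the motor action that was actually taken. The sketch below illustrates that reward shape with toy feature vectors; the network sizes and exact reward form are illustrative assumptions, not necessarily SUGARL's.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

NUM_ACTIONS = 4

# A tiny inverse-dynamics model: given two consecutive observations,
# predict which motor action was taken between them.
inverse_model = torch.nn.Sequential(
    torch.nn.Linear(2 * 16, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, NUM_ACTIONS),
)

def sensorimotor_reward(obs, next_obs, action):
    """Intrinsic reward for the sensory policy: high when the chosen
    observation makes the agent's own motor action easy to infer.
    (Illustrative form; the paper's exact reward may differ.)"""
    logits = inverse_model(torch.cat([obs, next_obs], dim=-1))
    nll = F.cross_entropy(logits, action, reduction="none")
    return -nll  # lower inference error -> higher reward

# Toy batch: 16-d features of the observations the sensory policy chose
obs, next_obs = torch.randn(5, 16), torch.randn(5, 16)
action = torch.randint(0, NUM_ACTIONS, (5,))
print(sensorimotor_reward(obs, next_obs, action))
```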

4.
Muchao Ye, Ziyi Yin, Tianrong Zhang, Tianyu Du, Jinghui Chen, Ting Wang, Fenglong Ma

Even though deep neural networks (DNNs) perform tremendously well on natural language processing (NLP) tasks, their robustness has been called into question due to their vulnerability to adversarial attacks, especially synonym-substitution attacks. Recent years have seen a surge of interest in NLP models that can provide certified robust predictions against this kind of attack. Such models produce a single output along with the probability of the prediction being certified, and they can be categorized into two types, both of which unfortunately overlook the unification of the training framework and the robustness of the base model used to produce the output. This paper proposes a novel framework, UniT, to solve this problem and introduces a loss, termed DR loss, for certified robust training; experimental results show that the design of UniT with the DR loss is effective in improving certified robust accuracy in both types of certification scenarios.
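
To make "certified prediction" concrete, one common recipe (randomized smoothing over word substitutions) votes a base classifier over many randomly perturbed copies of the input and certifies the answer only when the winning class has a sufficiently large vote share. The toy classifier, synonym table, and threshold below are illustrative assumptions, not UniT's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_classifier(tokens):
    # Toy stand-in for a trained sentiment model.
    return int(any(t in ("good", "great", "fine") for t in tokens))

synonyms = {"good": ["good", "great", "fine"], "movie": ["movie", "film"]}

def certified_predict(tokens, n_samples=1000, threshold=0.9):
    """Smoothing-style certification sketch: vote the base model over
    random synonym substitutions; certify only on a decisive majority."""
    votes = np.zeros(2)
    for _ in range(n_samples):
        perturbed = [str(rng.choice(synonyms.get(t, [t]))) for t in tokens]
        votes[base_classifier(perturbed)] += 1
    top = int(votes.argmax())
    p_top = votes[top] / n_samples
    return top, p_top, bool(p_top >= threshold)

print(certified_predict(["a", "good", "movie"]))  # -> (1, 1.0, True)
```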

5.
Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, Jinghui Chen

Diffusion-based image generation models such as DALL-E 2 can learn from given images and generate high-quality samples following the guidance of prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on the artist's original artworks. However, this ability also raises serious ethical issues when there is no proper authorization from the owner of the original images. In response, several attempts have been made to protect original images from such unauthorized usage. This paper introduces a perturbation purification platform named IMPRESS to evaluate the effectiveness of imperceptible perturbations as a protective measure. The proposed platform offers a comprehensive evaluation of several contemporary protection methods and can serve as an evaluation platform for improving protection.
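
As a rough picture of what such a platform measures: protection adds a small bounded perturbation to an artwork, and purification tries to remove it; if even a simple purifier largely undoes the protection, the protection is weak under that test. The crude blur-based purifier and toy image below are illustrative assumptions, not IMPRESS's actual purification method.

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((64, 64, 3))          # stand-in for an original artwork

# "Protection": an imperceptible, bounded perturbation added to the image
eps = 8 / 255
protected = np.clip(image + rng.uniform(-eps, eps, image.shape), 0, 1)

# A crude "purification" stand-in: local mean filtering to wash out
# high-frequency perturbations (real purifiers are far more involved).
def box_blur(img, k=3):
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

purified = box_blur(protected)

# Toy evaluation criterion: how close each version stays to the original
# (the blur also distorts image content, so this is only indicative).
print("protected vs. original:", np.abs(protected - image).mean())
print("purified  vs. original:", np.abs(purified - image).mean())
```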

6.
Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma

Vision-Language (VL) pre-trained models, which first learn multimodal interactions by pre-training on large-scale unlabeled image-text datasets and are later fine-tuned with labeled pairs on different VL tasks, have revealed more powerful cross-task learning capabilities than training from scratch. Despite their remarkable performance, the adversarial robustness of these VL models remains relatively unexplored. This paper explores a new yet practical attack paradigm: generating adversarial perturbations on a pre-trained VL model in order to attack various black-box tasks fine-tuned from it. It proposes VLATTACK, a framework designed to search for adversarial samples at both the single-modal and multimodal levels. Experiments attacking five widely used VL pre-trained models across six tasks show that VLATTACK achieves the highest attack success rates on all tasks compared with state-of-the-art baselines, revealing a blind spot in the deployment of pre-trained VL models.
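
The transfer idea can be illustrated with a standard projected gradient descent (PGD) attack on a surrogate encoder: perturb the input within a small bound so its features drift away from the clean features, then hand the perturbed input to downstream fine-tuned models. The tiny encoder, cosine loss, and step sizes below are illustrative assumptions; VLATTACK's actual single-modal and multimodal search strategies are more involved.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a pre-trained (surrogate) image encoder; the paper itself
# perturbs both image and text inputs of real VL models.
encoder = torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear(3 * 32 * 32, 64))

def pgd_perturb(x, steps=10, eps=8 / 255, alpha=2 / 255):
    """Generic PGD: push the encoder's features away from the clean ones,
    hoping the perturbation transfers to downstream fine-tuned tasks."""
    with torch.no_grad():
        clean_feat = encoder(x)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        # Maximize feature drift = minimize cosine similarity to clean
        loss = -F.cosine_similarity(encoder(x + delta), clean_feat).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (x + delta).detach().clamp(0, 1)

x = torch.rand(2, 3, 32, 32)              # toy "images"
x_adv = pgd_perturb(x)
print((x_adv - x).abs().max())            # perturbation stays within eps
```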

7.
Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Jinghui Chen, Fenglong Ma, Ting Wang

Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners (few-shot learning is a machine learning paradigm in which a model learns to make accurate predictions from only a small number of labeled examples per class). However, their security risks under such settings are largely unexplored. This paper presents the results of a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks, while existing defenses are inadequate to meet the challenge. To address this issue, the paper advocates MDP, a novel, lightweight, pluggable, and effective defense for PLMs as few-shot learners; empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP.
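
To ground the few-shot setting the parenthetical describes, the sketch below builds a tiny episode with three classes and five labeled examples each, then classifies queries by nearest class centroid (a prototypical-network-style rule). The Gaussian toy data and centroid classifier are illustrative assumptions, unrelated to the paper's PLM setup or to MDP itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Few-shot episode: 3 classes, 5 labeled examples ("shots") per class,
# plus one query per class to classify.
n_classes, k_shot, dim = 3, 5, 16
support = rng.normal(loc=np.arange(n_classes)[:, None, None],
                     scale=0.5, size=(n_classes, k_shot, dim))
queries = rng.normal(loc=np.arange(n_classes)[:, None], scale=0.5,
                     size=(n_classes, dim))

# Prototypical-style classifier: label each query by its nearest
# class centroid computed from the k support examples.
prototypes = support.mean(axis=1)                        # (3, dim)
dists = ((queries[:, None, :] - prototypes[None]) ** 2).sum(-1)
pred = dists.argmin(axis=1)
print("predicted classes:", pred)  # ideally [0 1 2]
```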

8.
Sergey Shuvaev, Evgeny Amelchenko, Dmitry Smagin, Natalia Kudryavtseva, Grigori Enikolopov, Alexei Koulakov

Studies of social conflict in humans, non-human primates, and mice have investigated how such conflict leads to the formation, maintenance, and plasticity of aggressive and subordinate behavioral states in animals. These studies, however, have rarely focused on the dynamic nature of aggressiveness in individuals, constraining our ability to understand and predict harmful behaviors. This study uses behavioral data modeling to explore the strategies of aggressive behavior in individual mice over prolonged periods of time, providing insights not only into the development of pathological aggression and defeat but also offering a path toward their mitigation.

9.
Saumya Gupta, Yikai Zhang, Xiaoling Hu, Prateek Prasanna, Chao Chen

The segmentation and labeling of curvilinear structures such as road networks is challenging due to relatively weak signals and complex geometry and topology. To facilitate and accelerate large-scale annotation, one has to adopt semi-automatic approaches such as proofreading by experts. This paper focuses on uncertainty estimation for such tasks, so that highly uncertain, and thus error-prone, structures can be identified for human annotators to verify. It proposes (1) a joint prediction model that estimates the uncertainty of a structure while taking neighboring structures into consideration (inter-structural uncertainty), and (2) a novel Probabilistic DMT to model the inherent uncertainty within each structure (intra-structural uncertainty). The method produces better structure-wise uncertainty maps than existing methods on various 2D and 3D datasets.
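
A simple way to picture structure-wise uncertainty: draw several stochastic segmentation samples, convert them into per-pixel foreground probabilities, and pool the resulting entropy over each structure's pixels. The 1-D toy map and hand-assigned structures below are illustrative assumptions, not the paper's Probabilistic DMT.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 10 stochastic segmentation samples of a 1-D "road" map,
# plus a labeling of the 30 pixels into 3 curvilinear structures.
samples = rng.random((10, 30)) < np.linspace(0.05, 0.95, 30)  # binary masks
structure_id = np.repeat([0, 1, 2], 10)

# Structure-wise uncertainty: mean per-pixel entropy of the empirical
# foreground probability, pooled over each structure's pixels.
p = samples.mean(axis=0).clip(1e-6, 1 - 1e-6)
entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
for s in range(3):
    print(f"structure {s}: uncertainty = {entropy[structure_id == s].mean():.3f}")
```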

“Stony Brook’s AI researchers are driving progress across a number of fields,” says SUNY Empire Innovation Associate Professor Michael S. Ryoo. “Their contributions, often novel and insightful, are valuable and continue to add to existing literature as well as advance progress.”

More details about these projects can be found on the official website.


Ankita Nagpal
Communications Assistant