Stony Brook Presents Advancements in Computer Vision at ICCV 2023

This year’s International Conference on Computer Vision (ICCV) showcased work from several of SBU’s AI researchers, who shared notable milestones in the field of computer vision. Their contributions to a variety of areas—including robotics, machine learning, and augmented reality—were presented at the conference, which is regarded alongside CVPR and ECCV as one of the premier international venues in computer vision.

The annual conference, held in Paris from October 2 to 6, comprised the main conference along with several co-located workshops and tutorials. Researchers from around the world were invited to present their findings.

Some of SBU’s notable contributions include:

1. GAIT: Generating Aesthetic Indoor Tours with Deep Reinforcement Learning
Desai Xie, Ping Hu, Xin Sun, Soren Pirk, Jianming Zhang, Radomir Mech, Arie E. Kaufman
Composing a shot by framing a scene with a camera plays an integral part in photography and cinematography. A carefully composed frame doesn’t just provide information about a scene but also instills a desired emotion in the viewer, carrying forward the story the artist wants to convey. Photographers and movie directors spend a significant amount of time perfecting camera framing, and while this provides a high degree of control, it would be desirable to frame a scene automatically by computing camera poses. This paper introduces GAIT, a framework for training a Deep Reinforcement Learning (DRL) network that learns to move the camera so as to generate trajectories that show the most aesthetic views while also ensuring smooth movement. The method is validated through comparisons against baseline algorithms, a perceptual user study, and ablation studies.
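To give a flavor of the underlying objective, a camera-trajectory reward can trade off per-frame aesthetics against motion smoothness. The sketch below is purely illustrative: the function names and the simple distance-based smoothness penalty are our own assumptions, not GAIT’s actual DRL reward.

```python
import numpy as np

def trajectory_reward(poses, aesthetic_score, smoothness_weight=0.5):
    """Toy reward for a camera trajectory: mean aesthetic quality of the
    views, minus a penalty for abrupt camera movement (illustrative only)."""
    poses = np.asarray(poses, dtype=float)            # (T, 3) camera positions
    aesthetics = np.mean([aesthetic_score(p) for p in poses])
    # Penalize large frame-to-frame displacements (jerky motion).
    steps = np.diff(poses, axis=0)
    smoothness_penalty = np.mean(np.linalg.norm(steps, axis=1))
    return aesthetics - smoothness_weight * smoothness_penalty
```

Under such a reward, a trajectory that sweeps steadily through high-scoring views earns more than one that jumps erratically between them, which is the trade-off a DRL agent can then learn to maximize.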

2. CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network
Ruyi Lian, Haibin Ling
Estimating an object’s pose from an image means recovering the rotation and translation of a given rigid object relative to the camera, and it is crucial in various applications, including robot grasping and manipulation, autonomous driving, and augmented reality. Most existing methods work well in theory, but in practice they are prone to errors caused by factors such as occlusion, background clutter, and lighting variation. In this paper, the team proposes a novel pose estimation algorithm named CheckerPose, which progressively localizes dense keypoints with a graph neural network, making the estimate increasingly reliable. The algorithm noticeably boosts the accuracy of existing methods, achieving state-of-the-art performance on popular object pose estimation benchmarks.
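The rotation-and-translation description can be made concrete: a rigid pose (R, t) maps points from the object’s coordinate frame into the camera frame via x_cam = R·x_obj + t. The following minimal sketch (the helper name `apply_pose` is ours, not from the paper) illustrates this:

```python
import numpy as np

def apply_pose(R, t, points):
    """Map 3D points from the object frame to the camera frame
    using a rotation R (3x3) and a translation t (3,)."""
    return points @ np.asarray(R).T + np.asarray(t)

# A 90-degree rotation about the z-axis, plus a translation along x.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([2.0, 0.0, 0.0])
p = np.array([[1.0, 0.0, 0.0]])
print(apply_pose(R, t, p))   # → [[2. 1. 0.]]
```

Pose estimation is the inverse problem: given only the image, recover the R and t that produced the observed appearance of the object.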

3. Local Context-Aware Active Domain Adaptation
Tao Sun, Cheng Lu, Haibin Ling
Unsupervised Domain Adaptation (UDA) is a framework for adapting an existing model from a related domain to a new, unlabeled domain. While this framework has been successful in many applications, adaptation remains challenging, especially when the domain gap is large. In practice, however, it is often feasible to annotate a small amount of the unlabeled data. This paradigm, Active Domain Adaptation (ADA), has been drawing increasing attention due to its promising performance at minimal labeling cost. This paper proposes exploring the local context of queried data in order to improve ADA. The team’s novel framework, Local context-aware Active Domain Adaptation (LADA), is shown to outperform existing state-of-the-art ADA solutions.
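As a rough illustration of active querying in general—this is a generic uncertainty-based rule, not LADA’s local-context criterion, and the helper name is hypothetical—an ADA method spends its small labeling budget on the unlabeled target samples the model is least sure about:

```python
import numpy as np

def select_queries(probs, budget):
    """Pick the `budget` most uncertain unlabeled samples by prediction
    entropy -- a generic ADA query rule, not LADA's actual criterion."""
    probs = np.asarray(probs, dtype=float)            # (N, num_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:budget]         # indices of top entropies
```

LADA’s contribution is to go beyond such per-sample scores by also considering the local context around each queried sample, which this simple rule ignores.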

4. S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces
Haoyu Wu, Alexandros Graikos, Dimitris Samaras
Neural surface reconstruction techniques perform remarkably well at rendering images and have become increasingly popular in 3D vision. However, these techniques require dense input views as supervision, which is limiting for many real-world applications such as robotics, autonomous driving, and augmented reality, where sparse input images are the only source of information. Noting that this challenge can be overcome when a 3D point is visible in multiple views—as is the case in multi-view stereo (MVS)—this paper proposes to regularize neural rendering optimization with an MVS solution, showing that this novel method not only outperforms generic neural rendering models by a large margin but also significantly increases the reconstruction quality of MVS models.

“Stony Brook’s AI researchers are driving progress across a number of fields,” says SBU’s AI Institute Director Steven Skiena. “Their contributions, often novel and insightful, add to the existing literature and advance the state of the art, and we look forward to seeing their work evolve and impact the industry.”

SUNY Empire Innovation Prof. Haibin Ling adds, “This would not have been possible without help from the US National Science Foundation, Defense Advanced Research Projects Agency, and the SBU/BNL Seed Grant Award. We’re grateful for their ongoing support, and also for our reviewers’ valuable comments and suggestions. The recognition of our work not only acknowledges our contributions but also makes us want to continue working with the AI community.”

Ankita Nagpal
Communications Assistant