Teaching and Evaluating LLMs for Polymer Design Related Tasks

Event Description

Abstract: Much like other AI for Science domains, polymer design poses significant challenges. It requires grounding in empirical data and physical laws, precise handling of domain-specific structured representations, and compositional reasoning over multiple interacting constraints, all while working with limited data.

To address these challenges, we introduce PolyBench, a large-scale benchmark comprising over 125K polymer design and analysis tasks grounded in verified experimental and synthetic data. PolyBench includes tasks created from a wide range of data sources and presents diverse structural, property-driven, and synthesis-oriented reasoning problems. Its tasks are organized from simple to complex analytical reasoning problems, which enables generalization tests, and it includes diagnostic probes to evaluate model capabilities. Additionally, to support effective domain alignment, we propose a knowledge-augmented reasoning distillation framework that enriches the dataset with structured chain-of-thought supervision derived from expert-informed reasoning strategies.

Small language models (7B-14B parameters) trained on PolyBench substantially outperform comparably sized baselines and, in many cases, exceed the performance of larger closed-source frontier models on polymer reasoning tasks, while also demonstrating improved transfer to external polymer benchmarks. Finally, we conduct a diagnostic study that reveals a compositionality gap: despite strong performance on decomposed sub-questions, models struggle to integrate multiple interacting constraints and intermediate reasoning steps, highlighting fundamental limitations in current scientific language models.

Speaker: Dikshya Mohanty

Location: NCS 115/Online

Zoom: https://stonybrook.zoom.us/j/94746001760?pwd=BCAd8gu7cXLn3PXM6kkbh11V6r0Mr7.1
Meeting ID: 947 4600 1760
Passcode: 987917
