Scientific Knowledge Discovery with Large Language Models
Event Description
Abstract: Large Language Models (LLMs) have revolutionized how people interact with knowledge, offering unprecedented opportunities to accelerate scientific discovery. In this talk, I will discuss my research on the synergy between LLMs and scientific knowledge, specifically how these models extract, induce, and verify knowledge to automate the research lifecycle. First, I will cover our work on improving knowledge extraction from the vast scientific literature, focusing on enabling models to comprehend long documents comprehensively and cost-efficiently. I will describe a novel paradigm that represents document-level structured information as question-answer pairs, and show how we address the challenges of long-context understanding by leveraging global context through retrieval-augmented modeling. Next, I will present our pioneering work on using LLMs to generate new scientific hypotheses. We introduce a framework that employs reinforcement learning with fine-grained reward modeling and adaptive controllers. This approach balances novelty, feasibility, and effectiveness to generate inspiring and actionable research hypotheses. Finally, I will discuss our work on the first LLM Scientist for machine learning research. I will demonstrate how LLMs can move beyond hypothesis generation to participate in the execution and validation of scientific hypotheses, ensuring that the discovered knowledge is not only innovative but also grounded and verified.
Bio: Xinya Du is a tenure-track assistant professor in the Department of Computer Science at UT Dallas. He earned his Ph.D. from Cornell University and was a Postdoctoral Research Associate at the University of Illinois Urbana-Champaign (UIUC). He has also worked at Microsoft Research, Google Research, and the Allen Institute for AI. His research focuses on large language models, deep learning, and their applications in science. His work has been published in leading NLP and ML conferences (ACL, ICLR, NeurIPS). His research has received multiple recognitions, including a Best Paper Award at AAAI AI for Research and a Best Poster Award at the ICML AI for Science workshop. His work was included in the list of Most Influential ACL Papers and has been covered by major media outlets such as New Scientist. He was named a Spotlight Rising Star in Data Science by the University of Chicago and has received several prestigious awards, including the Amazon Research Award, Cisco Research Award, Open Philanthropy Award, and NSF CAREER Award.
Location: NCS 120