Feng Zhu's Team and the AI for Science Team Publish in Nature Machine Intelligence: Large Language Models Accelerate Organic Chemical Synthesis
Published: July 23, 2025

Supported by the Shanghai AI Major Project and leveraging the Shanghai Jiao Tong University AI for Science Open Scientific Data Platform, a research team led by Associate Professor Zhu Feng from the Frontiers Science Center for Transformative Molecules, in collaboration with Associate Professor Xu Yanyan, Professor Jin Yaohui, Professor Yang Xiaokang, and others from the SJTU Institute of AI for Science, has achieved a substantial breakthrough in artificial intelligence for organic chemistry synthesis (AI for Chemistry).

The related research, entitled "Large language models to accelerate organic chemistry synthesis", was published online in Nature Machine Intelligence on July 1, 2025. This work demonstrates the significant potential of large language models (LLMs) in empowering and accelerating organic synthesis.

Article abstract:

Chemical synthesis, as a foundational methodology in the creation of transformative molecules, exerts substantial influence across diverse sectors from life sciences to materials and energy. Current chemical synthesis practices emphasize laborious and costly trial-and-error workflows, underscoring the urgent needs for advanced AI assistants. Recently, large language models, typified by GPT-4, have been introduced as an efficient tool to facilitate scientific research. Here we present Chemma, a fully fine-tuned large language model with 1.28 million pairs of questions and answers about reactions, as an assistant to accelerate organic chemistry synthesis. Chemma surpasses the best-known results in multiple chemical tasks, for example, single-step retrosynthesis and yield prediction, which highlights the potential of general artificial intelligence for organic chemistry. By predicting yields across the experimental reaction space, Chemma significantly improves the reaction exploration capability of Bayesian optimization. More importantly, integrated in an active learning framework, Chemma exhibits advanced potentials of autonomously experimental exploration and optimization in open reaction spaces. For an unreported Suzuki–Miyaura cross-coupling reaction of cyclic aminoboronates and aryl halides for the synthesis of α-aryl N-heterocycles, the human–artificial intelligence collaboration successfully explored a suitable ligand (tri(1-adamantyl)phosphine) and solvent (1,4-dioxane) within only 15 runs, achieving an isolated yield of 67%. These results reveal that, without quantum-chemical calculations, Chemma can comprehend and extract chemical insights from reaction data, in a manner akin to human experts. This work opens avenues for accelerating organic chemistry synthesis with adapted large language models.