ChemChat—Recent Advances in Democratizing and Facilitating Access to Domain-Specific AI/ML Through LLM-Powered Conversational Assistants
Abstract
In recent years, computational chemistry and machine learning have advanced rapidly, yielding powerful tools and AI models. Despite this progress, these resources remain underutilized because of high technical barriers and their tendency to operate in silos. The need for programming and ML expertise further restricts access for many domain experts, particularly experimentalists. Meanwhile, large language models (LLMs) from companies like OpenAI (GPT), Google (Gemini), Meta (Llama), xAI (Grok), and Anthropic (Claude) have revolutionized various sectors over the past 24 months. However, their application in chemistry, even with the recent OpenAI o1, remains limited by a weak grasp of scientific workflows and domain-specific tasks (e.g., drug discovery), a lack of access to current data sources, and shortcomings in skill-based reasoning and accurate referencing. These deficiencies often lead to incorrect or hallucinated responses that undermine trust and reliability. This critical gap between AI and scientific disciplines can be bridged by equipping LLM-powered conversational assistants with specialized cheminformatics tools and AI models. Given tailored instructions on the capabilities and usage of these tools, such an assistant can plan and execute workflows to fulfill user requests. This approach promises to (I) increase the adoption of cheminformatics tools and AI models, (II) democratize AI/ML accessibility within the field, and (III) ultimately enhance scientific discovery and education. In this talk, we introduce ChemChat, a fully functional, cloud-deployed proof-of-concept conversational assistant for materials science and data visualization, and describe our progress toward agentic systems. It features a chatbot-driven web application interface and is powered by non-OpenAI LLMs.
By integrating existing cheminformatics tools, advanced AI models, and other knowledge sources (including PubChem, CIRCA, RDKit, GT4SD, RXN, MolFormer, and DeepSearch), ChemChat aids chemists with tasks such as property calculations, molecule design, retrosynthesis, data visualization, and literature research. Our presentation will detail ChemChat’s workflow architecture, its use of retrieval-augmented generation (RAG)-based in-context learning, and its specific use cases. We will also compare ChemChat with popular applications and recent developments such as ChatGPT, ChemCrow, and SynAsk. We hope that our work can serve as a blueprint for accelerating the development of similar systems within the scientific community, particularly in materials science, to further enhance collaboration, discovery, and innovation.
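
As a rough illustration of the RAG-based in-context learning idea mentioned above, the following minimal sketch retrieves the most relevant tool descriptions for a user request and places them in the prompt given to an LLM. It is not ChemChat's implementation: the tool names, descriptions, stub backends, and the word-overlap ranking are all assumptions made for illustration, and a real system would likely use embedding-based retrieval and call actual services.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str           # usage instructions shown to the LLM
    run: Callable[[str], str]  # placeholder backend, not a real API call


# Hypothetical registry; the stubs stand in for real cheminformatics services.
TOOLS = [
    Tool("property_calculator",
         "calculate molecular property values such as weight or logP for a molecule",
         lambda q: f"[property stub] {q}"),
    Tool("retrosynthesis",
         "predict a retrosynthesis route for a target molecule",
         lambda q: f"[retrosynthesis stub] {q}"),
    Tool("literature_search",
         "search literature and patents for a topic",
         lambda q: f"[literature stub] {q}"),
]


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word set used for the overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_tools(request: str, k: int = 2) -> list[Tool]:
    """Rank tools by word overlap between the request and each description."""
    query = tokens(request)
    ranked = sorted(TOOLS,
                    key=lambda t: len(query & tokens(t.description)),
                    reverse=True)
    return ranked[:k]


def build_prompt(request: str, k: int = 2) -> str:
    """Assemble an in-context prompt listing only the retrieved tools."""
    lines = ["You can call the following tools:"]
    for tool in retrieve_tools(request, k):
        lines.append(f"- {tool.name}: {tool.description}")
    lines.append(f"User request: {request}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("predict a retrosynthesis route for aspirin", k=1))
```

Retrieving only the top-k descriptions keeps the prompt short as the number of integrated tools grows, which is one plausible motivation for a retrieval step ahead of workflow planning.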