Accelerating Advanced Data Visualization with RAG-Based In-Context Learning—A Novel Assistant for Scientific Workflows
Abstract
In the era of big data, the ability to quickly interpret and visualize complex datasets is paramount for advancing scientific discovery, particularly in materials science. Traditional tools such as Excel and Origin, while widely used, often struggle to create sophisticated visualizations quickly and on demand from new datasets. To address this limitation, we have developed a visualization assistant that leverages large language models (LLMs) and the Vega-Lite grammar to produce a diverse array of data visualizations on demand within seconds. The assistant not only accelerates the visualization process but also enables the creation of complex, interactive visualizations that are challenging to construct with conventional tools or with Matplotlib, a library frequently used in data science.

Initially, we explored fine-tuning LLMs to specialize them for our visualization tasks. However, this approach proved difficult and ineffective due to several drawbacks: high computational costs, lengthy training times, the specialized expertise required, and the considerable overhead of adapting the model to new visualization types over time. In our talk, we will present how we overcame these challenges by employing Retrieval-Augmented Generation (RAG)-based in-context learning. We will describe dataset creation, the architecture and workflow of our visualization assistant, and its current capabilities, including creating various chart types, incorporating aggregations, and adding interactive elements. All visualizations can be created from simple natural-language queries, and because the actual data is never sent directly to the LLMs, confidentiality is preserved. Furthermore, we will present recent progress in transitioning to agentic workflows. Overall, this approach streamlines the visualization process and addresses data security concerns, making it well suited to sensitive research environments.
Additionally, we believe that our approach democratizes access to advanced on-demand visualizations and can serve as a template for developing RAG-based in-context learning systems for applications in materials science. We hope it will inspire interdisciplinary collaboration and drive innovation in AI-catalyzed scientific workflows.
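The abstract does not specify how examples are retrieved or how prompts are assembled, so the following is only a minimal Python sketch of the general RAG-based in-context learning pattern it describes, under our own assumptions. The example store, the function names (`retrieve`, `infer_schema`, `build_prompt`, `attach_data`), and the keyword-overlap retriever are all hypothetical stand-ins; a real system would more likely use an embedding model and a vector store, and the LLM call itself is omitted. The point the sketch illustrates is the confidentiality property: only column names and inferred Vega-Lite field types enter the prompt, and the data rows are attached locally to the spec the model returns.

```python
import json
import math
from collections import Counter

# Hypothetical example store of (natural-language query, Vega-Lite spec) pairs.
# These entries are illustrative only, not taken from the authors' dataset.
EXAMPLES = [
    {"query": "bar chart of mean hardness per alloy",
     "spec": {"mark": "bar",
              "encoding": {"x": {"field": "alloy", "type": "nominal"},
                           "y": {"aggregate": "mean", "field": "hardness",
                                 "type": "quantitative"}}}},
    {"query": "scatter plot of temperature versus strength with tooltip",
     "spec": {"mark": "point",
              "encoding": {"x": {"field": "temperature", "type": "quantitative"},
                           "y": {"field": "strength", "type": "quantitative"},
                           "tooltip": [{"field": "sample", "type": "nominal"}]}}},
    {"query": "line chart of conductivity over time",
     "spec": {"mark": "line",
              "encoding": {"x": {"field": "time", "type": "temporal"},
                           "y": {"field": "conductivity", "type": "quantitative"}}}},
]


def tokenize(text):
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters (missing keys count as 0).
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    # Rank stored examples by similarity to the user query; keep the top k as few-shot demos.
    q = Counter(tokenize(query))
    ranked = sorted(EXAMPLES, key=lambda ex: cosine(q, Counter(tokenize(ex["query"]))),
                    reverse=True)
    return ranked[:k]


def infer_schema(rows):
    # Summarize column names and coarse Vega-Lite types; no data values are kept.
    if not rows:
        return {}
    return {col: "quantitative" if isinstance(val, (int, float)) else "nominal"
            for col, val in rows[0].items()}


def build_prompt(query, rows, k=2):
    # Assemble an in-context prompt from the schema plus retrieved examples only.
    parts = ["You write Vega-Lite specifications. Use only these fields:",
             json.dumps(infer_schema(rows))]
    for ex in retrieve(query, k):
        parts.append("Request: " + ex["query"])
        parts.append("Spec: " + json.dumps(ex["spec"]))
    parts.append("Request: " + query)
    parts.append("Spec:")
    return "\n".join(parts)


def attach_data(spec_text, rows):
    # Parse the model's JSON answer and inline the data locally, after generation.
    spec = json.loads(spec_text)
    spec["$schema"] = "https://vega.github.io/schema/vega-lite/v5.json"
    spec["data"] = {"values": rows}
    return spec


if __name__ == "__main__":
    rows = [{"alloy": "A1", "hardness": 212.5}, {"alloy": "B7", "hardness": 198.0}]
    print(build_prompt("bar chart of average hardness per alloy", rows))
```

In this sketch the prompt sent to a model would contain the field names `alloy` and `hardness` but none of the values in `rows`; the returned spec (here it would be passed to `attach_data` as a JSON string) is combined with the data only on the local side before rendering.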