Introducing ado: The accelerated discovery orchestrator
Over the past decade, IBM Research’s partnership with the UK’s Science and Technology Facilities Council (STFC) Hartree Centre, and their joint program the Hartree National Centre for Digital Innovation (HNCDI), has been grounded in a shared mission: enabling industry to harness the power of scientific computing.
At the heart of this is experimentation, and across our joint projects, we’ve run thousands of experiments across a wide range of domains — from formulation chemistry to climate science. These experiments span everything from measuring molecular properties using physics-based simulations and training and evaluating AI models, to benchmarking computational code performance and tuning workflows for finding solutions more quickly. This collaboration, and others like it around the globe, has inspired an approach to tackling research challenges in a unified way.
The challenge of computational experimentation
We've seen consistent challenges around orchestrating computational experiments when working toward complex research goals. Every experiment — regardless of domain — depends on a common set of foundational components:
- Configuration: A way to describe and set up the experiment
- Deployment: A method to run it in a reproducible environment
- Execution: A mechanism to drive it (such as optimization or sampling)
- Persistence: A system to store and retrieve results
Yet despite these shared needs, fragmentation remains a persistent issue. Each domain and experiment type tends to bring its own tools, formats, and constraints, creating friction that slows progress and complicates collaboration. In mature ecosystems, existing solutions may address some of these needs, but outside those boundaries, researchers often find themselves reinventing the wheel. Adopting techniques across domains can require reimplementation from scratch, while switching experiment types often demands learning entirely new frameworks. In domains without mature tooling, there may be no shared foundation at all.
This fragmentation affects not just researchers, but also the engineers and administrators responsible for scaling and integrating these tools. They’re left dealing with a litany of challenges: incompatible databases, conflicting data formats, diverse runtimes, disparate APIs, and inconsistent error handling. The result is a fractured landscape that diverts valuable resources, slows innovation, increases operational overhead, and hampers effective collaboration.
Enter ado: A unified framework for scientific experimentation
This fragmentation led us to ask a simple but powerful question: Could we create a unified way to handle the common tasks of computational experimentation — regardless of domain? Our answer is ado, the Accelerated Discovery Orchestrator.
Ado is a unified platform for executing computational experiments at scale and analyzing the results of those experiments. It allows distributed teams of researchers and engineers to collaborate on projects, execute experiments, and share data. Analysis techniques are automatically usable across domains, and a common usage pattern makes moving between experiments or domains straightforward.
The platform is written in Python, and ado users can extend it with their own experiments, analysis tools, or optimization algorithms. Anything they add automatically gets features addressing all the foundational components mentioned above: configuration, deployment, execution, and persistence.
Inspired by Kubernetes, built on Discovery Spaces
In designing ado, we drew inspiration from the Kubernetes container platform and its foundational concept of Pods. A Pod encapsulates the execution logic for a workload — such as the container image, environment variables, command-line arguments, health checks, and lifecycle hooks. Crucially, Kubernetes builds common platform features around this abstraction: managing pod lifecycles, injecting secrets, attaching storage, scheduling, and more. By standardizing how workloads are described and controlled, the Pod abstraction enables Kubernetes to apply consistent infrastructure and operational capabilities across diverse domains.
From this inspiration, we created our key abstraction, our equivalent of a pod: the Discovery Space.
What is a Discovery Space?
A Discovery Space consists of a generic way to describe what to measure and how to measure it, as well as a universal schema for storing measurements (what was measured and the results). If you’re familiar with CSV files, think of a Discovery Space as the hidden context behind a CSV file of experimental results:
- What are the column headers?
- Which columns are measurement inputs, and which are outputs?
- How do you add new rows?
- What order were the rows added?
- How were the existing rows added?
- What rows are missing?
Normally, this metadata is scattered or implicit. A Discovery Space makes it explicit and structured.
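To make the CSV analogy concrete, here is a minimal, hypothetical sketch in plain Python — not ado’s actual API — of the metadata a Discovery Space makes explicit: which properties are inputs, which are outputs, the ordered measurements, and which points remain unmeasured.

```python
from dataclasses import dataclass, field
from itertools import product

@dataclass
class DiscoverySpace:
    """Hypothetical sketch: the explicit context behind a CSV of results."""
    input_properties: dict          # column name -> allowed values ("what to measure")
    output_properties: list         # columns the experiment produces
    measurements: list = field(default_factory=list)  # ordered rows of inputs + outputs

    def record(self, inputs: dict, outputs: dict) -> None:
        """Append a new row; insertion order is preserved, so provenance is explicit."""
        self.measurements.append({**inputs, **outputs})

    def missing_points(self):
        """Rows not yet measured: the full input grid minus what has been recorded."""
        measured = {tuple(m[k] for k in self.input_properties) for m in self.measurements}
        names = list(self.input_properties)
        for combo in product(*self.input_properties.values()):
            if combo not in measured:
                yield dict(zip(names, combo))

# Usage: two input properties, one output property
space = DiscoverySpace(
    input_properties={"temperature": [300, 350], "pressure": [1, 2]},
    output_properties=["yield"],
)
space.record({"temperature": 300, "pressure": 1}, {"yield": 0.42})
print(len(list(space.missing_points())))  # 3 of the 4 grid points remain unmeasured
```

Because the schema answers all the questions above directly, any generic tool can inspect a space without knowing what "temperature" or "yield" mean in the underlying domain.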
Why Discovery Spaces matter
By encapsulating what to measure and how, along with a universal schema for the results, Discovery Spaces allow the rest of ado to remain domain-agnostic. This unlocks powerful capabilities:
- Analysis tools can operate on Discovery Spaces without knowing domain-specific details. ado can integrate analysis tools without knowing exactly what they do.
- Optimization algorithms can select points and optimize against the outputs of an experiment, again without needing domain knowledge.
- Measurement tools can be easily added – they only need to accept inputs and return results in an ado-compatible format.
This separation of concerns means that tools can be reused across domains, and new experiments can be automated without starting from scratch.
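The separation of concerns can be illustrated with a short sketch — again hypothetical, not ado’s actual API: a generic driver that only assumes an experiment maps an input dictionary to an output dictionary, so any domain can plug in.

```python
import random

def drive(space_inputs, experiment, budget, objective):
    """Hypothetical domain-agnostic driver: samples input points and runs any
    experiment that maps an input dict to an output dict, tracking the best."""
    best = None
    for _ in range(budget):
        point = {name: random.choice(values) for name, values in space_inputs.items()}
        result = experiment(point)
        if best is None or result[objective] > best[1][objective]:
            best = (point, result)
    return best

# Any experiment plugs in, as long as it speaks dicts in, dicts out.
def toy_experiment(point):
    return {"score": point["x"] * point["y"]}

random.seed(0)
point, result = drive({"x": [1, 2, 3], "y": [1, 2, 3]}, toy_experiment, budget=10,
                      objective="score")
```

Swapping `toy_experiment` for a molecular simulation or an LLM benchmark would not change a line of the driver — which is the point of the abstraction.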
Unpacking the first release of ado
The first release of ado is designed to be accessible and powerful. It runs locally — you can try it out on your laptop with included examples. It supports distributed teams, allowing you to share experiments and results across groups. You can also scale your experiments by running them on Ray clusters.
It is also battle-hardened, having been put through its paces running tens of thousands of fine-tuning benchmark experiments. In terms of experiments and exploration, our focus has been on LLM workload performance, and the first release provides experiments and analysis methods for that domain.
We plan to add more experiments and analysis methods over the coming months. We also provide example templates so users can start adding their own experiments and analysis methods.
A vision for converged discovery
A key focus of our ongoing collaboration with STFC Hartree Centre is enabling convergence across the diverse computing and experimental environments that modern scientific discovery demands. STFC itself has infrastructure that spans everything from traditional high-performance computing, to quantum platforms, Kubernetes-based AI inference environments, and commercial cloud. Each of these platforms offers unique capabilities — but used in isolation, they introduce complexity, friction, and sustainability challenges.
Our joint projects have explored convergence at multiple levels. We have created a unified job submission system so researchers can easily and securely run their jobs across heterogeneous systems. And we’ve built shared control planes that allow administrators to dynamically reallocate resources between different system types to meet demand. We see ado as sitting at the top of this stack, where it can serve as a single entry point for running discovery campaigns, abstracting the underlying complexity and allowing researchers to focus on exploration — not the infrastructure.
We’re still in the early stages with ado. We don’t yet know whether it’s possible to unify all computational exploration under one framework. But this is our attempt to bring order to the chaos of modern experimentation. By abstracting away domain-specific details and offering a consistent interface for experimentation, analysis, and optimization, we believe ado can accelerate discovery across disciplines.
We’re open-sourcing ado because we want to hear from you — researchers, engineers, data scientists, and tool builders. Try it out, break it, extend it, and tell us what works and what doesn’t. Together, we can build the foundation for a more integrated, flexible, and scalable future of computational discovery.