Bringing cloud-native AI supercomputing to a data center near you
Last year, we shared our experience in designing and building Vela, IBM’s first cloud-native AI supercomputer, natively integrated into the fabric of IBM Cloud.
This system was designed to marry agile, cloud-native development with generative AI, from optimized infrastructure up through a capable platform and software stack. IBM has used this capability for a wide variety of generative AI endeavors, from data preparation and model training or fine-tuning to incubating and ultimately running production services like watsonx.ai. The ability to dynamically shift how we allocate resources on Vela has been critical in allowing IBM to stay agile in this rapidly evolving industry.
Businesses all over the world see the transformative potential of generative AI. For many of IBM’s customers, that potential will be unlocked on-premises or in local IT environments. IDC estimates that roughly 65-70% of enterprise use of generative AI (primarily inferencing and fine-tuning) will be on-premises or in a hybrid environment.1 In a subsequent study, IDC also found that integration with hybrid cloud was the characteristic enterprise users cited most often when selecting an AI infrastructure provider.2 This drive towards on-premises and hybrid solutions is frequently due to the regulatory landscape, where compliance with sovereignty requirements prevents data and operational control from leaving a particular geography. After we published our story about Vela, IBM partners and customers across the world began asking whether we could deliver something like Vela outside of IBM Cloud, in their own on-premises environments.
Over the last year, IBM Research has been developing a solution for delivering an end-to-end, on-premises, cloud-native AI supercomputer to customers, from the system up through the AI platform and software stack. The architecture can scale from dozens to hundreds or thousands of NVIDIA H100 GPUs. It includes a flexible, scalable, and affordable RDMA-enabled Ethernet network capable of supporting high-performance AI training, along with a high-performance storage system based on IBM Storage Scale, one of the industry’s fastest and most flexible file systems.
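As a rough illustration of what training over this kind of RDMA-enabled Ethernet fabric can look like, the sketch below shows a PyTorch worker joining a distributed job with the NCCL backend. The NIC names, GID index, and interface name are hypothetical placeholders rather than values from this system, and the launcher is assumed to provide the usual rank and rendezvous environment variables.

```python
import os
import torch
import torch.distributed as dist

# Illustrative NCCL settings for RDMA over Converged Ethernet (RoCE).
# The device and interface names below are placeholders; actual values
# depend on the fabric and host configuration.
os.environ.setdefault("NCCL_IB_HCA", "mlx5_0,mlx5_1")   # RDMA-capable NICs (placeholder)
os.environ.setdefault("NCCL_IB_GID_INDEX", "3")         # RoCEv2 GID index (site-specific)
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")     # interface for bootstrap traffic

def init_worker() -> None:
    """Join the training job's process group over the RoCE fabric.

    Assumes the launcher (e.g. torchrun or the platform's job controller)
    has set RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT.
    """
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # Quick sanity check: all-reduce a single tensor across every GPU.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)
    if dist.get_rank() == 0:
        print(f"all_reduce across {dist.get_world_size()} ranks -> {t.item()}")

if __name__ == "__main__":
    init_worker()
    dist.destroy_process_group()
```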
This solution contains a cloud-native AI platform, based on OpenShift Container Platform and OpenShift AI, with access to watsonx.ai as needed. The AI infrastructure can be provisioned and elastically accessed through APIs, just like in a public cloud. Our software stack provides pre-built containers, AI models, and the tools needed to productively innovate with generative AI. We provide all the software and automation needed to bring up and operate this system end-to-end in a way that is secure, resilient, and operationally efficient, like a cloud. We also provide multiple layers of support for secure isolation between tenants sharing the system.
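To make the API-driven access concrete, the snippet below is a minimal sketch of how a tenant might request GPU capacity on a Kubernetes-based platform such as OpenShift, using the Kubernetes Python client. The namespace, image, and GPU count are hypothetical, and an actual deployment would more likely go through OpenShift AI tooling or a batch scheduler rather than creating bare pods.

```python
from kubernetes import client, config

# Load credentials the same way oc/kubectl does (kubeconfig or in-cluster config).
config.load_kube_config()

# Hypothetical tenant namespace and container image; replace with real values.
NAMESPACE = "tenant-a"
IMAGE = "registry.example.com/genai/train:latest"

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="finetune-job", namespace=NAMESPACE),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image=IMAGE,
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    # Request GPUs through the NVIDIA device plugin resource name.
                    limits={"nvidia.com/gpu": "8"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace=NAMESPACE, body=pod)
print(f"Submitted pod {pod.metadata.name} to namespace {NAMESPACE}")
```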
The first of these systems has now been delivered to our partner, Phoenix Technologies, in Switzerland. The first phase came online at Phoenix in mid-August 2024, thanks to a collaboration across teams at IBM, Red Hat, Phoenix, and our partner Dell.
The delivery of this system was part of a broader AI Innovation Center collaboration between IBM Research and Phoenix, where we are partnering closely to create a local AI hub for a broader ecosystem of Phoenix’s customers. Their sovereign AI cloud solution, kvant AI, aims to provide sovereign AI applications for every industry. We are also establishing a joint research and development agenda, complete with training, education, and skills development.
It takes deep expertise and know-how to design and operate AI-optimized systems and software stacks. This project has been about packaging the expertise that we’ve developed over the last few years so that our partners can become productive dramatically faster. Through this project, our partners gain a cost-performance optimized AI technology stack with a multi-tenant operational model, allowing flexibility in how resources are efficiently and securely allocated between users and workloads.
We are excited about what the end-to-end, on-premises cloud-native AI supercomputer project will enable. Hyperscale clouds are not everywhere, and no single AI model will solve every problem our customers can imagine. Just as IBM is using this infrastructure to conduct some of our most advanced AI research, our partners and clients all over the world are looking for scalable, end-to-end integrated, and future-proofed solutions to enable their own innovation agendas. We can unlock this by bringing the appropriate infrastructure and tools to where customers can productively use them. With the on-premises AI architecture, we can also provide the engine that ecosystems can use to share their innovative spirit, data, investments, and drive to become true value creators.
IBM is deeply invested in this idea, and this initiative is only just beginning. Later this year, we’ll be expanding with Phoenix Technologies into multiple footprints, and in 2025 we’ll expand further with a scale-out deployment of AIU-Spyre, the IBM Research-designed AI accelerator. We also intend to establish more AI Innovation Centers with a collection of global partners. Over the next several years, we have a robust research agenda that will continue to improve productivity, cost-performance, and security for enterprises engaging in generative AI activities across the development lifecycle.
We are excited about the opportunity ahead to accelerate enterprise generative AI activities across the world and enable AI value creators. If you’re interested in collaborating with us, we’d be delighted to hear from you.