Deep Dive

How the IBM Research AI Hardware Center is building tomorrow’s processors

A look inside the group developing full-stack solutions to AI’s unique computational requirements.

In the earliest days of the IBM Research AI Hardware Center, before ChatGPT broadened public awareness of generative AI, IBM executives had to explain to businesses why they should care about AI. Now the question tends to be how quickly IBM can help companies deploy it.

But the proliferation and popularization of AI has created an issue: The legacy enterprise hardware and software stack can’t keep up with the enormous computational demands of AI workloads. CPUs and GPUs, well-suited to their intended tasks, run up against speed and efficiency limits when used to train large language models and run inference on them.

To address these growing challenges, the AI Hardware Center launched in February 2019, with large initial investments from IBM, SUNY Polytechnic Institute, and the state of New York. Founding members included Samsung, Synopsys, Applied Materials, and Tokyo Electron Limited (TEL). Academic institutions like the University at Albany are also crucial partners, and a sizable portion of the AI Hardware Center's work happens at NY CREATES' Albany NanoTech Complex, a public-private collaboration whose academic partners include the SUNY system and the SUNY Research Foundation. The Center’s goal is to develop the next generation of chips, systems, and software to support the future of AI.

Even before the LLM explosion, IBM Research recognized the limitations of conventional CPUs and GPUs and was looking for ways to build AI-specific hardware. This April, IBM announced that the Telum II processor’s onboard AI accelerator core and the separate Spyre Accelerator, both fruits of this group’s efforts, will be available in IBM z17 this year; Spyre will also come to IBM Power11. Not content to just produce a chip, the IBM Research staff driving the AI Hardware Center are devising full-stack solutions for our burgeoning AI-driven world, including software that can extract the full potential from new hardware.

One of the AI Hardware Center's projects involves analog AI chips, where memory and compute are tightly intertwined rather than housed in two separate locations.

In service of that goal, the AI Hardware Center is excited to offer access to its leading-edge research hardware to partners who want to help refine the software stack, as well as members who just want to develop AI chips, said Center program director John Rozen. “It’s a rare opportunity for researchers to be so close to the product,” he added.

Processors are the Center’s most visible product, but its AI accelerator project also includes research and development on experimental analog in-memory computing chips. Behind the hardware sits the software that makes it all seamlessly plug-and-play for developers and engineers, rounding out the AI accelerator project. The Center’s other two core projects are the AI technology testbed, which lets developers access IBM’s AI hardware through the cloud and even modify algorithms or open-source software, and the heterogeneous integration project, which is working on high-speed, high-bandwidth interconnects to tie AI accelerators together with memory and processing.
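
To give a flavor of the analog idea: in an analog in-memory chip, weights are stored as device conductances in a crossbar array, so a matrix-vector multiply happens in place, with analog noise as the trade-off. The minimal NumPy sketch below is a toy model for illustration only, not the Center's hardware or software; the noise model and array sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights "stored" as device conductances in a crossbar array: the
# matrix never shuttles between separate memory and compute units.
W = rng.standard_normal((128, 64))
x = rng.standard_normal(64)

def analog_matvec(W, x, noise_std=0.02):
    """Toy model of an in-memory multiply: Ohm's law does the
    multiplications, Kirchhoff's law does the sums, and device
    variability shows up as noise on the result."""
    ideal = W @ x
    return ideal + noise_std * np.abs(ideal) * rng.standard_normal(ideal.shape)

y_digital = W @ x               # conventional: fetch weights, then compute
y_analog = analog_matvec(W, x)  # analog: compute where the weights live
rel_err = np.linalg.norm(y_analog - y_digital) / np.linalg.norm(y_digital)
print(f"relative error from analog noise: {rel_err:.4f}")
```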

In the beginning

While full-stack completeness is what the Center delivers today, it started out with just a kernel of an idea: using approximate computing for deep learning. Jeff Burns, now the AI Hardware Center’s director, started talking with colleagues about this idea back in 2015, years before the transformer revolution and the Center’s founding. They discussed what it would look like to take advantage of the efficiencies of low-precision arithmetic to train neural networks. What began as conversations in the halls of the Thomas J. Watson Research Center quickly yielded a white paper on the topic, which opened the door to funding and a research hypothesis: Designing low-precision hardware from scratch would yield greater performance and higher efficiency for deep learning than trying to get the available CPUs and GPUs to do approximate computing.

One of IBM Research’s groundbreaking publications on the topic, which came out in 2015, proved the feasibility of performing deep learning in 16-bit precision with little to no degradation in accuracy. A 2018 paper followed up on that work by showing it was possible to train a deep neural network in 8-bit precision, and subsequent experiments showed that strong performance in deep learning inference could be achieved in 4-bit and even 2-bit precision.
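
As a rough illustration of why those results were significant (a minimal sketch under simple assumptions, not the method from the papers): uniform quantization rounds values onto a low-precision grid, and the rounding error grows quickly as the bit width shrinks. The papers' contribution was showing that deep learning can tolerate that error with little to no loss in accuracy.

```python
import numpy as np

def quantize(x, bits):
    """Round x onto a symmetric uniform grid with the given bit width.

    A toy stand-in for low-precision arithmetic; real accelerators
    use carefully engineered number formats and scaling schemes.
    """
    levels = 2 ** (bits - 1) - 1        # e.g. 127 positive steps at 8 bits
    scale = np.abs(x).max() / levels    # per-tensor scale factor
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

for bits in (16, 8, 4, 2):
    err = np.abs(quantize(w, bits) - w).mean()
    print(f"{bits:>2}-bit mean absolute quantization error: {err:.5f}")
```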

This was pre-ChatGPT, but IBM Research could already see the change coming in workloads for compute, and wanted to make sure IBM was in the driver’s seat for the ensuing journey — thinking through the impacts of AI at the level of silicon, chips, and applications.

Just as AI was starting to take off, semiconductor scaling was slowing down, so Burns and others agreed that IBM Research should aim its fundamental technology work toward AI. Other groups within Research were working along the same lines, developing hardware and the accompanying software to accelerate AI inference, even before the AI Hardware Center had coalesced. The Center may have launched in 2019, but as with any origin story, it began earlier. For almost a decade, IBM Research has been working on phase-change memory devices for analog in-memory computing, which has become one of the Center’s research tracks.

“This motivated us to put it all under one umbrella so we could work in a more cohesive, consistent way,” Burns recalled. It also led to the Center’s technical tracks crystallizing: analog and digital AI accelerator cores, heterogeneous integration, and the AI technology testbed. Even six years in, these tracks remain intact. “We at the AI Hardware Center believe in this being the right way to attack the problem,” said Burns. “Fortunately, IBM Research has experts across the full-stack solution, as well as the unbelievable capability to comprehend the whole scope.”

IBM Spyre Accelerators, shown here mounted on PCIe cards, will be available this year for IBM z17. These chips will enable businesses to run generative AI models and agentic AI on premises, for tasks including financial fraud detection.

Full-stack solutions

Designing AI hardware is about building something that can be used, said Leland Chang, co-principal investigator of the AI Hardware Center’s digital track in the AI accelerators project. “We hardware people like to focus on the shiny piece of hardware, but the way you enable it and optimize the software for it is absolutely critical, sometimes more important than the hardware itself.”

Now that Spyre is out in the world, the AI Hardware Center is focusing intently on architecture, algorithms, and design, said Vijay Narayanan, co-PI of the analog track in the AI accelerators project. “We've already built hardware and chips, and now we're trying to figure out things like medium-sized models which are even more energy efficient, or how to deliver them,” he said. “The goal posts have moved quite a bit, both in the industry and in the Center.”

In the years required to develop a new chip, multiple new AI models rise and fall. Once an accelerator like Spyre is built, its design can’t be changed, so it falls to the software stack, starting with low-level components like the compiler, to unlock the chip’s capacity and provide flexibility for the future.

“Even as you anticipate the runway while you’re building a new chip, there are so many things that compress your runway,” said distinguished research scientist Viji Srinivasan, who works on AI accelerator architecture. “In that compressed runway, we are betting on some things being more important than others, and we’re putting in the effort to enable and support those.” To strike the right balance, Srinivasan and her colleagues meet regularly with other teams within IBM and with the external AI Hardware Center partners to make sure they’re on the right track.

For example, how specialized or general should an AI accelerator be? Transformer architectures form the backbone of LLMs, but building hardware that exclusively supports the needs of transformers can scupper its ability to handle emerging hybrid models, like those paired with state space models. That capacity to handle both — and whatever is coming down the pike, including the next version of Granite — has to be built into the hardware and the software.
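
One hypothetical way to see that requirement: a transformer attention step and a state-space recurrence step both reduce to matrix multiplies plus a few elementwise operations, so hardware and software that keep those shared primitives fast and programmable can serve both families. The NumPy sketch below is illustrative only; the functions and toy sizes are assumptions, not IBM designs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 8  # toy model width and sequence length

def attention_step(Q, K, V):
    """One self-attention step: two matmuls wrapped around a softmax."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def ssm_step(A, B, C, state, u):
    """One state-space recurrence step: also just matmuls and adds."""
    state = A @ state + B @ u
    return C @ state, state

Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
attn_out = attention_step(Q, K, V)  # (n, d)

A = 0.1 * rng.standard_normal((d, d))
B, C = rng.standard_normal((d, 1)), rng.standard_normal((1, d))
y, state = ssm_step(A, B, C, np.zeros((d, 1)), np.ones((1, 1)))
```

The open question for accelerator designers is which of these primitives to harden in silicon and which to leave to the software stack.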

The field is maturing in parallel, too. The companies asking for AI products are starting to pivot to the things that IBM’s technologies, and Spyre in particular, do well. “People are starting to realize that small, purpose-built models are important, and now you have to worry about the data prep pipeline,” said Edward Barth, who worked in business development at the Center. “The consciousness of the marketplace is catching up to where we already are.”

And it’s already resulting in new workstreams. “The best feedback we can receive from our partners is for them to want to drive a joint agenda,” Rozen said. “We have folks from competing companies sitting at the table to learn, but the ideal outcome is for these members to really get involved.”

A technician works in a clean room at Albany NanoTech, where sensitive equipment, like this EUV lithography machine, patterns silicon wafers to make chips.

An internal partnership

As the company’s organic growth engine, IBM Research partners with other IBM business units, too, developing today’s findings into tomorrow’s products. The AI Hardware Center started working with the IBM Infrastructure team in 2019 to integrate a single AI accelerator core into z16’s Telum processor, which debuted in 2022. Srinivasan and her team took what they learned from working on Telum and incorporated it into Spyre. “That collaboration continues with Spyre in z17, and there is a tight feedback loop between the product and research divisions,” she said. “At IBM, we’re eating our own cooking, and using its own accelerator first is clear proof that the company is serious about these ideas.”

One of the main threads within the Center currently is software enablement: building tools that let data scientists log on and use an AI accelerator like they would any other computer. This brings the theoretical work to life, so the AI Hardware Center is letting partners run on its hardware to get that firsthand experience and demonstrate the value of AI hardware, something that wasn’t possible when the group only had experimental cores. Today, IBM Research has servers with Spyre chips in them, remotely accessible to AI Hardware Center members.

IBM Quantum did something similar when it made its earliest prototype devices available on the cloud back when the systems only had a few qubits. The AI Hardware Center’s leaders are optimistic that making Spyre and other chips available will help industry partners grasp the capabilities of AI hardware, creating a flywheel for further innovation.

Telum (left) debuted in 2022 in IBM z16; Spyre (right) arrives with IBM z17. Both were developed in close collaboration with the IBM Infrastructure team.

What’s next

The AI Hardware Center’s first system-on-a-chip is IBM Spyre, which will be available in IBM z17 later this year. The team is already hard at work on the next generation of digital AI accelerator, eager to get it completed, fabricated, and out into the world.

The roadmap includes much more than this one chip, though.

Chiplet-based design is also on the road ahead. The Center’s heterogeneous integration project exploits chip packaging technology to get the most out of the hardware. Experts at the AI Hardware Center are doing a ton of fundamental research in this area, and those techniques may feature in a future generation of accelerators.

The analog in-memory compute effort, too, is nearing maturity, said Narayanan, who emphasized that this technology is about so much more than the sophisticated phase-change memory hardware. “Even if only the fabric of our analog in-memory compute work — the architecture, the algorithm — make it to product with a partner’s novel memory product, that to me would be success,” he said.

More broadly, AI workloads continue to evolve: reinforcement learning and fine-tuning are blurring the boundary between inference and training, as are newer concepts like chain-of-thought reasoning, inference scaling, and agentic workflows. The AI Hardware Center’s work will continue to account for these branching paths.

With Spyre, AI Hardware Center staff tend to agree they successfully anticipated AI workloads and designed a chip for what’s ahead. “That can be hard to do, because you don’t want to be wrong, but every time something changed, we pivoted and tried something different,” said Chang. “But we’re okay with being wrong, as long as it gets us to the right place.”
