Breaking down the AI transformer
A new open-source web-based tool designed at IBM Research and Georgia Tech lets you interactively explore the neural network architecture that started the modern AI boom.
It can be easy to mistake the fluent stream of text flowing from a large language model for magic. The point of Transformer Explainer is to show that it’s not. “The model is just learning how to make a probability distribution,” said IBM’s Benjamin Hoover.
Hoover is an AI engineer at IBM Research who co-designed the open-source and interactive Transformer Explainer with a team at Georgia Tech, where he’s also studying for a PhD in machine learning. The team’s goal was to give non-experts a hands-on introduction to what goes on under the hood of a transformer-based language model, which learns from large-scale data how to mimic human-generated text.
The tool integrates a live GPT-2 model that runs locally in the user’s web browser. Type a phrase into the chat window and watch the transformer’s components work together to predict the next word. Words are split into tokens, which are converted to numerical vectors, then funneled through multiple transformer blocks until the model returns a ranked list of candidates for the next word.
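For readers who want to see that pipeline in code, here is a minimal sketch using Hugging Face’s Transformers library and PyTorch rather than the Explainer’s own in-browser stack; the prompt and the top-5 cutoff are arbitrary choices for illustration.

```python
# Sketch of the next-word pipeline: tokenize a prompt, run GPT-2,
# and rank candidate next tokens by probability.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The quick brown fox"                      # illustrative prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids  # words -> token IDs

with torch.no_grad():
    logits = model(input_ids).logits                # a score for every vocabulary token
probs = torch.softmax(logits[0, -1], dim=-1)        # probability distribution over the next token

top = torch.topk(probs, k=5)                        # ranked list of candidate next words
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()]):>12s}  {p.item():.3f}")
```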
How deep you dive into the details is up to you. You can stay with the high-level overview or click into specifics, like how the transformer computes an “attention” score for each word in your prompt. The model uses this score to determine which words matter most for choosing the next word to generate.
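As a rough sketch of the mechanism the tool visualizes, the scaled dot-product attention inside GPT-2 can be written in a few lines; the tensors below are toy values, not the model’s actual weights.

```python
# Toy scaled dot-product attention with a causal mask, as used in GPT-2.
import torch

seq_len, d = 4, 8                        # 4 tokens, an 8-dimensional attention head
Q = torch.randn(seq_len, d)              # a query vector for each token
K = torch.randn(seq_len, d)              # a key vector for each token
V = torch.randn(seq_len, d)              # a value vector for each token

scores = Q @ K.T / d ** 0.5              # how strongly each token relates to every other
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))   # GPT-2 only attends to earlier tokens
weights = torch.softmax(scores, dim=-1)            # each row sums to 1: the attention scores
output = weights @ V                               # weighted mix of values passed onward
```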
The tool was designed for non-experts, but techies have enthusiastically shared it on social media over the last week. “This is one of the coolest LLM Transformer visualization tools I’ve come across,” one AI influencer wrote on LinkedIn.
Hoover’s first project after joining IBM’s Visual AI team in 2019 was exBERT, a tool to help other researchers understand a new deep-learning architecture that was starting to gain traction: the transformer. Later that year, exBERT was runner-up for best demo at NeurIPS.
After exBERT, Hoover helped build RXNMapper, a tool that showed that transformers could learn the language of chemistry. Chemical and Engineering News ended up putting the visualization on its cover.
During the pandemic, Hoover decided to go back to school. He chose Georgia Tech because the school allowed him to continue working. For an adviser, he chose Polo Chau, a professor specializing in data visualization, because of Chau’s rave reviews from past students.
Transformer Explainer was born five months ago, after three of Chau’s students proposed designing an educational tool about transformers. Chau assigned Hoover to be their technical consultant.
In the five years since the release of exBERT, Hoover’s first attempt at explaining the transformer, many other data visualizations have appeared. “But they’re either too technical or too high level,” he said. “We wanted to find the right level of abstraction to reach the widest audience.”
The design the team settled on allows users to connect with the material in as much detail as they like by zooming in. Another key feature of Transformer Explainer is its sliding “temperature” scale that lets you adjust how creative you want the model to be in choosing its next word.
“A low temperature is the equivalent of picking the most likely prediction,” Hoover said. “If you want more creative answers, you turn up the temperature.”
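A minimal sketch of how temperature scaling typically works in sampling, not the tool’s specific implementation: the logits are divided by the temperature before the softmax, so low values sharpen the distribution toward the top pick and high values flatten it. The scores below are made-up values for illustration.

```python
# Illustrative temperature scaling over three candidate next words.
import torch

logits = torch.tensor([4.0, 3.0, 1.0])   # made-up scores for three candidates

for temperature in (0.2, 1.0, 2.0):
    probs = torch.softmax(logits / temperature, dim=-1)
    print(temperature, [round(p, 3) for p in probs.tolist()])
```

At a temperature of 0.2 nearly all the probability lands on the top candidate, while at 2.0 the lower-ranked words become far more likely to be chosen.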
The team used Svelte and D3 to design the visualizations on the front end, and the ONNX runtime and Hugging Face’s Transformers library to run GPT-2 in a browser.
Transformer Explainer will be presented at VIS 2024, IEEE’s annual data visualization conference in October, along with Diffusion Explainer, a similar tool aimed at explaining image-generating diffusion models. The team behind Transformer Explainer was led by Georgia Tech’s Aeree Cho, Grace Kim, and Alex Karpekov, with contributions from Alec Helbing, Jay Wang, and Seongmin Lee, in addition to Hoover.
The team is currently looking into allowing users to interact with their own models in addition to GPT-2, which they chose for its size. “We wanted something that could run on a browser that students could pull up on their computer while the professor was teaching,” said Hoover. “GPT-2 was small enough to run on a CPU.”
As part of his PhD thesis, Hoover is working with IBM researcher Dmitry Krotov to develop a variation on the transformer, inspired by physical descriptions of how memory works. If that sounds confusing, not to worry. There will almost certainly be a visualization breaking it down for anyone to understand.