IBM at Interspeech 2025
- Rotterdam, Netherlands
The Linux Foundation's AI_dev is a nexus for developers delving into the intricate realm of open source generative AI and machine learning. At the heart of this event is the belief that open source is the engine of innovation in AI. By uniting the brightest developers from around the world, we aim to ignite discussions, foster collaborations, and shape the trajectory of open source AI.
In this talk, we will introduce two open-source projects vLLM and KServe and explain how they can be integrated to leverage better performance and scalability for LLMs in production. The session will include a demo showcasing their integration. vLLM is a high-performance library specifically designed for LLM inference and serving, offering cutting-edge throughput and efficiency through techniques such as PagedAttention, continuous batching, and optimized CUDA kernels, making it ideal for production environments that demand fast, large-scale LLM serving. KServe is a Kubernetes-based platform designed for scalable model deployment. It provides robust features for managing AI models in production, including autoscaling, monitoring, and model versioning. By combining vLLM's inference optimizations with KServe's scalability, organizations can deploy LLMs effectively in production environments, ensuring fast, low-latency inference and seamless scaling across cloud platforms.
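To make the integration concrete, the sketch below shows roughly what a KServe `InferenceService` using the Hugging Face serving runtime (which can run vLLM as its backend) might look like. This is an illustrative fragment, not taken from the talk: the model ID, resource limits, and arguments are assumptions.

```yaml
# Minimal, illustrative InferenceService sketch (field values are assumptions).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface        # KServe's Hugging Face runtime, vLLM-backed where supported
      args:
        - --model_id=meta-llama/Llama-3.1-8B-Instruct   # illustrative model choice
      resources:
        limits:
          nvidia.com/gpu: "1"    # one GPU per vLLM replica
```

With a resource like this applied, KServe handles routing and autoscaling of replicas, while vLLM's PagedAttention and continuous batching drive per-replica throughput.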
Speaker: Rafael Vasquez
Docling, an open source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. Having earned close to 30,000 GitHub stars in less than one year, and now part of the LF AI & Data Foundation, Docling is redefining document AI with its ease and speed of use. In this session, we’ll introduce Docling and its features, including:
- Support for a wide array of formats—such as PDFs, DOCX, PPTX, HTML, images, and Markdown—and easy conversion to structured Markdown or JSON.
- Advanced document understanding through capture of intricate page layouts, reading order, and table structures—ideal for complex analysis.
- Integration of the DoclingDocument format with popular AI frameworks—such as LlamaIndex, LangChain, and LlamaStack—for retrieval-augmented generation (RAG) and QA applications.
- Optical character recognition (OCR) support for scanned documents.
- Support for visual language models such as SmolDocling, created in collaboration with Hugging Face.
- A user-friendly command-line interface (CLI) and MCP connectors for developers.
- Use as a service and at scale by deploying your own docling-serve instance.
Speakers: Michele Dolfi & Peter Staar
Generative AI is moving fast—and if you're responsible for deploying or tuning these models, you're probably feeling the heat. New LLMs, hardware, and training methods are landing constantly. How do you make sense of it all? How do you actually know what’s performant, what’s cost-effective, and what breaks the moment your stack changes?
In this talk, we’ll show you how we went from scattered, ad-hoc experiments to a fully structured, scalable benchmarking system capable of running tens of thousands of GenAI experiments—across models, hardware, and tuning techniques—with speed and repeatability.
We’ll break down how we built the stack: Ray for scale, Pydantic for schema rigor, MySQL to persist the chaos, and a CLI that feels like kubectl. We’ll show how we explore and optimize massive configuration spaces, visualize the results with Apache Superset, and use predictive models to skip the brute-force grind and get insights faster.
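The talk's actual stack (Ray, Pydantic, MySQL, Superset) is not reproduced here, but the core idea of enumerating a configuration space of models, hardware, and tuning techniques into schema-checked experiment records can be sketched with the standard library alone. All names and axis values below are illustrative assumptions:

```python
import itertools
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ExperimentConfig:
    """One point in the benchmarking grid (fields are illustrative)."""
    model: str
    accelerator: str
    technique: str

# Illustrative sweep axes; a real system would load these from versioned configs.
models = ["llama-8b", "granite-7b"]
accelerators = ["A100", "H100"]
techniques = ["lora", "full-ft"]

# The Cartesian product enumerates every experiment in the space exactly once.
grid = [ExperimentConfig(m, a, t)
        for m, a, t in itertools.product(models, accelerators, techniques)]

# Rows ready to persist (e.g. insert into a SQL table keyed on the config).
records = [asdict(cfg) for cfg in grid]
print(len(records))  # 2 * 2 * 2 = 8 experiments
```

In practice each record would be validated by a schema layer (Pydantic in the talk's stack), dispatched to workers (Ray), and its results written back to a database for dashboarding and predictive modeling.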
If you have to answer questions such as, “Can we serve this model without melting the budget?” or “Why did fine-tuning just fall over on the H100s again?”—this talk is for you.
Speaker: Michael Johnston
With the rapid rise of AI, developers need better ways to transform complex documents into structured data ready for model training and inference. Enter Docling, an open source Python package that's quickly becoming the go-to for document parsing and export. In just a few months, Docling has earned over 25,000 GitHub stars and is already reshaping how developers approach document AI.
In this session, you'll get an in-depth introduction to Docling and how it can streamline your workflow, followed by a hands-on workshop in which you'll create your first custom document-ingestion pipeline with Docling. Key features include:
Broad format support: Easily convert PDFs, DOCX, PPTX, HTML, images, and Markdown into structured Markdown or JSON.
Deep document understanding: Accurately capture page layouts, reading order, and tables—essential for complex document analysis.
AI integration: Use the DoclingDocument format with frameworks like LlamaIndex, LangChain, and InstructLab to power RAG, QA, and LLM training.
OCR support: Extract data from scanned or image-based documents.
Developer-friendly CLI: Process documents quickly and consistently with a simple command-line interface.
Speakers: Peter Staar & Cesar Ramis
When faced with a difficult challenge, it sometimes helps to look back at lessons from ancient history to guide your thinking. The Open Source Initiative (OSI) is working to create a definition for Open Source AI (OSAID), aiming to apply open source principles to artificial intelligence development, but the 1.0 version is clearly a work in progress. Can it find success? How might policy-makers react? Join this session to hear about the latest efforts to define open source AI and what's likely in store for 2025.
Speaker: Jeffrey Borek