Data Management
The future of computing lies in the hybrid cloud. We're creating a hybrid data fabric that provides secure, governed data access from anywhere, enables self-service discovery of the right data at the right time, and takes a holistic approach to minimizing total cost of ownership for AI and analytics.
Our work
IBM’s text-to-SQL generator takes top place on a benchmark for handling complex database queries
- News
- Kim Martineau
IBM’s CodeFlare significantly cuts the time to automate transfer learning tasks for foundation models
- Research
- Bishwaranjan Bhattacharjee, Raghu Ganti, Carlos Costa, Mudhakar Srivatsa, and Nick Fuller
- 4 minute read
Publications
Preparing Good Data for Generative AI: Challenges and Approaches (Good-Data)
- David Vazquez
- Laure Berti-Equille
- et al.
- 2025
- AAAI 2025
Vector Quantization with Sorting Transformation
- Hongzhi Wang
- Tanveer Syeda-Mahmood
- 2024
- Big Data 2024
TensorLakeHouse: A High-Performance, Open-Source Platform for Accelerated Geospatial Data Management with Hierarchical Statistical Indices
- Romeo Kienzler
- Leonardo P. Tizzei
- et al.
- 2024
- AGU 2024
TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes
- Aamod Khatiwada
- Harsha Kokel
- et al.
- 2024
- NeurIPS 2024
Consistency-based Black-box Uncertainty Quantification for Text-to-SQL
- Debarun Bhattacharjya
- Balaji Ganesan
- et al.
- 2024
- NeurIPS 2024
Adapting LLMs for Structured Natural Language API Integration
- Robin Chan
- Katya Mirylenka
- et al.
- 2024
- EMNLP 2024
IBM Solution: Data Fabric
Our research is regularly developed into new features for Data Fabric in IBM Cloud Pak for Data.
Projects
An LLM-Powered System for Enterprise Data
Reducing friction for scientific and foundation model workflows in Kubernetes.
A lineage data management system for tracking data in hybrid cloud deployments.