Data Management
The future of computing lies in the hybrid cloud. We're creating a hybrid data fabric that provides secure, governed data access from anywhere, enables self-service discovery of the right data at the right time, and takes a holistic approach to minimizing the total cost of ownership for AI and analytics.
Our work
IBM’s text-to-SQL generator takes top place on a benchmark for handling complex database queries
- News
- Kim Martineau
IBM’s CodeFlare significantly cuts the time to automate transfer learning tasks for foundation models
- Research
- Bishwaranjan Bhattacharjee, Raghu Ganti, Carlos Costa, Mudhakar Srivatsa, and Nick Fuller
- 4 minute read
Publications
Preparing Good Data for Generative AI: Challenges and Approaches (Good-Data)
- David Vazquez
- Laure Berti-equille
- et al.
- 2025
- AAAI 2025
Question-guided Insights Generation for Automated Exploratory Data Analysis
- Abhijit Manatkar
- Akella Ashlesha
- et al.
- 2025
- AAAI 2025
Data Wrangling task automation using Code-Generating Language Models
- Akella Ashlesha
- Krishnasuri Narayanam
- 2025
- AAAI 2025
Vector Quantization with Sorting Transformation
- Hongzhi Wang
- Tanveer Syeda-Mahmood
- 2024
- Big Data 2024
PerSSD: Persistent, Shared, and Scalable Data with Node-Local Storage for Scientific Workflows in Cloud Infrastructure
- Paula Olaya
- Sophia Wen
- et al.
- 2024
- Big Data 2024
TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes
- Aamod Khatiwada
- Harsha Kokel
- et al.
- 2024
- NeurIPS 2024
IBM Solution: Data Fabric
Our research is regularly developed into new features for Data Fabric in IBM Cloud Pak for Data.
Projects
An LLM-Powered System for Enterprise Data
Reducing friction for scientific and foundation model workflows in Kubernetes.
A lineage data management system for tracking data in hybrid cloud deployments.