Deep Search

Overview

IBM Deep Search uses AI to collect, convert, curate, and ultimately search large document collections, including public documents such as patents and research papers. It makes accessible information that is too specific for common search tools to handle. It collects data from public, private, structured, and unstructured sources and uses state-of-the-art AI methods to convert PDF documents into a JSON format with a uniform schema that is easy for data scientists to work with. It then applies dedicated natural language processing and computer vision machine-learning algorithms to these documents and ultimately creates searchable knowledge graphs.
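To make the convert-then-link idea concrete, here is a minimal sketch in Python. It is not the Deep Search implementation or its schema: the field names (`id`, `title`, `body`), the helper functions, and the sample documents are all hypothetical, and a plain substring match stands in for the NLP and computer vision models described above.

```python
import json
from collections import defaultdict


def to_uniform_record(doc_id, title, paragraphs):
    """Wrap extracted text in one fixed structure, whatever the source was.

    The keys used here are hypothetical, chosen only for illustration.
    """
    return {
        "id": doc_id,
        "title": title,
        "body": [{"type": "paragraph", "text": p} for p in paragraphs],
    }


def build_graph(records, vocabulary):
    """Link each vocabulary term to the documents whose text mentions it.

    A case-insensitive substring match is a stand-in for real entity
    extraction; it only illustrates how uniform records feed a graph.
    """
    edges = defaultdict(set)  # term -> ids of documents mentioning it
    for record in records:
        text = " ".join(item["text"] for item in record["body"]).lower()
        for term in vocabulary:
            if term.lower() in text:
                edges[term].add(record["id"])
    return edges


def search(edges, term):
    """Return the ids of documents linked to a term, in sorted order."""
    return sorted(edges.get(term, set()))


# Invented sample documents, already reduced to plain paragraphs.
records = [
    to_uniform_record("patent-1", "Coating", ["A polymer coating for steel."]),
    to_uniform_record("paper-1", "Alloys", ["Steel alloys and corrosion."]),
]
graph = build_graph(records, ["steel", "polymer"])

print(json.dumps(records[0], indent=2))
print(search(graph, "steel"))
```

Because every record shares the same keys, the graph-building step does not need to know whether a document began as a patent or a research paper, which is the practical benefit of a uniform schema.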

IBM Deep Search has for some time allowed scientists and businesses to search mountains of unstructured data. In 2022, our team made Deep Search even more versatile and accessible with the release of IBM Deep Search for Scientific Discovery (DS4SD), an open-source toolkit for scientific research and businesses.

You can try our demo or find out more about IBM's research in accelerated discovery.

Publications

- Nikolaos Livathinos, Cesar Berrospi, et al. "Robust PDF Document Conversion Using Recurrent Neural Networks." IAAI 2021.
- Urs R. Hähner, Gonzalo Alvarez, et al. "DCA++: A software framework to solve correlated electron problems with modern quantum cluster methods." Computer Physics Communications, 2020.
- Paolo Ruffo, Marco Piantanida, et al. "Application of geocognitive technologies to basin & petroleum system analyses." ADIP 2019.
- Long Zhang, Peter Staar, et al. "DFT+DMFT calculations of the complex band and tunneling behavior for the transition metal monoxides MnO, FeO, CoO, and NiO." Physical Review B, 2019.
- Peter Staar, M. Dolfi, et al. "Corpus conversion service: A machine learning platform to ingest documents at scale." KDD 2018.
- Giorgis Georgakoudis, Charles Gillan, et al. "NanoStreams: Codesigned microservers for edge analytics in real time." SAMOS 2016.
- Heiner Giefers, Peter Staar, et al. "Energy-efficient stochastic matrix function estimator for graph analytics on FPGA." FPL 2016.
- Peter Staar, Panagiotis Kl. Barkoutsos, et al. "Stochastic Matrix-Function Estimators: Scalable Big-Data Kernels with High Performance." IPDPS 2016.

Resources

Blog Post

Docling: The missing document processing companion for generative AI

Red Hat, 13 Nov 2024

Blog Post

A new tool to unlock data from enterprise documents for generative AI

IBM Research Blog, 12 Nov 2024

Blog Post

AI is making extracting key information from reports easier than ever

IBM Research Blog, 22 Feb 2024

Blog Post

IBM Research’s open-source toolkit for Deep Search

IBM Research Blog, 11 Jul 2022

Presentation

Deep Search for Scientific Discovery

IBM Research, 07 Jul 2022

Blog Post

Deep Document Understanding: IBM’s AI extracts data from complex documents

IBM Research Blog, 15 Apr 2021

Contributors

- Peter Staar
- Christoph Auer
- Cesar Berrospi Ramis
- Michele Dolfi
- Kasper Dinkla
- Yusik Kim
- Viktor Kuropiatnyk
- Nikos Livathinos
- Maksym Lysak
- Ingmar Meijer
- Ahmed Nassar
- Rafael Teixeira de Lima
- Panos Vagenas