Publication
ISMB 2022
Poster
Scalable in-memory paradigm for genomics data processing
Abstract
Disk storage and access incur substantial latency when processing genomics datasets. To accelerate downstream data processing, the data needs to be closer to the processor and held in fast-access memory. Historically, this was difficult to achieve at large scale because of the cost and architectural constraints of dynamic random-access memory (DRAM) technology. Rapid improvements in memory technologies and computer architectures now allow cost-effective solutions that process large amounts of data in significantly less time. The in-memory paradigm takes advantage of these architectural designs: a large memory pool can be created within a cluster or a cloud, either through distributed in-memory databases or by aggregating individual nodes into a shared-memory system. This paradigm can minimize the latency of traditional HPC- and cloud-based bioinformatics workflows, in which tools are chained sequentially, the output of one tool becomes the input of the next, and a significant amount of secondary disk-based I/O is performed at every stage. We conducted this study to investigate whether in-memory technologies, in particular scalable in-memory databases, can be used to accelerate genomics data processing.
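The difference between a disk-staged workflow and in-memory handoff can be illustrated with a minimal, single-process sketch. The two stages (`filter_reads`, `count_kmers`), the toy `READS` list, and the GC threshold are hypothetical and not taken from the study. The sketch only shows the data-flow pattern: in the disk-based version every intermediate result is written to a file and read back by the next stage, while in the in-memory version the stages pass objects directly. It does not model the distributed in-memory databases or shared-memory clusters the study investigated.

```python
import os
import tempfile
from collections import Counter

# Hypothetical toy "reads"; illustrative only, not data from the study.
READS = ["ACGTAC", "GGGCCC", "ATATAT", "ACGTTT"]

def filter_reads(reads, min_gc=0.3):
    """Hypothetical stage 1: keep reads whose GC fraction is at least min_gc."""
    return [r for r in reads if (r.count("G") + r.count("C")) / len(r) >= min_gc]

def count_kmers(reads, k=3):
    """Hypothetical stage 2: count k-mers across the filtered reads."""
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts

def disk_pipeline(reads, workdir):
    """Stage 1 writes its output to a file; stage 2 reads it back from disk.

    Returns the k-mer counts and the number of characters written to the
    intermediate file (the secondary I/O incurred between stages).
    """
    chars_written = 0
    path = os.path.join(workdir, "filtered.txt")
    with open(path, "w") as fh:
        for r in filter_reads(reads):
            chars_written += fh.write(r + "\n")
    with open(path) as fh:
        filtered = [line.strip() for line in fh]
    return count_kmers(filtered), chars_written

def in_memory_pipeline(reads):
    """Stages hand off Python objects directly; no intermediate file is written."""
    return count_kmers(filter_reads(reads)), 0

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        disk_counts, disk_io = disk_pipeline(READS, d)
    mem_counts, mem_io = in_memory_pipeline(READS)
    print(disk_counts == mem_counts, disk_io, mem_io)  # True 21 0
```

Both pipelines produce identical results; only the disk-based one pays for writing and re-reading the intermediate output. In a real workflow of many chained tools operating on large files, that per-stage I/O is the overhead the in-memory paradigm aims to remove.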