Publication
ICDCS 2018
Conference paper
Benchmarking deep learning frameworks: Design considerations, metrics and beyond
Abstract
With an increasing number of open-source deep learning (DL) software tools made available, benchmarking DL software frameworks and systems is in high demand. This paper presents design considerations, metrics and challenges towards developing an effective benchmark for DL software frameworks and illustrates our observations through a comparative study of three popular DL frameworks: TensorFlow, Caffe, and Torch. First, we show that these deep learning frameworks are optimized with their default configuration settings. However, the default configuration optimized on one specific dataset may not work effectively for other datasets with respect to runtime performance and learning accuracy. Second, the default configuration optimized on a dataset by one DL framework does not work well for another DL framework on the same dataset. Third, experiments show that different DL frameworks exhibit different levels of robustness against adversarial examples. Through this study, we envision that, unlike traditional performance-driven benchmarks, benchmarking deep learning software frameworks should take into account both runtime and accuracy, as well as their latent interaction with the hyper-parameters and data-dependent configurations of DL frameworks.
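The sketch below is not the paper's benchmark harness; it is a minimal illustration of the kind of measurement the abstract advocates, namely recording runtime and accuracy jointly as hyper-parameters vary. It assumes the TensorFlow 2.x Keras API, and the dataset, model, and hyper-parameter values are illustrative choices, not the configurations studied in the paper.

```python
# Minimal sketch (illustrative, not the paper's benchmark): measure training
# runtime and test accuracy of the same model under two hyper-parameter
# configurations, using the TensorFlow 2.x Keras API on MNIST.
import time
import tensorflow as tf

def benchmark(batch_size, learning_rate, epochs=2):
    # Load and normalize MNIST.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A small fully connected classifier; the architecture is an assumption.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

    # Time the training run.
    start = time.perf_counter()
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=0)
    runtime = time.perf_counter() - start

    # Evaluate learning accuracy on held-out data.
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return runtime, accuracy

# Report runtime and accuracy side by side for two configurations, since
# neither metric alone captures the effect of a hyper-parameter change.
for batch_size, lr in [(128, 0.01), (1024, 0.1)]:
    runtime, accuracy = benchmark(batch_size, lr)
    print(f"batch={batch_size} lr={lr}: {runtime:.1f}s, test acc={accuracy:.4f}")
```

In this spirit, a framework benchmark would repeat such runs across frameworks, datasets, and configuration grids rather than reporting a single default-configuration number.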