Large-scale Cost-Aware Classification Using Feature Computational Dependency Graph
Abstract
With the rapid growth of real-time machine learning applications, the process of feature selection and model optimization requires to integrate with the constraints on computational budgets. A specific computational resource in this regard is the time needed for evaluating predictions on test instances. The joint optimization problem of prediction accuracy and prediction-time efficiency draws more and more attention in the data mining and machine learning communities. The runtime cost is dominated by the feature generation process that contains significantly redundant computations across different features that sharing the same computational component in practice. Eliminating such redundancies would obviously reduce the time costs in the feature generation process. Our previous Cost-aware classification using Feature computational dependencies heterogeneous Hypergraph (CAFH) model has achieved excellent performance on the effectiveness. In the big data era, the high dimensionality caused by the heterogeneous data sources leads to the difficulty in fitting the entire hypergraph into the main memory and the high computational cost during the optimization process. Simply partitioning the features into batches cannot give the optimal solution since it will lose some feature dependencies across the batches. To improve the high memory and computational costs in the CAFH model, we propose an equivalent Accelerated CAFH (ACAFH) model based on the lossless heterogeneous hypergraph decomposition. An efficient and effective nonconvex optimization algorithm based on the alternating direction method of multipliers (ADMM) is developed to optimize the ACAFH model. The time and space complexities of the optimization algorithm for the ACAFH model are three and one polynomial degrees less than our previous algorithm for the CAFH model, respectively. Extensive experiments demonstrate the proposed ACAFH model achieves competitive performance on the effectiveness and much better performance on the efficiency.