Coarse-grained information flow control on hybrid clouds
Abstract
Recently, a growing number of enterprises have adopted hybrid cloud strategies to enjoy both the security of on-premise clouds and the low cost of public clouds. The key challenge of hybrid clouds, however, is the difficulty of specifying where data should be stored and where information can flow efficiently. To address security concerns and meet performance requirements, we introduce a coarse-grained information flow control (CIFC) model that restricts the storage, access, and disclosure of confidential data in public clouds. The CIFC model aims to provide information flow control implicitly, without the large overhead of periodically checking access privileges. Moreover, since the CIFC model may require data to be redistributed whenever the secrecy level of a dataset changes, we formulate data redistribution as an optimization problem and propose the Partition Biased Sampling Algorithm (PBSA) to solve it. We implemented the CIFC model on top of Spark, and our results show that Spark applications achieve 1.4 to 2.1 times better performance by using the additional computational capacity of the public cloud to process non-sensitive data. Furthermore, we integrate PBSA into Spark and show that it reduces execution time by more than 35% compared with Spark's default data distribution strategy.
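To make the "coarse-grained" idea concrete, the following is a minimal sketch under stated assumptions, not the CIFC implementation or PBSA. It assumes a hypothetical two-level labeling (`Secrecy.PUBLIC`, `Secrecy.CONFIDENTIAL`), two hypothetical sites, and a hypothetical policy table `ALLOWED_SITES`. The label is checked once per dataset at placement time instead of on every access, and a label change reports whether the dataset must move. All names here (`place`, `relabel`, `Dataset`) are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class Secrecy(Enum):
    # Hypothetical two-level secrecy labels attached to whole datasets.
    PUBLIC = 0
    CONFIDENTIAL = 1


class Site(Enum):
    ON_PREMISE = "on-premise"
    PUBLIC_CLOUD = "public-cloud"


# Hypothetical policy: confidential data may only live on-premise,
# while public data may be stored at either site.
ALLOWED_SITES = {
    Secrecy.PUBLIC: {Site.ON_PREMISE, Site.PUBLIC_CLOUD},
    Secrecy.CONFIDENTIAL: {Site.ON_PREMISE},
}


@dataclass
class Dataset:
    name: str
    secrecy: Secrecy
    site: Site


def place(name: str, secrecy: Secrecy, preferred: Site) -> Dataset:
    """Check the dataset-level label once, at placement time.

    If the preferred site is not allowed for this label, fall back to
    on-premise storage. No per-access privilege check is performed later.
    """
    site = preferred if preferred in ALLOWED_SITES[secrecy] else Site.ON_PREMISE
    return Dataset(name, secrecy, site)


def relabel(ds: Dataset, new_secrecy: Secrecy) -> bool:
    """Update the label and return True if the dataset had to be moved.

    This only handles mandatory moves (e.g. data that became confidential
    while stored in the public cloud). Optional moves toward the public
    cloud after a downgrade, for performance, are not modeled here.
    """
    ds.secrecy = new_secrecy
    if ds.site not in ALLOWED_SITES[new_secrecy]:
        ds.site = Site.ON_PREMISE  # move back to the private side
        return True
    return False
```

For example, `place("logs", Secrecy.PUBLIC, Site.PUBLIC_CLOUD)` keeps the dataset in the public cloud, and a later `relabel(..., Secrecy.CONFIDENTIAL)` returns `True` and moves it on-premise. Deciding *how* to redistribute partitions efficiently is the separate optimization problem that the abstract attributes to PBSA, which this sketch does not attempt.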