Technical note

Protecting user data with fully homomorphic encryption and confidential computing


With the rapid growth of AI and big data analytics in hybrid cloud environments, data protection has become a critical concern for many companies. In recent years, IBM Research has been exploring ways to protect user data through a trustworthy, end-to-end pipeline built on various technologies. Among these, two major mechanisms stand out: Fully Homomorphic Encryption (FHE) and Confidential Computing (CC). Here, we provide a technical overview of these two technologies, compare them, and explore their potential integration.

Fully Homomorphic Encryption (FHE)

For many years, cryptographers focused on solving the challenges of encrypting data at rest and in transit. Protecting data during computation, without distorting it, was not considered a cryptographic problem. Recently, this thinking has changed. With Fully Homomorphic Encryption (FHE), one can encrypt data and then perform calculations directly on the encrypted ciphertext.

The resulting ciphertext can then be decrypted at a secure location, yielding the same result as if the computation had been performed on the plaintext in the clear. As a result, FHE theoretically allows computation to be offloaded without the fear of a malicious actor being privy to sensitive information. This enables the protection of both the data and, in some cases, the analytical model that operates on it. With FHE, multiple parties can collaborate and gain insights from shared data and models, without each party being privy to the others' data.
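
To make this concrete, below is a minimal, insecure toy sketch of computing on encrypted data using the Paillier cryptosystem. Two hedges apply: Paillier is only additively homomorphic, whereas true FHE schemes such as BFV or CKKS also support multiplication on ciphertexts, and the key sizes here are far too small for real use.

```python
# Toy demonstration of computing on encrypted data with the Paillier
# cryptosystem. Paillier is only *additively* homomorphic; FHE schemes
# (e.g., BFV, CKKS) also support multiplication on ciphertexts.
# The tiny primes below are for illustration only, not security.
import math
import random

def keygen(p=10007, q=10009):
    n = p * q                            # public modulus
    lam = math.lcm(p - 1, q - 1)         # private exponent
    g = n + 1                            # standard generator choice
    # mu = (L(g^lam mod n^2))^(-1) mod n, where L(x) = (x - 1) // n
    x = pow(g, lam, n * n)
    mu = pow((x - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)           # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

pk, sk = keygen()
c1, c2 = encrypt(pk, 17), encrypt(pk, 25)
c_sum = (c1 * c2) % (pk[0] ** 2)         # multiplying ciphertexts adds plaintexts
assert decrypt(pk, sk, c_sum) == 42      # computed without ever seeing 17 or 25
```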

Before we see widespread adoption of FHE, a few challenges still need to be addressed. The first hurdle is wide-audience accessibility: widespread use of FHE depends on ease of use and frictionless adoption by developers and data scientists. Without an accessible solution, developers are required to be experts in cryptography, FHE, and algorithmic optimization. The second challenge is performance: FHE introduces overhead, in both computation and storage, that can increase dramatically when used inefficiently. IBM is working to solve both of these challenges through HElayers.

HElayers is a software development kit designed by IBM Research to remove the burden of writing sophisticated algorithms and implementations over FHE — enabling application developers and data scientists to leverage the power of FHE without requiring any cryptography expertise. 
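
To see the kind of burden HElayers is meant to remove, consider what even a tiny encrypted linear score looks like when hand-rolled. The sketch below reuses the toy Paillier helpers from the previous example and is purely illustrative; it is not HElayers' actual API.

```python
# Hand-rolled encrypted linear scoring on top of the toy Paillier
# helpers above: the weights stay public, the features stay encrypted,
# and pow(c, w, n_sq) is homomorphic scalar multiplication. An SDK like
# HElayers is designed to hide exactly this level of detail.
weights = [3, 1, 4]                      # public model weights
features = [2, 7, 1]                     # sensitive user features

pk, sk = keygen()
n_sq = pk[0] ** 2
enc_features = [encrypt(pk, x) for x in features]

# Accumulate Enc(sum of w_i * x_i) without decrypting anything.
enc_score = encrypt(pk, 0)
for w, c in zip(weights, enc_features):
    enc_score = (enc_score * pow(c, w, n_sq)) % n_sq

assert decrypt(pk, sk, enc_score) == sum(w * x for w, x in zip(weights, features))
```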

Confidential Computing (CC)

Confidential computing is designed to protect computations on untrusted compute infrastructures, such as machines deployed in a third-party data center. It specifically protects data-in-use, which refers to data loaded into main memory at runtime. CC places only a minimal amount of trust in the processor, while treating the host software stack (such as the host operating system or hypervisor) as potentially untrustworthy or under adversarial control. Major CPU vendors have progressively integrated CC functionality into their processors. Notable technologies include AMD Secure Encrypted Virtualization (SEV), Intel Software Guard Extensions (SGX) and Trust Domain Extensions (TDX), IBM Secure Execution (SE) and Protected Execution Facility (PEF), Arm Confidential Compute Architecture (CCA), and the RISC-V Confidential VM Extension (CoVE).

Despite differences in implementation and terminology, the fundamental security principles of these technologies are similar. Key principles include the introduction of new privileged execution modes; cryptographic isolation of, and access control to, protected memory regions; and secure or measured launch of trusted firmware and software components. The different CC technologies offer varying granularities of protection, ranging from designated memory regions within a process's address space to an entire virtual machine. Additionally, when combined with Kata Containers, CC can protect container workloads, an approach known as Confidential Containers, which offers both the security of CC and the convenience of a cloud-native execution model.
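
One of these principles, measured launch, can be illustrated with a TPM-style "extend" operation: each boot component is hashed into a running measurement register, so the final digest attests to exactly which firmware and software were loaded, and in what order. A minimal sketch follows; the component names are made up.

```python
# Minimal sketch of a measured launch: each component extends a running
# measurement register, TPM-PCR style. During remote attestation, the
# final digest is compared against a known-good reference value.
# The component blobs here are illustrative, not a real boot chain.
import hashlib

def extend(measurement: bytes, component: bytes) -> bytes:
    # new = SHA-256(old || SHA-256(component))
    return hashlib.sha256(measurement + hashlib.sha256(component).digest()).digest()

measurement = bytes(32)                  # register starts zeroed
for blob in (b"firmware-image", b"guest-kernel", b"initrd"):
    measurement = extend(measurement, blob)

print(measurement.hex())                 # matches only for this exact stack and order
```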

Recently, Nvidia introduced CC in its H100 GPU, allowing it to cooperate with CPU-side CC to extend the trust boundary from the CPU to the GPU. Data is transmitted over encrypted I/O paths between the CPU and the GPU. To further improve I/O performance for confidential computing, trusted I/O technology is under active development. It builds on multiple industry standards, including the TEE Device Interface Security Protocol (TDISP), Integrity and Data Encryption (IDE), and the Security Protocol and Data Model (SPDM). These efforts aim to support faster and more secure I/O between confidential virtual machines and CC-aware physical devices.
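
Conceptually, such an encrypted I/O path amounts to authenticated encryption over every buffer that crosses the untrusted bus, under a session key the two TEEs establish during attestation. The sketch below models only that idea; the key exchange, DMA mechanics, and channel identifiers are all elided or invented for illustration. It requires the cryptography package.

```python
# Conceptual model of an encrypted CPU-to-GPU transfer: buffers crossing
# the untrusted bus are sealed with AES-GCM under a session key that the
# two TEEs would normally derive during an SPDM-style handshake (elided).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

session_key = AESGCM.generate_key(bit_length=256)   # stands in for the handshake

def cpu_tee_send(plaintext: bytes) -> tuple[bytes, bytes]:
    nonce = os.urandom(12)                           # unique per transfer
    return nonce, AESGCM(session_key).encrypt(nonce, plaintext, b"dma-channel-0")

def gpu_tee_receive(nonce: bytes, sealed: bytes) -> bytes:
    # Decryption fails loudly if anything on the bus tampered with the buffer.
    return AESGCM(session_key).decrypt(nonce, sealed, b"dma-channel-0")

nonce, wire = cpu_tee_send(b"model weights and activation tensors")
assert gpu_tee_receive(nonce, wire) == b"model weights and activation tensors"
```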

Confidential computing can be applied across a wide range of privacy-sensitive applications. One of the most prominent use cases today is protecting AI computation. Data and models in AI computations are valuable assets containing sensitive information, and often personal user data. Establishing AI services, including inference, training, and fine-tuning, requires significant computing resources that may not be affordable for individual data owners. As a result, many companies look to deploy their AI computations on large third-party compute providers, and CC can help protect data and models from being exposed at runtime.

In recent years, IBM Research has been addressing research challenges associated with adopting CC in various AI applications, including inference,1 collaborative training,2 and federated learning.3 With InfEnclave,1 we investigated how to securely partition deep neural network models for confidential inference services, addressing the memory capacity limits of early Intel SGX. In CalTrain,2 we tackled the conflicting goals of data confidentiality and model accountability in collaborative learning, proposing a CC-based approach that preserves data confidentiality during training while fingerprinting training instances for post-hoc model debugging and forensic analysis. In DeTA,3 we addressed the threat that a malicious actor controlling the federated learning aggregator could reconstruct training data from the participants' model updates. Our key idea is to decentralize aggregation across multiple CC-protected aggregators and to partition and shuffle the model to mitigate data leaks.
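
To give a feel for DeTA's partitioning idea, the toy sketch below splits each client's update across several aggregators so that no single aggregator ever observes a complete update. This is a schematic of the partitioning alone: the actual protocol additionally shuffles partitions and runs each aggregator inside a CC-protected environment.

```python
# Toy illustration of decentralized aggregation in the spirit of DeTA:
# each client's update vector is dealt across k aggregators, so no single
# aggregator sees a whole update. Schematic only; shuffling and the
# TEE protection of each aggregator are omitted.
import random

def partition(update, k):
    # Deal coordinates round-robin into k disjoint shards of (index, value).
    shards = [[] for _ in range(k)]
    for i, v in enumerate(update):
        shards[i % k].append((i, v))
    return shards

def aggregate(client_updates, k=3):
    totals = [0.0] * len(client_updates[0])
    for agg_id in range(k):                          # each aggregator works alone
        for update in client_updates:
            for i, v in partition(update, k)[agg_id]:
                totals[i] += v
    return [t / len(client_updates) for t in totals]

clients = [[random.random() for _ in range(8)] for _ in range(4)]
expected = [sum(col) / len(clients) for col in zip(*clients)]
assert all(abs(a - b) < 1e-9 for a, b in zip(aggregate(clients), expected))
```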

Beyond applying CC to AI, another line of our research focuses on the security analysis of CC infrastructure. This includes comprehensive security analyses of Intel TDX4 and identifying potential attacks in Confidential Containers.5 

Complementary Solutions

At first glance, it is easy to view different privacy-preserving techniques as competing technologies. However, in recent years, it has become apparent that hybrid solutions that leverage the best of all worlds are often more effective. Viewing these solutions as tools in a bigger toolbox allows us to mix and match, rather than holding a single hammer and seeing every problem as a nail.

With FHE and CC, one can avoid side-channel leakage of sensitive information by running FHE inside a secure enclave, while at the same time leveraging CC to ensure the integrity of the data and the processing. In addition, workload owners can flexibly partition their computations based on different security and performance requirements, using different technologies to satisfy their practical needs.
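
As a schematic of how the two compose, the sketch below gates the toy Paillier evaluation from the first example behind a simulated attestation check standing in for a real enclave's hardware-signed quote: FHE keeps the data encrypted during the computation, while attestation gives the client evidence about which code is doing the computing. All names here are illustrative.

```python
# Schematic of combining FHE with CC: the encrypted-domain computation
# (toy Paillier helpers from the first sketch) runs only after a simulated
# attestation check, which stands in for verifying a hardware-signed quote.
import hashlib

EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-fhe-workload-v1").digest()

def attest(workload: bytes) -> bytes:
    # In real CC, hardware measures the loaded code and signs a quote.
    return hashlib.sha256(workload).digest()

def enclave_add(pk, c1, c2, quote):
    if quote != EXPECTED_MEASUREMENT:
        raise PermissionError("attestation failed: untrusted environment")
    return (c1 * c2) % (pk[0] ** 2)      # homomorphic addition; data stays encrypted

pk, sk = keygen()                        # helpers from the first FHE sketch
quote = attest(b"approved-fhe-workload-v1")
c = enclave_add(pk, encrypt(pk, 20), encrypt(pk, 22), quote)
assert decrypt(pk, sk, c) == 42
```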

References

  1. Gu, Z., Huang, H., Zhang, J., Su, D., Jamjoom, H., Lamba, A., Pendarakis, D. and Molloy, I., 2018. Confidential inference via ternary model partitioning. arXiv preprint arXiv:1807.00969.

  2. Gu, Z., Jamjoom, H., Su, D., Huang, H., Zhang, J., Ma, T., Pendarakis, D. and Molloy, I., 2019, June. Reaching data confidentiality and model accountability on the CalTrain. In Proceedings of the 49th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (pp. 336-348).

  3. Cheng, P.C., Eykholt, K., Gu, Z., Jamjoom, H., Jayaram, K.R., Valdez, E. and Verma, A., 2024, April. DeTA: Minimizing Data Leaks in Federated Learning via Decentralized and Trustworthy Aggregation. In Proceedings of the Nineteenth European Conference on Computer Systems (EuroSys) (pp. 219-235).

  4. Cheng, P.C., Ozga, W., Valdez, E., Ahmed, S., Gu, Z., Jamjoom, H., Franke, H. and Bottomley, J., 2024. Intel TDX Demystified: A Top-Down Approach. ACM Computing Surveys, 56(9), pp.1-33.

  5. Valdez, E., Ahmed, S., Gu, Z., Dinechin, C., Cheng, P.C. and Jamjoom, H., 2024. Crossing Shifted Moats: Replacing Old Bridges with New Tunnels to Confidential Containers. To appear in Proceedings of the 31st ACM Conference on Computer and Communications Security (CCS).