Kevin Hui

Software Engineer, Meta

Kevin's session

Securing Services at Meta with CVM Lift and Shift

June 5, 4:20 PM - 4:40 PM
Grand Ballroom Salon B

Over the past year, Meta has significantly increased its investments in TEE technologies, with a focus on AMD SEV-SNP Confidential VMs. We will discuss how Meta leverages CVMs to secure services, particularly the infrastructure for Lift and Shift services.

The primary class of use cases we've addressed is defense in depth. Some high-criticality services need the confidentiality and integrity guarantees that CVMs provide. For example, a Key Management Service needs a stronger security posture for the keys it holds in main memory. CVMs offer strong security guarantees with cheaper hardware, making them an attractive solution to secure Meta's vast number of services deployed across a global fleet.

Attestation Infra

Consider interactions where attestation is done at the application level. Attesters generate evidence, verifiers validate evidence, and encrypted channels must be established. Both attester and verifier services would need to integrate TEE-specific attestation software, which violates Lift and Shift because application code would have to change. Even if these dependencies were abstracted into a library, that library would need to support multiple programming languages and TEE technologies. We concluded that such a library would be difficult to maintain.

Instead of doing explicit remote attestation at the application layer, our solution was to do implicit remote attestation at the transport layer. CVM Services use X509 certificates that encode attested identity – which is something all Meta services can understand because we already encode regular service identities this way. We refer to this concept as Implicit RA-TLS via Attested Service Identities, which is an implementation of the RATS Passport model.

The major steps are:

1. An agent in the CVM requests a cert from a special Certificate Authority by providing attestation evidence bound to a certificate signing request (CSR).

2. The CA appraises the evidence according to some policy and mints (i.e., signs) a cert representing the Attested Service Identity.

3. The CVM Service uses this cert to interact with other entities in Meta infra. The relying party does not need to perform explicit remote attestation because it trusts the CA has attested the peer.
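
The three steps above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not Meta's implementation: every name here (`generate_evidence`, `ca_issue`, `relying_party_accepts`, the `attested/` identity prefix) is hypothetical, HMAC with a shared secret stands in for the hardware and CA signatures (real systems use asymmetric signatures), and binding the evidence to the CSR by placing a hash of the CSR in the report's caller-supplied data field is one common approach that the abstract does not confirm.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional

# Stand-ins for the hardware root of trust and the CA signing key.
# Assumption: HMAC replaces real asymmetric signatures to stay stdlib-only.
HARDWARE_KEY = secrets.token_bytes(32)
CA_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class Evidence:
    measurement: bytes  # digest describing what was launched in the CVM
    report_data: bytes  # caller-supplied field; here, a hash of the CSR
    signature: bytes    # produced by the (simulated) hardware


@dataclass(frozen=True)
class Cert:
    identity: str
    public_key: bytes
    signature: bytes    # produced by the (simulated) CA


def _mac(key: bytes, *parts: bytes) -> bytes:
    return hmac.new(key, b"|".join(parts), hashlib.sha256).digest()


def make_csr(service: str, public_key: bytes) -> bytes:
    # Toy CSR encoding: "<service>:<hex public key>".
    return service.encode() + b":" + public_key.hex().encode()


def generate_evidence(measurement: bytes, csr: bytes) -> Evidence:
    # Step 1: bind the evidence to the CSR via a hash in report_data.
    report_data = hashlib.sha512(csr).digest()
    return Evidence(measurement, report_data,
                    _mac(HARDWARE_KEY, measurement, report_data))


def ca_issue(evidence: Evidence, csr: bytes,
             trusted_measurements: set) -> Optional[Cert]:
    # Step 2: appraise the evidence, then mint a cert on success.
    expected = _mac(HARDWARE_KEY, evidence.measurement, evidence.report_data)
    if not hmac.compare_digest(expected, evidence.signature):
        return None  # evidence was not produced by trusted hardware
    if evidence.report_data != hashlib.sha512(csr).digest():
        return None  # evidence is not bound to this CSR
    if evidence.measurement not in trusted_measurements:
        return None  # policy rejects this measurement
    service, key_hex = csr.decode().split(":", 1)
    identity = "attested/" + service
    public_key = bytes.fromhex(key_hex)
    return Cert(identity, public_key,
                _mac(CA_KEY, identity.encode(), public_key))


def relying_party_accepts(cert: Cert) -> bool:
    # Step 3: the peer checks only the CA's signature, never raw evidence.
    expected = _mac(CA_KEY, cert.identity.encode(), cert.public_key)
    return hmac.compare_digest(expected, cert.signature)


measurement = hashlib.sha384(b"example-vm-image").digest()
csr = make_csr("kms", secrets.token_bytes(32))
cert = ca_issue(generate_evidence(measurement, csr), csr, {measurement})
```

In this sketch the relying party never sees attestation evidence; it only validates the cert, which is what lets existing services interact with CVM services unchanged.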

The main benefits of this attestation model:

TLS is reused to establish encrypted channels, instead of reimplementing them at the application layer. The protocol ensures that the X509 private key is owned by the CVM.

We reuse the concept of identities in X509 certs to allow for seamless integration with Meta's Auth framework to authenticate services (plus their attested state) and authorize access to resources.

Application code can remain agnostic to the runtime environment (e.g., CVM, container, bare metal). Service owners do not need deep technical knowledge or significant effort to use CVMs.
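
To illustrate how an identity that carries attested state could feed an ordinary authorization check, here is a small sketch. It does not describe Meta's Auth framework: `PeerIdentity`, `AclEntry`, and `is_authorized` are hypothetical names, and representing the attested state as a boolean extracted from the peer's cert is an assumption made for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerIdentity:
    service: str
    attested: bool  # assumed to be derived from the peer's cert


@dataclass(frozen=True)
class AclEntry:
    service: str
    require_attested: bool


def is_authorized(peer: PeerIdentity, acl: list) -> bool:
    # Grant access if some entry names the peer's service and the peer
    # meets that entry's attestation requirement.
    for entry in acl:
        if entry.service != peer.service:
            continue
        if entry.require_attested and not peer.attested:
            continue
        return True
    return False


# A resource that only an attested instance of "kms-client" may read.
key_acl = [AclEntry("kms-client", require_attested=True)]
```

The resource owner expresses the attestation requirement in the access policy, so neither the client nor the server application code needs to handle TEE-specific evidence.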

Trade-Offs

While CVMs allow for the convenience of Lift and Shift, they come at the cost of a larger TCB. Including the kernel, the OS, and entire service applications bloats the TCB, which increases the attack surface. This makes the approach unsuitable for use cases that need a minimal TCB.

The current deployment model requires service owners to build entire VM images, which adds significant overhead compared to containers, Meta's standard unit of deployment.

There is a noticeable latency overhead for IO-bound workloads. We may present preliminary performance benchmarks for early services that we have onboarded.

Future Work

We are exploring a deployment model that runs critical code inside a small CVM (e.g., a sidecar) alongside the main process. This would allow relevant use cases to minimize their TCB.

We are also exploring a deployment model similar to Red Hat's Confidential Containers, but built on Meta's internal container orchestration instead of Kubernetes. The main idea is a base VM image containing a management layer that stages the container layer. This would relieve service owners of the need to build entire VM images.