A newly discovered vulnerability in the container runtime CRI-O could allow attackers who are able to create pods in a Kubernetes or OpenShift cluster that uses the software to break out to the underlying cluster node, effectively escalating their privileges. As ever, the best way to address this issue is to apply the appropriate security patch, but other measures can be taken to mitigate attacks that exploit it until patches can be deployed.
The original write-up on this issue provides a reproducible proof of concept that shows how it can be exploited. Additionally, researchers have shown how this issue can be used to fully exploit a vulnerable cluster node.
Essentially, the vulnerability comes down to how parameters set in the sysctls section of a Kubernetes manifest are handed to a supporting utility without sufficient validation. This lack of validation allows an attacker to modify other system parameters in a way that allows for container breakout.
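To make the validation gap concrete, here is a minimal sketch of how an unvalidated value could smuggle in an extra setting. It is a simplified model rather than CRI-O's actual code: the function names are hypothetical, and the assumption that settings are joined with `+` and split on `=` is inferred from the key characters discussed later in this post. The parameter names and handler path are purely illustrative.

```python
# Simplified, hypothetical model of the flaw. This is NOT CRI-O's real code.


def build_sysctl_argument(sysctls):
    """Join name=value pairs with '+' without validating the values."""
    return "+".join(f"{name}={value}" for name, value in sysctls.items())


def parse_sysctl_argument(argument):
    """Split the joined string back into individual settings."""
    settings = {}
    for pair in argument.split("+"):
        name, _, value = pair.partition("=")
        settings[name] = value
    return settings


# The manifest asks for ONE sysctl, but its value hides a second setting.
requested = {
    "kernel.shm_rmid_forced": "1+kernel.core_pattern=|/tmp/hypothetical-handler",
}

applied = parse_sysctl_argument(build_sysctl_argument(requested))
print(applied)
```

In this model, one requested parameter turns into two applied parameters, which is the kind of outcome that validating the value before use would prevent.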
Using admission control as a mitigation
The vulnerability write-up shows that triggering the issue requires the attacker to set custom sysctl values in a manifest.
This opens up a couple of possible mitigation strategies at the Kubernetes level. If the workloads running in your cluster don’t need to set custom sysctls, then blocking any attempt to set them would block the exploitation vector. If custom sysctls are required, an alternative (but more complex and possibly brittle) approach would be to try to block the key characters (+ and =) in the values of that specific field.
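The two strategies can be sketched as simple checks over a pod manifest that has been loaded into a Python dict. This is only an illustration of the logic an admission policy would need to express, not a replacement for a real admission controller. The function names are hypothetical, and the sketch assumes sysctls appear under `spec.securityContext.sysctls` as a list of `name`/`value` entries.

```python
SUSPICIOUS_CHARS = ("+", "=")


def pod_sysctls(pod):
    """Return the sysctl entries of a pod manifest (a dict), or an empty list."""
    spec = pod.get("spec") or {}
    security_context = spec.get("securityContext") or {}
    return security_context.get("sysctls") or []


def violates_no_sysctls(pod):
    """Strategy 1: reject any pod that sets custom sysctls at all."""
    return len(pod_sysctls(pod)) > 0


def violates_character_filter(pod):
    """Strategy 2: reject only sysctl values containing '+' or '='."""
    return any(
        char in str(entry.get("value", ""))
        for entry in pod_sysctls(pod)
        for char in SUSPICIOUS_CHARS
    )


benign_pod = {
    "spec": {
        "securityContext": {
            "sysctls": [{"name": "net.core.somaxconn", "value": "1024"}]
        }
    }
}
suspicious_pod = {
    "spec": {
        "securityContext": {
            "sysctls": [{"name": "kernel.shm_rmid_forced", "value": "1+a.b=c"}]
        }
    }
}

print(violates_no_sysctls(benign_pod), violates_character_filter(benign_pod))
print(violates_no_sysctls(suspicious_pod), violates_character_filter(suspicious_pod))
```

The first check is the simpler rule and rejects both pods. The second lets the benign pod through, but it depends on the list of dangerous characters being complete, which is why it is the more brittle option.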
The first option is probably the easiest to implement as a quick fix while patches are being deployed, as it only requires limited or no custom policy writing. However, care should be taken to ensure that no workloads currently deployed in the cluster make use of this feature. Looking at popular open source admission controllers, we can see that there are already policies that look at this area of manifests.
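Before blocking sysctls outright, it helps to confirm that nothing in the cluster relies on them. The following is a rough sketch of such an audit, assuming the pod list has been exported as JSON (for example with `kubectl get pods --all-namespaces -o json`). The function name and the sample data are hypothetical.

```python
import json


def pods_using_sysctls(pod_list):
    """Return (namespace, name, sysctl names) for each pod that sets sysctls."""
    found = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec") or {}
        sysctls = (spec.get("securityContext") or {}).get("sysctls") or []
        if sysctls:
            meta = pod.get("metadata") or {}
            names = [entry.get("name") for entry in sysctls]
            found.append((meta.get("namespace", ""), meta.get("name", ""), names))
    return found


# Hypothetical sample standing in for the exported pod list.
sample_pod_list = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "web", "name": "frontend"},
     "spec": {"containers": []}},
    {"metadata": {"namespace": "tuning", "name": "proxy"},
     "spec": {"securityContext": {"sysctls": [
        {"name": "net.core.somaxconn", "value": "1024"}]}}}
  ]
}
""")

for namespace, name, names in pods_using_sysctls(sample_pod_list):
    print(f"{namespace}/{name} sets: {', '.join(names)}")
```

Any pod reported here would be rejected on its next deployment once a policy blocking all sysctls is enforced, so those workloads need either an exception or the narrower character-based rule.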
For OPA, the gatekeeper-library repository includes a constraint template that can be used to block the setting of custom sysctls. Applying this template and then creating a constraint that blocks all custom sysctls will provide us with protection.
With that constraint in place, trying to apply the sample manifest from the exploit blog is rejected at admission.
Another popular option for Kubernetes admission control is Kyverno. Looking at their policy library, we can see there is also a sample policy to block sysctls that are not on a whitelist. This isn’t quite what we need, since we’re looking to block all sysctls, but modifying it is straightforward: change the pattern section so that it matches on all use of custom sysctls rather than only those outside the whitelist.
To block requests rather than only report on them, the validationFailureAction should be changed to enforce as well. With that policy applied at either the cluster or a specific namespace level, if we try to apply the exploit manifest, we will receive an error indicating that it has been blocked by our policy.
This vulnerability is one of a few recent issues that could allow for container breakout. In common with some of the others, there are layers of protection that can be used to reduce the likelihood of exploitation until the relevant security patches can be applied.
With complex environments like Kubernetes and OpenShift, it’s important to have layered defences and to understand where they can be used to apply mitigating controls. This will help reduce the necessity for emergency deployment of security patches and give operational teams the time for testing to ensure they won’t impact the operation of the systems.