What is Needed to Run Kubernetes in Production?
We refer to a Kubernetes environment as “production-ready” when it has everything needed to serve traffic to real end users.
There are many requirements to make Kubernetes production-ready. In addition to supporting disaster recovery, it must be secure, scalable, highly available, and reliable, and it must provide logging and monitoring capabilities that meet organizational requirements. In an enterprise environment, Kubernetes must also meet governance and compliance standards.
Kubernetes in Production Stats
According to the 2020 CNCF Survey, 92% of organizations surveyed now use containers in production, and 83% use Kubernetes in production (up from 84% and 78%, respectively, just a year earlier).
17% of organizations reported that they operate more than 5,000 machines (including bare metal, VMs, and cloud instances), and 23% reported that they run more than 5,000 containers, indicating that enterprise deployments make up a large part of the market. 12% of organizations have more than 50 Kubernetes clusters in production.
The biggest challenges reported by organizations in the CNCF survey are complexity (41%) and adapting culture to cloud native technology (41%). The next biggest challenge is security, experienced by 32% of organizations.
Considerations for Running Kubernetes in Production
Bare-bones Kubernetes is not sufficient for real production applications. Below we describe key services required to make Kubernetes suitable for production environments.
Cluster Monitoring and Logging
When running in production, Kubernetes deployments typically scale to hundreds of pods. Without effective monitoring and logging, downtime can cause serious, irreversible damage to customer satisfaction and to the business.
Monitoring provides visibility into your Kubernetes infrastructure, along with detailed metrics. It can report on the usage and performance of private or public cloud resources, from individual containers and VMs to servers, network performance, and storage usage.
Centralized log management is an important feature, which you can implement using proprietary tools or open source tools such as the EFK stack (Elasticsearch, Fluentd or Fluent Bit, and Kibana).
Kubernetes should not only be monitored 24/7; your monitoring should also provide role-based dashboards and active alerts on key performance indicators such as performance, capacity, and production issues.
Reserved Compute Resources for System Daemons
Always reserve resources for system daemons, which both Kubernetes and the underlying operating system require. System daemons use CPU, memory, and ephemeral storage, and all three should be reserved in adequate quantities. Reserved resources are subtracted from the node's capacity, and only the remainder is exposed to pods as allocatable resources.
You can use these kubelet flags to reserve resources for system daemons:
- --kube-reserved: reserves resources for Kubernetes system daemons, such as the kubelet, the container runtime, and the node problem detector.
- --system-reserved: reserves resources for operating system daemons, such as sshd and udev.
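The relationship between these flags and what pods can actually use can be sketched as simple arithmetic. The Kubernetes documentation on reserving compute resources describes allocatable capacity as node capacity minus kube-reserved, system-reserved, and the hard eviction threshold. The Python sketch below assumes that formula; the helper names and reservation sizes are hypothetical examples, not sizing recommendations.

```python
# Minimal sketch: how kubelet reservations shrink what the scheduler can give to pods.
# The reservation sizes below are hypothetical examples, not recommendations.


def reservation_flag(name: str, cpu_millicores: int, memory_mib: int, storage_gib: int) -> str:
    """Render a kubelet flag in the key=value,... form used by
    --kube-reserved and --system-reserved."""
    value = f"cpu={cpu_millicores}m,memory={memory_mib}Mi,ephemeral-storage={storage_gib}Gi"
    return f"--{name}={value}"


def allocatable_memory_mib(capacity: int, kube_reserved: int,
                           system_reserved: int, eviction_hard: int) -> int:
    """Allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold."""
    return capacity - kube_reserved - system_reserved - eviction_hard


print(reservation_flag("kube-reserved", 500, 1024, 1))
print(reservation_flag("system-reserved", 250, 512, 1))
# A 16 GiB (16384 MiB) node that also holds back 100 MiB for hard eviction:
print(allocatable_memory_mib(16384, 1024, 512, 100))
```

In this example, roughly 1.6 GiB of the node's memory is never offered to pods, which is exactly the headroom that keeps the kubelet and OS daemons from being starved.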
Heartbeat and Election Timeout Intervals for etcd Members
When configuring an etcd cluster, it is important to set the heartbeat interval and election timeout correctly:
- Heartbeat interval is how often the etcd leader notifies followers that it is still alive. A recommended value is roughly the round-trip time between etcd members.
- Election timeout is how long a follower waits without receiving a heartbeat before it tries to become the leader itself. A recommended value is at least ten times the round-trip time between members.
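As a rough illustration, the rule of thumb above can be turned into values for etcd's --heartbeat-interval and --election-timeout flags, both expressed in milliseconds. This Python sketch assumes a measured round-trip time as input and uses a hypothetical helper name. Note that etcd's defaults (100 ms heartbeat, 1000 ms election timeout) are often adequate on low-latency networks, so custom values matter mainly when members are far apart.

```python
# Minimal sketch: derive etcd timing flags from a measured round-trip time (RTT).
# Assumed rule of thumb: heartbeat ~ 1x RTT, election timeout >= 10x RTT.
import math
from typing import List


def etcd_timing_flags(rtt_ms: float) -> List[str]:
    """Return --heartbeat-interval and --election-timeout values in milliseconds."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    heartbeat_ms = math.ceil(rtt_ms)   # round up to whole milliseconds
    election_ms = 10 * heartbeat_ms    # at least ten times the RTT
    return [f"--heartbeat-interval={heartbeat_ms}",
            f"--election-timeout={election_ms}"]


print(etcd_timing_flags(150))
```

For members about 150 ms apart, this yields a 150 ms heartbeat and a 1500 ms election timeout.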
Regular etcd Backups
Since etcd stores the state of the cluster, it is recommended that you back up etcd data regularly and store the backups on a separate host.
Here are three ways to back up an etcd cluster:
- Taking a snapshot using the etcdctl snapshot save command
- Copying the member/snap/db file directly from the etcd data directory
- Taking a snapshot of the storage volume using tools provided by your cloud provider (if running etcd on a public cloud)
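The first option is straightforward to script. The sketch below assumes etcd is reachable over TLS and that etcdctl is installed on the host running the script; the backup directory, endpoint, and certificate paths are hypothetical placeholders, and snapshot_command and take_snapshot are illustrative helper names.

```python
# Minimal sketch: take a timestamped etcd snapshot with "etcdctl snapshot save".
# Endpoint and certificate paths are hypothetical placeholders supplied by the caller.
import os
import subprocess
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def snapshot_command(backup_dir: str, endpoint: str, cacert: str, cert: str, key: str,
                     now: Optional[datetime] = None) -> Tuple[List[str], str]:
    """Build the etcdctl command and the snapshot file path it will write."""
    now = now or datetime.now(timezone.utc)
    path = os.path.join(backup_dir, f"etcd-snapshot-{now:%Y%m%dT%H%M%SZ}.db")
    cmd = ["etcdctl", f"--endpoints={endpoint}", f"--cacert={cacert}",
           f"--cert={cert}", f"--key={key}", "snapshot", "save", path]
    return cmd, path


def take_snapshot(cmd: List[str]) -> None:
    """Run etcdctl with the v3 API; raises CalledProcessError if it fails."""
    env = dict(os.environ, ETCDCTL_API="3")
    subprocess.run(cmd, check=True, env=env)
```

Calling take_snapshot with the command returned by snapshot_command writes one snapshot file. To follow the advice above, copy that file to a separate host afterwards, since a backup stored only on the etcd node is lost along with it.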
Best Practices for Kubernetes in Production
Every environment has unique characteristics and requirements. Still, there are certain standards that apply (or should be applied) to the majority of production deployments. The standards below, if applied correctly, can help you achieve high availability, scalability, and security.
Production Kubernetes environments should be built for high availability and disaster recovery. A good start is to run at least five master nodes.
Some other best practices for high availability in production:
- In a production environment, the number of master nodes (and etcd members) should be odd: etcd needs a majority of its members, a quorum, to remain available, and an even count tolerates no more failures than the next lower odd count. Replicating master nodes and worker nodes across cloud provider availability zones (AZs) enables clusters to tolerate an AZ outage.
- Run etcd on dedicated nodes. This avoids downtime caused by etcd members running short of resources. Back up etcd data regularly, since etcd stores the cluster state. These backups are what you will restore from when you need to perform disaster recovery.
- Use active-passive replication for the controller manager and the scheduler. Both components support this through leader election, so only one replica is active at a time.
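The preference for odd member counts follows from etcd's quorum rule: a cluster of n members stays available only while a majority, n // 2 + 1, is healthy. The short Python sketch below (assuming one etcd member per master node) prints how many failures each cluster size tolerates.

```python
# Minimal sketch: etcd (Raft) quorum arithmetic, showing why odd member counts are preferred.


def quorum(members: int) -> int:
    """Smallest majority of the cluster."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} members: quorum={quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

The output shows that four members tolerate one failure, the same as three, and six tolerate two, the same as five, so an even count adds cost without adding resilience.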
Kubernetes offers three autoscaling tools to support scalability. You should use all three in production to fine-tune scaling behavior.
- Cluster Autoscaler: adds or removes nodes to increase or decrease cluster size.
- Horizontal Pod Autoscaler: scales the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU utilization.
- Vertical Pod Autoscaler: adjusts the CPU and memory requests and limits of individual containers based on their observed usage.
Using all three autoscaling mechanisms together helps keep your production Kubernetes environment scalable.
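To make the Horizontal Pod Autoscaler's behavior concrete, the Kubernetes documentation gives its core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no scaling while the ratio stays within a tolerance (0.1 by default, configurable with the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance). The Python sketch below implements only that rule plus min/max clamping; it omits the stabilization windows and pod-readiness handling the real controller applies, and desired_replicas is a hypothetical name.

```python
# Minimal sketch of the Horizontal Pod Autoscaler's core scaling rule:
#   desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    ratio = current_metric / target_metric
    # Skip scaling when the ratio is close enough to 1.0 (default tolerance is 10%).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))
print(desired_replicas(4, current_metric=62, target_metric=60))
```

With a 60% CPU target, four pods averaging 90% scale to six, while four pods averaging 62% stay at four because the ratio is within the tolerance.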
It is important to specify resource requests and limits for each container. Additionally, you should divide your cluster into separate namespaces for different teams, departments, customers, or applications. After creating a namespace, you should:
- Use LimitRange to set minimum and maximum resource values for each container or pod in the namespace, along with default requests and limits for containers that do not specify them
- Use ResourceQuota to cap the total resource requests and limits of all containers in the namespace, as well as the number of Kubernetes objects (such as pods and services) that can be created in it
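A namespace's guardrails can be expressed as a LimitRange and a ResourceQuota. The Python sketch below builds both as plain dictionaries and prints them as JSON, which kubectl can apply (for example by piping the output to kubectl apply -f -). The namespace name, object names, and every quantity are hypothetical examples to adapt.

```python
# Minimal sketch: generate a LimitRange and a ResourceQuota for one namespace.
# The namespace, object names, and all quantities are hypothetical examples.
import json


def namespace_guardrails(namespace: str) -> dict:
    limit_range = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-limits", "namespace": namespace},
        "spec": {"limits": [{
            "type": "Container",
            "min": {"cpu": "100m", "memory": "64Mi"},             # smallest allowed values
            "max": {"cpu": "2", "memory": "2Gi"},                  # largest allowed values
            "defaultRequest": {"cpu": "250m", "memory": "128Mi"},  # used when no request is set
            "default": {"cpu": "500m", "memory": "256Mi"},         # used when no limit is set
        }]},
    }
    resource_quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "team-quota", "namespace": namespace},
        "spec": {"hard": {
            "requests.cpu": "10", "requests.memory": "20Gi",  # sum of all requests
            "limits.cpu": "20", "limits.memory": "40Gi",      # sum of all limits
            "pods": "50", "services": "10",                   # object counts
        }},
    }
    # kubectl apply accepts a v1 List that wraps several objects.
    return {"apiVersion": "v1", "kind": "List", "items": [limit_range, resource_quota]}


print(json.dumps(namespace_guardrails("team-a"), indent=2))
```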
Kubernetes provides a set of security controls, and the CIS Kubernetes Benchmark specifies how to configure these controls correctly.
However, when moving into large-scale deployments in production, these security controls and compliance benchmarks are not enough. Moreover, the security defaults in Kubernetes are either loose or not well defined. Enterprise deployments are at constant risk of Kubernetes components being deployed to production, or exposed by mistake, with default security settings.
To resolve these problems and make Kubernetes suitable for an enterprise production environment, organizations need:
- Dynamic risk assessment, which can constantly identify the most vulnerable deployments and nodes – this can be achieved with a new category of tools called Kubernetes Security Posture Management (KSPM)
- Automated security remediation of Kubernetes components based on risk
- Policy-driven security controls that leverage native K8s capabilities
- Better assurance that container images are free from security vulnerabilities
- Better network security controls
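As an illustration of a policy-driven control, the Python sketch below flags a few risky settings in a Pod manifest that often slip into production with permissive defaults: host networking, privileged containers, runAsNonRoot not enabled, and missing resource limits. The rule set and the pod_findings name are hypothetical; in a real cluster such checks are usually enforced at admission time, for example with Pod Security Admission or Open Policy Agent, rather than with a standalone script.

```python
# Minimal sketch of a policy check over a Pod manifest given as a dict.
# The rules are illustrative examples, not a complete or authoritative policy.
from typing import Any, Dict, List


def pod_findings(pod: Dict[str, Any]) -> List[str]:
    """Return human-readable findings for risky settings in a Pod spec."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    findings: List[str] = []
    if spec.get("hostNetwork"):
        findings.append("pod shares the host network namespace")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        if ctx.get("privileged"):
            findings.append(f"{name}: runs privileged")
        # A container-level runAsNonRoot overrides the pod-level value.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            findings.append(f"{name}: runAsNonRoot is not enabled")
        if "limits" not in container.get("resources", {}):
            findings.append(f"{name}: no resource limits")
    return findings


risky_pod = {"spec": {"hostNetwork": True,
                      "containers": [{"name": "app", "securityContext": {"privileged": True}}]}}
for finding in pod_findings(risky_pod):
    print(finding)
```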
Securing Kubernetes with Aqua
Aqua tames the complexity of Kubernetes security with KSPM (Kubernetes Security Posture Management) and advanced agentless Kubernetes Runtime Protection.
Aqua provides Kubernetes-native capabilities to achieve policy-driven, full-lifecycle protection and compliance for K8s applications:
- Kubernetes Security Posture Management (KSPM) – a holistic view of the security posture of your Kubernetes infrastructure for accurate reporting, helping you identify and remediate security risks.
- Automate Kubernetes security configuration and compliance – identify and remediate risks through security assessments and automated compliance monitoring, helping you enforce policy-driven security monitoring and governance.
- Control pod deployment based on K8s risk – determine admission of workloads across the cluster based on pod, node, and cluster attributes. Enable contextual reduction of risk with out-of-the-box best practices and custom Open Policy Agent (OPA) rules.
- Protect entire clusters with agentless runtime security – runtime protection for Kubernetes workloads with no need for host OS access, for easy, seamless deployment in managed or restricted K8s environments.
- Open Source Kubernetes Security – Aqua provides the most popular open source tools for securing Kubernetes, including Kube-Bench, which assesses Kubernetes clusters against 100+ tests of the CIS Benchmark, and Kube-Hunter, which performs penetration tests using dozens of known attack vectors.
Learn More About Kubernetes in Production
There’s a lot more to learn about Kubernetes in production. To continue your research, take a look at our other blog posts on this topic:
Creating and Securing an EKS Cluster: First Steps
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that helps you simplify the deployment and management of your Kubernetes operations. However, while EKS takes care of the backend work, organizations are still required to set up their workloads and networking. This article walks you through the process of creating and securing an EKS cluster.
Protecting Kubernetes Secrets: A Practical Guide
Containers need sensitive data, such as passwords and authentication keys, to perform operations. To keep this data separate from application code, Kubernetes stores it in objects called Secrets, whose values are Base64-encoded. However, although Secrets exist for security purposes, Base64 is an encoding, not encryption, so Secrets are not inherently secure. This article explains key challenges and solutions organizations can leverage to protect Kubernetes secrets.
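The weakness is easy to demonstrate: Base64 can be reversed by anyone who can read the Secret object, with no key required. The Python sketch below uses a hypothetical Secret named db-credentials with a made-up password.

```python
# Minimal sketch: a Secret's data values are Base64-encoded, not encrypted.
import base64

# A hypothetical Secret as it would appear in the API (values under "data" are Base64).
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},
    "type": "Opaque",
    "data": {"password": base64.b64encode(b"s3cr3t").decode("ascii")},
}

# Anyone who can read the object recovers the plaintext without any key.
recovered = base64.b64decode(secret["data"]["password"]).decode("utf-8")
print(secret["data"]["password"], "->", recovered)
```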
Introducing KSPM by Aqua: Kubernetes Security Posture Management
Since its initial introduction, Kubernetes has skyrocketed in popularity. However, while Kubernetes helps organizations better apply containerization to their workloads, the system is highly complex, and does not provide adequate controls to manage and secure deployments of tens or hundreds of thousands of containers. This article explains how Aqua’s Kubernetes Security Posture Management (KSPM) solution solves critical Kubernetes security challenges.
Kubernetes Security Best Practices: 10 Essential Steps to Securing Kubernetes
When setting up a Kubernetes operation, every single configuration requires careful consideration as well as customization to the unique needs of your project. However, there are certain best practices that are critical to protect any workload, including setting up RBAC, using third-party authentication for the API server, protecting etcd with TLS and a firewall, and more. This article explains 10 best practices you can use to secure Kubernetes deployments.