Understanding Container Monitoring: Best Practices and Tools

What is Container Monitoring?

Containers are ephemeral by nature, which makes them harder to monitor than applications running on bare-metal servers or even in virtual machines. Yet monitoring is critical to ensuring the availability, performance, and security of containerized workloads, and Docker container infrastructure requires new monitoring strategies, technologies, and tools.

In this article, you will learn:

Why Container Observability Matters
Challenges with Container Monitoring
How Can You Monitor Containers Effectively?
What are the Common Features of Container Monitoring Tools?
Top Container Monitoring Tools

Why Container Observability Matters

Visibility and monitoring are essential for keeping an environment running smoothly and for optimizing resource usage and costs. This is especially true of containerized environments, which are highly dynamic and require close monitoring to maintain application health.

Because each container image can have many running instances, and new images and versions are introduced at a rapid pace, problems can quickly spread across containers, applications, and the entire architecture. This makes it critical to pinpoint the root cause of a problem as soon as it occurs.

It is critical to achieve observability of at least the following components in a containerized environment:

Host servers
Container runtime
Orchestrator control plane
Containerized middleware
Applications running within containers

In large-scale containerized environments, this level of observability is only achievable with dedicated cloud-native monitoring tools, and it must be automated.

Failure to achieve observability can result in:

Poor visibility and operational challenges: without observability, it is difficult for development and operations teams to understand what is running and how it is performing. Maintaining applications, meeting SLAs, and troubleshooting can become very challenging.
Scalability challenges: the ability to rapidly scale out applications or microservice instances on demand is an important requirement for containerized environments, and observability is the only reliable way to gauge demand and user experience. Scaling up too late can result in poor performance and outages; scaling down too late wastes resources and money.
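
The following example makes the scaling point concrete. The Kubernetes Horizontal Pod Autoscaler documents its core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Below is a minimal Python sketch of that ratio, not the autoscaler's actual code. The function name, the 10% tolerance band, and the min/max bounds are illustrative assumptions. The sketch shows why the observed metric matters: a stale or missing reading produces a wrong replica count.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1,
                     min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Replica count from an observed vs. target metric (HPA-style ratio).

    Hypothetical helper: the tolerance and min/max bounds are illustrative
    parameters, not a real library API.
    """
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("replicas and target must be positive")
    ratio = current_metric / target_metric
    # Inside the tolerance band, keep the current size to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against a 60% target: scale out to 6.
print(desired_replicas(4, 90.0, 60.0))
```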

Challenges with Container Monitoring

Container monitoring must overcome some unique challenges, for which traditional virtual infrastructure monitoring systems are inadequate:

Containers are ephemeral: provisioning and destroying a container is a swift process. While this is one of their major advantages, it also makes changes difficult to track in complex, high-churn systems.
Containers share resources: containers on the same host share its memory, CPU, and other resources, so the host's overall resource consumption says little about any individual container. Consequently, obtaining indications of application health and container performance becomes problematic.
Traditional tooling is incompatible: traditional monitoring solutions often lack the metrics, traces, and logs needed to monitor and troubleshoot virtualized environments, let alone the health and performance of containers.
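
Because containers share a host, per-container usage has to be derived from each container's own counters rather than from host totals. The sketch below shows roughly the kind of calculation `docker stats` performs. It takes two samples of cumulative CPU time, one for the container and one for the host, and compares the deltas. The `CpuSample` type and its field names are hypothetical. Where the counters come from (cgroups, the Docker Engine API, and so on) is left out.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """One reading of cumulative CPU counters, in nanoseconds (hypothetical type)."""
    container_ns: int  # CPU time consumed by this container so far
    host_ns: int       # CPU time consumed by the whole host (all CPUs) so far


def container_cpu_percent(prev: CpuSample, curr: CpuSample, online_cpus: int) -> float:
    """Share of host CPU used by one container between two samples.

    Both counters are cumulative, so only the deltas between samples are
    meaningful. The result is scaled so that 100% equals one full CPU.
    """
    container_delta = curr.container_ns - prev.container_ns
    host_delta = curr.host_ns - prev.host_ns
    if host_delta <= 0 or container_delta < 0:
        return 0.0  # counter reset or no elapsed host time: no valid reading
    return container_delta / host_delta * online_cpus * 100.0


prev = CpuSample(container_ns=1_000_000_000, host_ns=40_000_000_000)
curr = CpuSample(container_ns=3_000_000_000, host_ns=48_000_000_000)
print(f"{container_cpu_percent(prev, curr, online_cpus=4):.1f}%")  # 100.0% (one full CPU)
```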

How Can You Monitor Containers Effectively?

Here are a few best practices that can help you improve monitoring of your containerized infrastructure.

Alerting Across the Delivery Pipeline

Each stage of the delivery pipeline requires monitoring, because issues introduced early in the delivery cycle can have a significant impact on applications later on. Monitor the entire DevOps toolchain, since state changes within it may indicate critical application issues.

Monitor the Entire Stack

Monitoring must cover the complete stack to provide full application visibility. Ensure that it includes containers, the clusters that run them (Kubernetes, Swarm, or others), inter-container communication and telemetry (using contracts, Istio logs, etc.), control plane machines, and the servers running worker nodes.

Visualize the Environment

To maintain a containerized infrastructure, you’ll need insights into system health at multiple levels of granularity. By visualizing the environment, you can identify where problems lie and drill down to assess issues in clusters, nodes, pods, and containers.
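
Drilling down usually means aggregating the same metric at each level and then following the largest contributor. The following is a minimal sketch that assumes hypothetical per-container memory readings tagged with node and pod names. Real tools obtain this placement metadata from the orchestrator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerMetric:
    """Hypothetical memory reading for one container, tagged with its placement."""
    node: str
    pod: str
    container: str
    memory_mib: float


def build_tree(metrics: list[ContainerMetric]) -> dict:
    """Nest readings as node -> pod -> container -> MiB."""
    tree = defaultdict(lambda: defaultdict(dict))
    for m in metrics:
        tree[m.node][m.pod][m.container] = m.memory_mib
    return tree


def total(subtree) -> float:
    """Sum every leaf value below a level of the tree."""
    if isinstance(subtree, dict):
        return sum(total(child) for child in subtree.values())
    return subtree


def drill_down(tree: dict) -> tuple[list[str], float]:
    """Follow the heaviest child at each level: node, then pod, then container."""
    path, level = [], tree
    while isinstance(level, dict):
        name, level = max(level.items(), key=lambda kv: total(kv[1]))
        path.append(name)
    return path, level


metrics = [
    ContainerMetric("node-a", "web-1", "app", 300.0),
    ContainerMetric("node-a", "web-1", "sidecar", 50.0),
    ContainerMetric("node-b", "db-0", "postgres", 900.0),
    ContainerMetric("node-b", "cache-0", "redis", 200.0),
]
tree = build_tree(metrics)
print(total(tree))       # 1450.0 (cluster level)
print(drill_down(tree))  # (['node-b', 'db-0', 'postgres'], 900.0)
```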

Add Context to Alerts

The server an application runs on is no longer the only source of issues affecting it. In microservice-based applications, an alert about one container is often related to that container's interaction with another. Control what metadata flows into an alert so that it is useful for troubleshooting real issues: determine which information about component failures would be most useful, and make sure it is included in the alert.
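
As an illustration, the sketch below enriches an alert with the container's service, image, and node. It also adds any of the container's dependencies that are already alerting. The inventory, the firing-alert table, and all field names are hypothetical stand-ins for what a monitoring system or orchestrator would supply.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Minimal alert record; all field names here are illustrative."""
    name: str
    container: str
    message: str
    context: dict = field(default_factory=dict)


# Hypothetical inventory a monitoring system might hold about each container.
INVENTORY = {
    "checkout-7f9c": {"service": "checkout", "image": "shop/checkout:1.4.2",
                      "node": "node-b", "depends_on": ["payments", "inventory"]},
    "payments-2b1d": {"service": "payments", "image": "shop/payments:3.0.1",
                      "node": "node-b", "depends_on": []},
}

# Hypothetical alerts currently firing, keyed by service name.
FIRING = {"payments": "HighErrorRate"}


def enrich(alert: Alert) -> Alert:
    """Attach placement, version, and dependency status to an alert."""
    info = INVENTORY.get(alert.container, {})
    deps = info.get("depends_on", [])
    alert.context = {
        "service": info.get("service", "unknown"),
        "image": info.get("image", "unknown"),
        "node": info.get("node", "unknown"),
        # Dependencies that are themselves alerting are likely root-cause candidates.
        "degraded_dependencies": {d: FIRING[d] for d in deps if d in FIRING},
    }
    return alert


a = enrich(Alert("HighLatency", "checkout-7f9c", "p99 latency above 2s"))
print(a.context["degraded_dependencies"])  # {'payments': 'HighErrorRate'}
```

The responder sees that the payments dependency is already failing. That points the investigation away from the checkout container, which raised the alert.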

What are the Common Features of Container Monitoring Tools?

Many dedicated tools have been introduced for monitoring containerized environments. Common features include:

Real-time monitoring: processes metrics quickly so that anomalies and issues are identified as they happen.
Anomaly detection: compares activity to benchmarked patterns to detect possible errors or security issues.
Performance baseline: establishes a standard performance level against which the current behavior of infrastructure and live applications is assessed.
Network performance monitoring: enables faster troubleshooting by reporting on network service quality as experienced by end users. It covers application, infrastructure, and DNS performance, provides full visibility into all layers, whether cloud-based, on-premises, or hybrid, and offers diagnostics, optimization, and automated reporting.
Configuration monitoring: tracks and alerts on changes to configuration rules, ensures policies are enforced, and records changes for compliance purposes.
API monitoring: by tracing connections between containers, and between containers and external services, identifies irregularities in traffic flows, user accessibility, functionality, and security.
Dashboards: present data in visual form.
Topology visualization: presents the container ecosystem's services, integrations, and infrastructure graphically.
Alerting: notifies teams in a timely manner with clear, actionable information that helps diagnose and resolve an issue.
Recommendations: suggests proactive measures to prevent future errors, slowdowns, and failures.
Automation: automatically adjusts container resources based on metrics such as resource utilization and performance.
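
In their simplest form, anomaly detection and performance baselines can be combined as a rolling statistical baseline. The approach keeps a window of recent readings and flags any new reading that falls too many standard deviations from the mean. This z-score approach is only one of many techniques. The window size and threshold below are illustrative assumptions rather than defaults of any particular tool.

```python
import statistics
from collections import deque


class BaselineDetector:
    """Flag readings far outside a rolling baseline (a simple z-score sketch).

    window and threshold are illustrative defaults, not values from any tool.
    """

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous relative to the current baseline."""
        anomalous = False
        if len(self.history) >= 2:
            mean = statistics.fmean(self.history)
            stdev = statistics.stdev(self.history)
            # A flat baseline (stdev 0) gives no scale, so nothing is flagged.
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                anomalous = True
        # Anomalies are kept out of the baseline so one spike does not shift it.
        if not anomalous:
            self.history.append(value)
        return anomalous


detector = BaselineDetector(window=10, threshold=3.0)
latencies_ms = [101, 99, 100, 102, 98, 100, 101, 99, 250, 100]
flags = [detector.observe(v) for v in latencies_ms]
print([v for v, f in zip(latencies_ms, flags) if f])  # [250]
```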

Top Container Monitoring Tools

Prometheus

Prometheus is an open-source monitoring solution and one of the first projects to graduate from the Cloud Native Computing Foundation (CNCF). It has become a de facto standard for monitoring cloud-native architectures. It was originally created at SoundCloud to monitor dynamic container environments, and it simplifies the process of retrieving metrics from containers.

Prometheus has three main components:

Exporters: stand-alone processes (often run as containers) on or alongside a target resource, which generate metrics and expose them through a dedicated HTTP endpoint.
Prometheus server: performs service discovery, scrapes metrics from exporters, and stores them in its time-series database for visualization and alerting.
Alertmanager: receives the alerts that fire when alerting rules evaluated by the Prometheus server are triggered, then deduplicates and groups them and sends notifications to multiple receivers.
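
The following example illustrates what an exporter's endpoint looks like in practice. Prometheus scrapes exporters over HTTP, usually at a `/metrics` path, and expects a plain-text exposition format. The sketch uses only the Python standard library. The metric name `demo_container_memory_bytes`, its values, and port 8000 are made-up placeholders. Production exporters normally use an official Prometheus client library instead of hand-rolling the format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: dict) -> str:
    """Render one gauge family; samples maps tuples of (label, value) pairs to numbers."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


def collect() -> str:
    """Hypothetical collector; a real exporter would read live container stats here."""
    return render_gauge(
        "demo_container_memory_bytes",
        "Memory used by each container (illustrative values).",
        {(("container", "web"), ("pod", "web-1")): 52428800,
         (("container", "db"), ("pod", "db-0")): 209715200},
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve /metrics; the port is an example, match your scrape config."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. A Prometheus server configured with a scrape target pointing at that host and port would then collect the gauge on each scrape interval.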

Grafana

The Grafana metrics analysis and visualization suite receives data from Elasticsearch, MySQL, Redis, Prometheus, PostgreSQL, and other data sources. You can use a wide variety of official and community-built dashboards. Grafana has its own alerting system and supports role-based access control (RBAC) for security, and it is commonly paired with Prometheus to visualize container metrics.

Elasticsearch & Kibana

Developed in Java and based on the Lucene library, Elasticsearch is a full-text, multitenant-capable search engine. It stores data as schema-free JSON documents.

Kibana is a user interface for visualizing Elasticsearch data and navigating the Elastic Stack. Open source and free, it includes graphs, charts, histograms, sunbursts, and more.

Combined, the two provide a robust tool for monitoring Docker container logs. However, configuration, initial startup, upgrades, and maintenance can be time-consuming and costly, and they require a high level of proficiency with the tools.
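
Container logs must first be turned into JSON documents before Elasticsearch can index them. The following is a minimal sketch that assumes Docker's default json-file logging driver, whose lines carry `log`, `stream`, and `time` fields. It builds a newline-delimited body for Elasticsearch's bulk API, with an action line followed by the document for each entry. The index name `container-logs` and the `container_id` field are illustrative choices. In practice, a log shipper usually performs this step.

```python
import json


def to_bulk_ndjson(raw_lines, container_id: str, index: str = "container-logs") -> str:
    """Turn Docker json-file log lines into an Elasticsearch bulk request body.

    Each log entry becomes an action line followed by the document itself;
    the bulk API expects newline-delimited JSON ending with a newline.
    """
    out = []
    for raw in raw_lines:
        entry = json.loads(raw)
        doc = {
            "@timestamp": entry["time"],
            "stream": entry["stream"],
            "message": entry["log"].rstrip("\n"),
            # Enrichment so logs can be filtered per container in Kibana.
            "container_id": container_id,
        }
        out.append(json.dumps({"index": {"_index": index}}))
        out.append(json.dumps(doc))
    if not out:
        return ""  # nothing to send
    return "\n".join(out) + "\n"


lines = ['{"log":"GET /health 200\\n","stream":"stdout","time":"2024-05-01T12:00:00.000000000Z"}']
print(to_bulk_ndjson(lines, container_id="3f2a9c"), end="")
```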

Jaeger

Jaeger is a distributed tracing solution that works out of the box with the Istio service mesh. It addresses the problem of debugging transactions in distributed architectures, where a single call can fan out into many requests between different services.

Open-sourced by Uber Engineering and now a graduated CNCF project, Jaeger enables transaction monitoring and troubleshooting. Through tracing, it facilitates root-cause analysis, latency and performance optimization, and monitoring of distributed transactions.
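
A toy trace can illustrate the root-cause analysis that tracing enables. The sketch below assumes a simplified span model (ID, parent ID, service, duration) rather than Jaeger's actual data model or API. For each service, it computes how much time was spent on the service's own work, excluding time spent waiting on downstream calls. In this made-up trace, the slow `Charge` call points at the payments service.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Simplified span; fields mirror common tracing concepts, not Jaeger's exact model."""
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    duration_ms: float


def self_time_by_service(spans: list[Span]) -> dict[str, float]:
    """Time each service spent in its own spans, excluding time in child spans.

    Assumes children run sequentially within their parent; if children
    overlap, this subtraction underestimates self time (clamped at zero).
    """
    child_time = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] += s.duration_ms
    totals = defaultdict(float)
    for s in spans:
        totals[s.service] += max(0.0, s.duration_ms - child_time[s.span_id])
    # Largest contributor first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


trace = [
    Span("a", None, "frontend", "GET /checkout", 480.0),
    Span("b", "a", "checkout", "PlaceOrder", 420.0),
    Span("c", "b", "payments", "Charge", 350.0),
    Span("d", "b", "inventory", "Reserve", 40.0),
]
print(self_time_by_service(trace))
# {'payments': 350.0, 'frontend': 60.0, 'inventory': 40.0, 'checkout': 30.0}
```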