What is Kubernetes?
Kubernetes is an open-source platform that automates the deployment, scaling, and management of containerized applications and services. Originally created by Google, it is now maintained by the Cloud Native Computing Foundation. As the growing popularity of containers has made orchestration a necessity, Kubernetes has emerged as arguably the leading open-source container-orchestration system.
Below, we’ll delve into what Kubernetes is, how it works, and some important terminology that helps you gain a full understanding of this technology.
Side note – if you’re a huge fan of comics like us, cash in on this collection of Kubernetes comics. It’s a great way for beginners to learn with visuals.
What are containers?
A container is a single unit of software that packages code together with all of its dependencies, allowing applications and services to run in complete isolation from other processes. As a modern alternative to virtual machines and physical servers, containers can be deployed quickly and reliably because the code is packaged in a standardized way (think neatly packed, individual containers within a large shipping crate), so it can be distributed across different physical and virtual machines without the need for an entire operating system.
In doing so, they use far fewer resources and enable more support across devices and platforms than virtual machines, drastically improving upon efficiency, hybrid and cross-cloud utilization, portability, and speed.
While containers have existed for over a decade, built into many flavors of the Linux operating system (e.g. the chroot system call), Docker popularized software containers in 2014 by making them easy to manage, with significantly faster deployment, a smaller footprint, application portability, and reuse.
The beauty of containers is the ability to virtualize at the operating-system level: instead of installing several virtual machines or shipping full software stacks or operating systems, you pack your code and its dependencies into a container that can run anywhere. You can even connect containers together to run a full application in the cloud.
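As a minimal sketch of that workflow (assuming Docker is installed; the app name and files are illustrative, not from any particular project), packaging and running code in a container looks like:

```shell
# Illustrative only: package a small Node.js app into a container image.
cat > Dockerfile <<'EOF'
# Base image ships only the runtime, not a full operating system
FROM node:18-alpine
COPY server.js /app/server.js
CMD ["node", "/app/server.js"]
EOF

# Build the image: code and dependencies packaged in a standardized way
docker build -t my-app:1.0 .

# Run it on any machine with Docker, with no separate OS install needed
docker run -d -p 8080:8080 my-app:1.0
```

The same image runs unchanged on a laptop, a physical server, or a cloud VM, which is the portability point made above.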
Over 37 billion containerized apps have been downloaded since then, a good indicator that containers are quickly becoming a crucial piece of the DevOps process, enabling better CI/CD.
What does Containerization mean?
Containerization means deploying and distributing applications through OS-level virtualization, without having to create an entire virtual machine for each app.
Historically, production applications ran on virtual machines. Virtual machines require maintenance and upkeep of not only the application but the entire operating system, which means the overhead of updates, dependencies (and vulnerabilities) when scaling and replicating production applications.
(Figure: what a production app looks like)
What’s the difference between containers and virtual machines?
Containers and virtual machines (VMs) are both virtualization methods for deploying applications to production on a host machine. A virtual machine image contains more than just the application: it includes an entire operating system. Containers are lightweight by comparison because they sit on top of the physical hardware and its operating system, sharing the kernel, binaries, and libraries with other containers on the host. Container images are measured in megabytes instead of gigabytes and can start up in seconds rather than minutes. This also means containers use fewer resources to run, so on the same bare-metal hardware, many more containers than virtual machines can run.
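One quick way to see the shared-kernel point in practice (assuming Docker is installed) is to compare the kernel version inside and outside a container:

```shell
# The host's kernel version...
uname -r

# ...matches the kernel reported inside an Alpine container, because
# containers share the host kernel instead of booting their own OS
docker run --rm alpine:3 uname -r

# The Alpine base image itself is only a few megabytes
docker images alpine:3
```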
It helps to have a quick reference of all the key terms when learning about Kubernetes.
- Node: a collection of IT resources (cloud, physical, or virtual machines) that provides the compute and storage to run containers.
- Containers: running instances of Docker images of applications.
- Pod: Kubernetes' basic scheduling unit. Instead of managing individual containers, Kubernetes schedules pods: groups of one or more containers that share a single IP address and namespace. For example, a web server and a logging agent can be grouped into one pod, which can then be scaled as a unit.
- Label: how pods are identified. A pod can carry as many labels as needed, used as metadata with semantic meaning. Labels are the primary grouping mechanism in Kubernetes and are queryable by selectors in the manifest file.
- Service: a consistent network endpoint for a set of pods, so that the service can be discovered and accessed (via a virtual IP and port) without NAT, regardless of the current underlying state of the pods. Incoming requests are load balanced across the right pods.
- Namespaces: Kubernetes supports multiple virtual clusters backed by the same physical cluster. Namespaces provide a scope for names; using them is optional.
- Master Kubernetes Components: the “brain” of the cluster, where the work of running Kubernetes is coordinated. The components are:
  - API Server: the main management point of the Kubernetes cluster. It processes REST operations, serves the Kubernetes API, and is the only component that updates etcd. The kubectl command-line tool interacts with this API server.
  - Scheduler: responsible for choosing the best-fit nodes to run pods on. If the default scheduler doesn’t meet your needs, you can implement your own or run multiple schedulers.
  - Controller Manager: a daemon that embeds the core controllers that ship with Kubernetes, including the Replication Controller, Endpoints Controller, and Namespace Controller. It also handles lifecycle functions such as garbage collection.
  - etcd: the distributed, consistent key/value store that Kubernetes uses to hold metadata about the cluster. It is used for configuration management, service discovery, and coordinating distributed work.
- Minion Kubernetes Components: these run on every node where workloads are scheduled. They include:
  - Kubelet: also known as the pod agent, the process responsible for running pods on the node.
  - cAdvisor: the container advisor, which provides resource usage and performance statistics.
  - Service Proxy: the load balancer for pods.
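To tie a few of these terms together, here is a minimal sketch (all names and images are illustrative) of a pod carrying a label and a service that selects pods by that label:

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
  labels:
    app: web          # label: metadata used to group and select pods
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web          # the service load balances to pods with this label
  ports:
  - port: 80
    targetPort: 80
EOF
```

Requests to web-service are load balanced across whichever pods currently carry the app: web label, regardless of pod restarts or rescheduling.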
The community and ecosystem built around Kubernetes is one of the key factors behind its widespread adoption as a container orchestration solution.
Its features are actively driven by Google and managed by the Cloud Native Computing Foundation (part of the Linux Foundation), with support from companies like IBM, Huawei, Intel, and Red Hat as well as the open-source community: all markers of a solution with commercial viability and room for growth. Kubernetes was designed to be operations-centric and has become a friend of the DevOps community, which has helped spur its adoption.
As for limitations to its adoption, two big gaps exist:
- Limited integration with Microsoft Windows Containers and Hyper-V Containers.
- No true multi-tenancy support yet
Kubernetes vs Docker Swarm
A swarm is a group of nodes running Docker in which one node acts as a manager for the others, providing the service discovery and scheduler components. Swarm is limited to the capabilities of Docker’s API, services have to be scaled manually, and a much smaller community contributes to the project.
Kubernetes vs Apache Mesos
This open-source cluster manager predates both Docker Swarm and Kubernetes. Mesos, combined with Marathon, orchestrates container-based applications and can support both containerized and non-containerized workloads. It is a compelling alternative for mixed environments, but it takes longer to get started with.
How to manage Kubernetes logs
Using Kubernetes means that services are ephemeral and dynamic, with changes happening all the time. Kubernetes decides for you when pods are started and stopped, which is helpful for availability and uptime, but it also means you cannot simply log into a server and look through past log files for root cause analysis.
Check out this Kubernetes logging tutorial to learn more about how to monitor the various parts of Kubernetes. Using kubectl, you can get log data for a pod while it is running, but that data disappears when the pod dies.
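For example, the basic kubectl commands for pod logs look like this (pod and container names are illustrative):

```shell
# Logs for a running pod
kubectl logs web-pod

# Stream logs as they are written
kubectl logs -f web-pod

# A multi-container pod requires naming the container
kubectl logs web-pod -c nginx

# Logs from the previous, terminated instance of the container
kubectl logs web-pod --previous
```

Once the pod itself is deleted, these logs are gone, which is why cluster-level log collection matters.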
Node logs are a little more persistent; they are stored in a JSON file that can get very big very fast. These mechanisms point to a real need for a practical, cluster-level log management solution for Kubernetes.
The most common way to collect Kubernetes logs is to use a fluentd agent to collect logs from nodes and pass them on to an external Elasticsearch cluster. To cope with Kubernetes’ scale and complexity, you can optimize log collection by adding a sidecar container for each type of log data; each sidecar runs a fluentd agent that collects and ships its logs.
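A minimal sketch of the sidecar pattern (image names, tags, and paths are illustrative assumptions): the app writes logs to a shared volume, and a fluentd sidecar in the same pod ships them onward.

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging
spec:
  volumes:
  - name: app-logs
    emptyDir: {}        # shared scratch volume, lives as long as the pod
  containers:
  - name: app
    image: my-app:1.0   # hypothetical app that writes logs to /var/log/app
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: fluentd-sidecar
    image: fluent/fluentd:v1.16-1
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app   # sidecar reads the same files and forwards them
EOF
```

Because both containers are in one pod, they share the volume and are scheduled, scaled, and deleted together.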
Note that while ELK might seem free to start, read this whitepaper to learn about the True Cost of the ELK Stack.
Archived Log Storage
To keep log audit trails for compliance or for investigating past breaches, this archived data needs to be stored somewhere, such as cloud object storage like AWS S3 or Azure Blob Storage.
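For instance, shipping a day's archived logs to S3 might look like this (the bucket name, dates, and paths are illustrative, and the AWS CLI must already be configured with credentials):

```shell
# Compress a day's logs and copy the archive to a storage bucket
tar -czf logs-2019-01-01.tar.gz /var/log/archive/2019-01-01/
aws s3 cp logs-2019-01-01.tar.gz s3://my-log-archive/2019/01/01/
```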
LogDNA is deeply customized for Kubernetes. LogDNA automatically recognizes all the metadata in the Kubernetes cluster including pods, nodes, containers, and namespaces. You can start collecting Kubernetes logs using just 2 simple kubectl commands:
```shell
kubectl create secret generic logdna-agent-key --from-literal=logdna-agent-key=YOUR-INGESTION-KEY-HERE
kubectl create -f https://raw.githubusercontent.com/logdna/logdna-agent/master/logdna-agent-ds.yaml
```
Then you can leverage the intuitive interface, blazing fast search, LiveTail and many other features that LogDNA’s log management solution offers so that you can focus on building great products.