I spoke at Container World 2019 in Santa Clara and shared insights on how LogDNA has scaled Elasticsearch on Kubernetes over the years.
Here are some highlights from the talk and you can also find the slide deck below.
Elasticsearch is the “E” in the popular ELK stack and allows easy searching of unstructured data. It is a distributed full-text search engine that is queryable through a JSON API, which makes it great for logging. It scales relatively easily because it handles clustering and syncing tasks across nodes, and new nodes can be added with little effort. It’s a popular choice if you’re in the market for an out-of-the-box solution that’s easy to get started with.
Kubernetes is an open source container orchestration platform originally developed by Google. It schedules all of your workloads onto the available resources. The cloud providers also offer integrations that scale resources like memory so you don’t have to do it by hand. Kubernetes allows for configuration as code, and static Docker images enforce consistent pod behavior across your infrastructure. You’ve been watching the Kubernetes hype train and want to jump on board.
At LogDNA, we have made many modifications to the Elasticsearch interface and we’ve built in-house versions of the L (Logstash) and K (Kibana) of the ELK stack for better performance.
We needed a consistent way to deploy our software across varying infrastructures. We run our application both in the cloud and on-premises, and we are agnostic about where our customers want to run LogDNA, whether it’s Amazon, Azure, a data center in Las Vegas, or a barn in Russia. Anywhere.
We use Kubernetes to help us better automate versioning, CI/CD, and maintenance. We run ES on k8s at scale.
These are a few of the steps involved in running ES on Kubernetes:
First, at LogDNA we started with a few sane defaults that we recommend:
We will dive deeper into:
Once you have your pods, there’s something else to worry about: since ES is a distributed database, the pods need to talk to each other. Our ES hot and cold tiers each have a single load-balanced cluster IP service endpoint for inserting and querying data.
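As a sketch of that setup (the names here are illustrative, not our actual manifests), a load-balanced cluster IP Service in front of the hot-tier data pods might look like:

```yaml
# Hypothetical Service for the hot tier: one stable, load-balanced
# cluster IP that clients use to insert and query data.
apiVersion: v1
kind: Service
metadata:
  name: es-hot
spec:
  type: ClusterIP
  selector:
    app: elasticsearch
    tier: hot
  ports:
    - name: http
      port: 9200
      targetPort: 9200
```

The cold tier would get an equivalent Service with its own selector, so each tier has one stable endpoint.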
ES masters are really important because they hold an election, and for that they need to discover each other. To make that work, you have to make sure you have:
What this does is allow you to list all the available IP addresses for the pods in the group. Instead of getting a load-balanced endpoint, all the masters can discover each other directly.
2 important settings for clusterIP: None
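A headless Service for master discovery can be sketched like this (names are illustrative; publishNotReadyAddresses is one common companion setting, since masters need to see each other before they report ready):

```yaml
# Hypothetical headless Service for ES master discovery.
# clusterIP: None makes the Service DNS name resolve to every
# master pod IP instead of a single load-balanced address.
apiVersion: v1
kind: Service
metadata:
  name: es-master-discovery
spec:
  clusterIP: None
  # Publish pod IPs even before readiness checks pass, so masters
  # can find each other during bootstrap and elections.
  publishNotReadyAddresses: true
  selector:
    app: elasticsearch
    role: master
  ports:
    - name: transport
      port: 9300
      targetPort: 9300
```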
Here’s what we use:
What’s irritating is that index templates can’t be set ahead of time. You have to wait until ES is up, then hit the API and add your index templates. We have a job that does exactly that.
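Our actual job differs, but a minimal sketch of that pattern looks like the following. Everything here is hypothetical: the `es-hot` service name, the `logs` template name, and a ConfigMap holding the template JSON.

```yaml
# Hypothetical Job that waits for ES to come up, then PUTs an
# index template via the _template API.
apiVersion: batch/v1
kind: Job
metadata:
  name: es-index-templates
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: put-templates
          image: curlimages/curl:latest
          command:
            - sh
            - -c
            - |
              # Poll until the ES HTTP API answers.
              until curl -sf http://es-hot:9200; do sleep 5; done
              # Add the index template from a mounted file.
              curl -sf -XPUT http://es-hot:9200/_template/logs \
                -H 'Content-Type: application/json' \
                -d @/templates/logs.json
          volumeMounts:
            - name: templates
              mountPath: /templates
      volumes:
        - name: templates
          configMap:
            name: es-index-templates
```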
Cerebro connects to your ES service endpoint(s). It contains an ES node/pod list and their health stats. You can easily view indices and shards across the available data nodes. You can modify index settings, templates, and data. Most importantly you can move shards around.
Not everything is available via Cerebro.
We use Insomnia (a REST API GUI that makes it easy to share API calls), though curl works too.
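For example, one common call you’d make from Insomnia or curl rather than a UI is toggling shard allocation during maintenance, by sending a PUT to Elasticsearch’s `_cluster/settings` endpoint with a body like this (a standard example, not necessarily one of our own calls):

```json
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
```

Setting it back to "all" re-enables allocation once maintenance is done.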
I know we’ve walked through a lot of seemingly obscure settings in Elasticsearch. When you’re running Elasticsearch in a Docker container, you have to realize that it was not designed for containers; it requires some coaxing to run properly inside one.
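Two well-known examples of that coaxing (the values and image tag here are illustrative): Elasticsearch requires the host’s vm.max_map_count to be raised, and the JVM heap should be sized explicitly relative to the container’s memory limit. In a pod spec, that might look like:

```yaml
# Hypothetical snippet from an ES pod spec: the kernel setting and
# heap sizing Elasticsearch typically needs inside a container.
initContainers:
  - name: sysctl
    image: busybox
    securityContext:
      privileged: true
    # Elasticsearch's documented minimum for mmap counts.
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.7.0
    env:
      # Pin the JVM heap; keep it well below the container limit.
      - name: ES_JAVA_OPTS
        value: "-Xms4g -Xmx4g"
    resources:
      limits:
        memory: 8Gi
```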
Thu Nguyen is a technical writer who cares deeply about human relationships.
November 17, 2020