I spoke at Container World 2019 in Santa Clara and shared insights on what LogDNA has learned in scaling Elastic Search using Kubernetes over the years.
Here are some highlights from the talk and you can also find the slide deck below.
First, the basics…
What is Elasticsearch (ES) and why would I use it?
Elastic Search is the “E” in the popular ELK stack and allows easy searching of unstructured data. It is a distributed full-text search engine that is queryable using a JSON API and great for logging. It can be scaled relatively easily because it handles clustering and syncing tasks across nodes and nodes can be added relatively easily. It’s a popular choice when in the market for an out of the box solution that’s easy to get started
Why would you want to run Elasticsearch (ES) on Kubernetes (k8s)?
Kubernetes is an open source container orchestration platform developed by Google. It schedules all your workloads onto all available resources. The cloud providers also have integrations to autoscale resources like memory and you don’t have to do it by hand. Kubernetes allows for configuration as code and static docker images enforce consistent pod behaviors across your infrastructure. You’ve been watching the Kubernetes hype train ship and want to jump on board.
Why does LogDNA use ES and k8s?
At LogDNA, we have made many modifications to the Elasticsearch interface and we’ve built in-house versions of the L (Logstash) and K (Kibana) of the ELK stack for better performance.
We needed a consistent way to deploy our software across varying infrastructures. We run our application on both cloud and on-premise and we are agnostic to wherever our customers want to run LogDNA, whether it’s Amazon, Azure, a data center in Las Vegas, a barn in Russia, anywhere.
We use Kubernetes to help us better automation for versioning, CI/CD and maintenance. We run ES on k8s at scale.
Running Elasticsearch on Kubernetes is not straight forward
These are a few of the steps involved in running ES on Kubernetes:
- Choose the appropriate Elasticsearch version and select the correct settings (there are hundreds of settings)
- Learn the expansive query language for Elasticsearch and integrate it into your workflows
- Set up a Kubernetes environment with access to appropriately sized hardware
- Configure the Elasticsearch k8s workload to request the appropriate resources, including disks
- Ensure the correct index templates and cluster settings are applied after launching your ES cluster
- Create k8s services such that Elasticsearch pods can find each other
- Troubleshoot all remaining issues as they arise and continue to manage and scale the cluster
How LogDNA got started running ES on k8s
First, at LogDNA we started with a few sane defaults that we recommend:
- ES version 5.5 & Kubernetes cluster v1.11+. We use the alpine flavor of Elasticsearch to keep our docker image small: elasticsearch:5.5.2-alpine.
- Hardware resources that worked for us are (k8s nodes) with at least 64 GB of RAM and 16 vCPUs (depends on your volume)
We will dive deeper into:
- Statefulsets and Services yaml configurations (we need them for identity, disks, and networking)
- Basic, but important cluster settings & a good starter index template. Index templates define how data is saved to an index, knowing how to configure this has helped us increase performance by an order of magnitude.
- Deploy an ES cluster management GUI (cerebro) to help with troubleshooting
1. Configuration Tips
- Use two ConfigMaps
- The elasticsearch configuration file
- A start script used to configure ulimits, permissions, and JVM heap size
- We use three ES role types (statefulsets)
- Master – handles lightweight cluster-wide actions (does not require disk)
- Hot – handles incoming writes to active indices (higher cpu to disk ratio)
- Cold – stores and queries older indices (lower cpu to disk ratio)
- volumeClaimTemplates is a field in the Statefulsets which allows for disk and identity. This allows you to dynamically provision disks.
- There are important security context settings that you need in each statefulset. This is where the obscure knowledge starts showing. You need to add security context for priviledged containers to run Elasticsearch. You need IPC Lock to make sure you have access to more file handles than usual and sys resource.
- Use k8s pre-emption to secure your ES pods get scheduling priority. When you’re managing a large set of nodes in kubernetes sometimes small microservices take up 10% of the nodes and you need that extra 1% to launch that Elasticsearch pod because it takes up more resources. What pre-emption does is to say “hey, my disk space pod is more important than your microservice, I’m going to bump you off and be scheduled here and it’s great for scale.”
- Create a startup script to set the correct configuration prior to starting the JVM (note the vm.max_map_count, we couldn’t run it without it).
2. Basic Cluster Settings
Service Discovery in Kubernetes
Once you have your pods, you’ll need to worry about since ES is a distributed database, the pods need talk to each other. ES hot and cold have a single load balanced cluster IP service endpoint for insertions and query data.
ES masters are really important because they hold an election to discover each other you have to make sure you have:
- 1 load balanced cluster IP for handling transport (9300) and HTTP API requests (9200)
- 1 clusterIP:None service used for ES unicast discovery.
What this does is to allow you to list all the available IP addresses for the pods that are in the group. Instead of getting a load balanced endpoint, all the masters can discover each other.
2 important settings for clusterIP:None
- Ensure the DNS is publishable immediately
- No sessionAffinity ensures that your addresses are up to date. If the master pod has a probe that determines if it’s ready to launch, it doesn’t fail publishing the address so that things discover each other without going through crash loop back off.
ES Startup Settings
Here’s what we use:
- Ensure memory_lock is on – important when consuming large amounts of memory
- Adjust the minimum master nodes based on the total number of masters you have, make sure that is sized appropriately to be a functioning cluster.
- The clusterIP:None service is referenced by the unicast settings
- Set the correct ES role
- Specify the number of cores that the JVM will use
Configuring an index template
What’s irritating is that index templates can’t be set ahead of time. You have to go and ping the API once your ES is up and then add your index templates. We have a job that does that.
- Optimizing shards is helpful for fine tuning performance. The more shards you have the more distributed your system but the more state that has to be syncronized.
- Configure index.total_shards_per_node based on your expected load
- Optimizing shards can increase performance and reduced cluster state overhead
- Set a refresh_interval that works for you – how often data is surfaced after insertion to be searchable.
- Higher refresh intervals offer better throughput performance at the cost of latency
- We typically use 15-30 seconds
- Change translog.durability to async (allow asynchronous translog writes)
- We regret not discovering this setting sooner, as it gave us 5-10x increase in performance
- Note: index templates MUST be applied AFTER the ES masters are already running
3. Managing Elasticsearch
A) Cerebro (Manage using a GUI)
- If you’re using ES v2.x or lower, use kopf
Cerebro connects to your ES service endpoint(s). It contains an ES node/pod list and their health stats. You can easily view indices and shards across the available data nodes. You can modify index settings, templates, and data. Most importantly you can move shards around.
Not everything is available via Cerebro.
B) Managing ES through API calls
We use Insomnia (a REST API GUI to share API calls) though curl works too
- Here are API calls we commonly use:
- Outputs useful information about the cluster. Cerebro displays this status color as a bar on the top.
- We look at this a lot. When there is a large number of pending tasks, the operations that require cluster sync take a while. One thing you can do if you run into this problem is to lock shard allocations (easily done through the cerebral gui) which clears out the pending task list
- /_flush?force & /_cluster/reroute?retry_failed=true
- When you see unassigned shards in your cluster, it means that the shards can’t figure out where to live and they retry and stop. One of the things you can do is flush the cache and force the cache to retry.
I know we’ve walked through a lot of what seems like obscure settings in Elasticsearch. When you’re running Elasticsearch in a Docker container you have to realize that it was not designed for Docker containers. It requires some coaxing to properly run inside a container.
- Remember to use the correct security context, ulimit, and vm settings.
- There are native concepts in Kubernetes than can make running ES easier
- Index templates have a big impact on how well your ES cluster runs
- We recommend managing Elasticsearch with both GUIs (cerebro) and ES APIs. Both are extremely useful for tuning performance