By: Thu Nguyen
Read Time: 9 min
SaltStack is an open source configuration management tool that lets you manage your infrastructure as code. Using SaltStack, you can manage tens of thousands of servers remotely using either a declarative language or imperative commands. It’s similar to configuration management tools such as Puppet, Chef, and Ansible.
Like most configuration management tools, SaltStack logs the activities taking place between masters and minions. These logs record what actions were taken, whether they succeeded, how long they took, and which minions were involved. This data gives you valuable insight into your SaltStack environment and makes managing your minions easier.
SaltStack consists of central command servers (called “masters”) that send commands to one or more clients (called “minions”). These commands are based on configuration declarations called states. You can apply a state to a minion, and SaltStack will determine which commands to execute on the minion so that it meets this state.
SaltStack is designed for high performance, with the documentation stating that a single master can manage thousands of systems. In fact, LogDNA uses SaltStack to provision and manage its own infrastructure across several cloud platforms.
You can send logs directly from SaltStack to LogDNA through the LogDNA SaltStack integration. With this integration, any state changes are logged and forwarded to your LogDNA account. For example, let’s say we want to install Vim on each of our minions. We can do so by creating the following Salt state file:
Next, we’ll run the following command on the Salt master:
root@saltmaster:~# salt '*' state.apply vim
Then, we can view the results in the LogDNA web app:
LogDNA automatically extracts the following fields:
However, Salt events contain additional data such as:
Although LogDNA doesn’t parse these fields automatically, we can use custom parsing rules to extract them. For successful events, we can extract the name of the state and its runtime:
Failed events provide the state and time as well, plus the reason the event failed. We extract the reason into its own field.
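To make the extraction more concrete, here is a minimal Python sketch that pulls the same three pieces of information (state, runtime, and failure reason) out of Salt's own state return data. It assumes the dictionary shape Salt produces for a state run: keys of the form module_|-id_|-name_|-function, with result, duration (in milliseconds), and comment values. The summarize_state_run helper, the output field names, and the sample values are hypothetical, and this is not how LogDNA's parsing rules are implemented.

```python
def summarize_state_run(minion, state_results):
    """Flatten one minion's state return into simple event records."""
    events = []
    for key, data in state_results.items():
        # Salt joins module, state ID, name and function with "_|-".
        parts = key.split("_|-")
        module, state_id, function = parts[0], parts[1], parts[-1]
        # result is False on failure (and None in test mode, which is
        # treated as "not failed" in this sketch).
        failed = data.get("result") is False
        event = {
            "host": minion,
            "state": state_id,
            "function": f"{module}.{function}",
            "status": "FAILURE" if failed else "SUCCESS",
            "time": data.get("duration", 0.0) / 1000.0,  # ms -> seconds
        }
        if failed:
            event["reason"] = data.get("comment", "")
        events.append(event)
    return events


# Hypothetical sample data, shaped like Salt's state return.
sample = {
    "pkg_|-vim_|-vim_|-installed": {
        "result": True,
        "duration": 5120.0,
        "comment": "The following packages were installed/updated: vim",
    },
    "file_|-vimrc_|-/etc/vimrc_|-managed": {
        "result": False,
        "duration": 12.5,
        "comment": "Source file salt://vim/vimrc not found",
    },
}

for event in summarize_state_run("minion1", sample):
    print(event)
```

Only failed events carry a reason field, mirroring the separate extraction described above.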
Even the most rigorously tested state files can fail to execute properly. When applying a state automatically or asynchronously, you may not be aware of an execution failure until you inspect the minion that the state ran on. But to understand the cause of the problem and to troubleshoot the state file, you will need information from Salt itself.
Fortunately, each event that gets sent to LogDNA includes a status of either SUCCESS or FAILURE. You can filter your view to only show failed events using the following search:
From this view, we can create an alert that sends a notification after detecting any new failed events. We can forward the alert to the DevOps team to notify them of failed events, as well as the reasons behind the failure. Each event also includes the minion name in the host field, letting engineers identify the specific minion that the failure occurred on.
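As an illustration of what such a notification could contain, the following sketch groups failed events by minion and lists the reason for each failure. The event dictionaries and their field names (host, status, state, reason) are assumptions modeled on the fields described in this article, and failure_digest is a hypothetical helper rather than part of LogDNA or Salt.

```python
from collections import defaultdict


def failure_digest(events):
    """Group failure reasons by minion for a notification message."""
    by_host = defaultdict(list)
    for event in events:
        if event["status"] == "FAILURE":
            reason = event.get("reason", "no reason given")
            by_host[event["host"]].append(f'{event["state"]}: {reason}')

    lines = []
    for host in sorted(by_host):
        lines.append(f"{host} ({len(by_host[host])} failed)")
        lines.extend(f"  - {item}" for item in by_host[host])
    return "\n".join(lines)


# Hypothetical events, for illustration only.
events = [
    {"host": "minion1", "status": "SUCCESS", "state": "vim"},
    {"host": "minion2", "status": "FAILURE", "state": "vim",
     "reason": "Package vim not found"},
    {"host": "minion2", "status": "FAILURE", "state": "php",
     "reason": "Repository unreachable"},
]

print(failure_digest(events))
```

Successful events are skipped entirely, so the digest stays short even when most minions applied the state cleanly.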
For each event, Salt reports the total time the event took to run. If your states are taking an unusually long time to complete, these durations help you determine whether the problem lies with the master, the minion, or the state itself.
Using a custom parsing rule, we’ll extract each event’s duration to a numeric time field. This way, we can use comparison operators and charts to analyze our events more thoroughly. For example, we can use the following search to find events that took longer than ten minutes to complete:
We can also graph this data to find any anomalies or trends. For example, the following graph shows the duration of each event applied to a single minion. The most apparent issue is the first event, which took 3–4 times as long as the four following events:
By clicking on the first peak and selecting “Show Logs”, we can find the exact state change that resulted in the slow performance. As it turns out, the change was a significant one that involved installing the Apache web server, creating files, adding several users and groups, and verifying each change. The next four changes were much less involved, so they took far less time (5–6 seconds) to complete.
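The two kinds of duration analysis described above, a fixed threshold and a comparison against typical runs, can be sketched in a few lines of Python. This assumes each event carries a numeric time field measured in seconds; the helper names, the factor of 2, and the sample durations are hypothetical values chosen to resemble the graph discussed here.

```python
from statistics import median

TEN_MINUTES = 10 * 60  # seconds


def slow_events(events, threshold=TEN_MINUTES):
    """Events whose duration exceeds a fixed threshold."""
    return [e for e in events if e["time"] > threshold]


def outliers(events, factor=2.0):
    """Events that took more than `factor` times the median duration."""
    typical = median(e["time"] for e in events)
    return [e for e in events if e["time"] > factor * typical]


# Hypothetical durations for five events on a single minion.
events = [
    {"host": "minion1", "state": "apache2", "time": 20.1},
    {"host": "minion1", "state": "vim", "time": 5.4},
    {"host": "minion1", "state": "users", "time": 5.9},
    {"host": "minion1", "state": "motd", "time": 5.2},
    {"host": "minion1", "state": "ntp", "time": 5.7},
]

print(slow_events(events))
print(outliers(events))
```

A fixed threshold catches runs that are slow in absolute terms, while the median comparison flags runs that are slow relative to the minion's usual behavior, such as the first event in this sample.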
Although state.apply includes a dry-run option, there’s always the risk of a state accidentally being applied to live servers. When this happens, you’ll want to know which minions were affected and what the results of the state change were.
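For example, let’s say we accidentally deployed PHP to each of our minions using a custom state file named php.sls. We meant to add test=True to the end of our command, but accidentally typed a semicolon in front of it. The shell treats a semicolon as a command separator, so our terminal ran two separate commands: the state was applied for real, and test=True never reached Salt: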
root@saltmaster:~# salt '*' state.apply php ;test=True
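To see why the semicolon matters, the sketch below uses Python's shlex module to approximate how a shell tokenizes the line. It is only an approximation of real shell parsing, and split_shell_commands is a hypothetical helper, but it shows that test=True ends up in a command of its own instead of being passed to salt.

```python
import shlex


def split_shell_commands(line):
    """Split a command line into separate commands at unquoted ';'."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    commands, current = [], []
    for token in lexer:
        if token == ";":
            # A bare semicolon ends the current command.
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


typo = split_shell_commands("salt '*' state.apply php ;test=True")
intended = split_shell_commands("salt '*' state.apply php test=True")

print(typo)      # two commands: the salt call, then a lone test=True
print(intended)  # one command, with test=True as a salt argument
```

In a real shell, the lone test=True is simply a variable assignment, so it succeeds silently and gives no hint that the dry run was skipped.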
With only a handful of minions in our environment, we could debug this ourselves by scrolling back through the output. But imagine if you had 100, 500, or 1,000+ minions. Trying to scroll back through each result would be incredibly labor-intensive.
Instead, we can use LogDNA to see how many minions each state was applied to. By filtering our logs, we can even see the exact minions that were impacted. In the following image, we created a graph of the total log volume from Salt and added a histogram of the state field. This shows us which states were recently applied and the number of minions they were applied to. Our environment has just four minions, which is why the maximum event count is also four:
If we want to show a list of minion names instead, we can filter the graph to only count logs from the apache2 state, then add a histogram based on the host field.
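The same two views, a count of minions per state and a list of minion names for one state, can be reproduced locally as a rough sketch. It assumes events with host and state fields as described in this article; minions_per_state and the sample events are hypothetical.

```python
from collections import defaultdict


def minions_per_state(events):
    """Map each state name to the set of minions it was applied to."""
    hosts = defaultdict(set)
    for event in events:
        hosts[event["state"]].add(event["host"])
    return hosts


# Hypothetical events: php was applied to four minions, vim to one.
events = [{"host": f"minion{i}", "state": "php"} for i in range(1, 5)]
events.append({"host": "minion1", "state": "vim"})

affected = minions_per_state(events)

# Histogram of the state field: how many minions each state touched.
for state in sorted(affected):
    print(state, len(affected[state]))

# Minion names for a single state.
print(sorted(affected["php"]))
```

Using a set per state means a minion that logged several events for the same state is still counted once.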
With LogDNA, monitoring SaltStack is easy. To get started, follow the instructions for installing the LogDNA Salt deployment integration and apply a state in your Salt cluster. In the LogDNA web app, select “Salt” from the All Apps dropdown, or enter app