Security is perhaps the most complex, time-sensitive, and critical aspect of IT Ops. It’s similar to the ICU (Intensive care unit) room in a hospital where the most serious of cases go, and any mistake can have far-reaching consequences.
Just as doctors need to monitor various health metrics like heart rate, blood pressure, and more, Security Analysts need to monitor the health of their systems at all time – both when the system is functioning as expected, and especially when things break. To gain visibility into system health at any time, the go-to resource for any Security Analyst is their system logs. In this post, we cover all that can go wrong in the lifetime of an application, and how Security Analysts can respond to these emergencies.
Common security incidents
An application can be attacked in any number of ways, but below are some of the most common type of security issues that SecOps teams deal with day in and day out.
- Viruses & malware: This is when a malicious script or software is installed on your system and it tries to disrupt, or manipulate the functioning of your system.
- Data breaches: The data contained in your systems is an extremely valuable asset. A data breach is when this valuable data is accessed, or stolen by an attacker.
- Phishing attacks: This tactic is used on end users where an attacker poses as a genuine person or organization, and lures the user to click on a link after which their credentials like logins, and financial information are stolen.
- Account takeovers: This is when a cyber criminal gains access to a user’s login credentials and misuses it to their own advantage.
- DDoS attacks: Distributed denial of service (DDoS) attacks occur when multiple infected computers target a single server and overload it with requests. The server is overloaded with requests and denies service to other genuine users.
- SQL injection: This happens when an attacker runs malicious SQL queries against a database with the aim of taking the database down.
- Cross-site scripting: This type of attack involves the hacker using browser scripts to execute harmful actions on the client-side.
When an incident occurs, the first thing you need to do is estimate the impact of the attack.
Estimate the impact of the attack. Which parts of the system have failed, which parts are experiencing latency, how many users and accounts are affected, which databases have been compromised, and more. Understanding this will help you decide what needs to be protected right away, and what your first actions should be. You can reduce the impact of an attack by acting fast.
Incident management using log data
Once you’ve taken the necessary ‘first aid’ measures, you’ll need to do deeper troubleshooting to find the origin of the attack, and fully plug all vulnerabilities. When dealing with these security risks, log data is indispensable. Let’s look at the various ways you can use log data to troubleshoot, take action, and protect yourself from these types of attacks.
When an incident occurs, you probably receive an alert from one of your monitoring systems about the issue. To find the origin of the incident, where you look first will depend on the type of issue.
At the application level you want to look at how end users have used your app. You’d look at events like user logins, and password changes. If user credentials have been compromised by a phishing attack, it’s hard to spot the attacker initially, but looking for any unusual behavior patterns can help.
Too many transactions in a short period of time by a single user, or transactions of very high value can be suspicious. Changes in shipping addresses is also a clue that an attacker may be at work.
At the application layer, you can view who has had access to your codebase. For this you could look at requests for files and data. You could filter your log data by application version, and notice if there are any older versions still in use. Older versions could be a concern as they may not include any critical security patches for known vulnerabilities. If your app is integrated with third-party applications via API, you’ll want to review how it’s been accessed.
When looking at log data it always helps to correlate session IDs with timestamps, and even compare timestamps across different pieces of log data. This gives you the full picture of what happened, how it progressed, and what the situation is now. Especially look for instances of users changing admin settings or opting for their behavior to not be tracked. These are suspicious and are clear signs of an attack. But even if you find initial signs at the application layer, you’ll need to dig deeper to the networking layer to know more about the attack.
At the network level there are many logs to view. To start with, the IP addresses and locations from where your applications were accessed from is important. Also, notice the device types like USBs, IoT devices, or embedded systems. If you notice suspicious patterns like completely new IP addresses and locations, or malfunctioning devices, you can disable them or restrict their access.
Particularly, notice if there are any breaches to the firewall, or cases of modifying or deletion of cookies. In legacy architecture a breach in a firewall can leave the entire system vulnerable to attackers, but with modern containerized setups, you can implement granular firewalls using tools like Project Calico.
Look for cases of downtimes or performance issues like timeouts across the network, these can help you assess the impact of the attack, and maybe even find the source. The networking layer is where all traffic in your system flows, and in the event of an attack, malicious events are passing through your network. Looking at these various logs will give you visibility into what is transpiring across your network.
Going a level deeper, you can look at the health of your databases for more information.
Hackers are after the data in your system. This could include financial data like credit card information, or even personally identifiable information like a social security number or home address. They could look for loosely stored passwords and usernames associated with them. All this data is stored in databases in cloud or physical disks in your system, and they need to be checked for breaches, and secured immediately if not compromised already.
To spot these breaches take a look at the log trail for data that’s been encrypted or decrypted. Especially notice how users have accessed personally identifiable information or payment related information. Changes to file names and file paths, an unusually high number of events from a single user are all signs of data breaches.
Your databases are also used to store all your log data, and you’d want to check if disk space was insufficient to store all logs, or if any logs were left out recently. Notice if the data stored on these disks were altered or tampered with.
The first thing to check in the operating system (OS) layer should be the access controls for audit logs. This is because savvy attackers will first try to delete or alter audit logs to cover their tracks. Audit logs record all user activity in the system, and if an attack is in progress, the faster you secure your audit logs the better.
Another thing to check is if timestamps are consistent across all your system logs. Often, attackers would change the timestamps for certain log files making it difficult for you to correlate one piece of log data with another. This makes investigation difficult and gives the attacker more time to carry out their agenda.
Once you’re confident that your audit logs and timestamps haven’t been tampered with, you can investigate other OS level logs to trace user activity. How users logged in and out of your system, system changes especially related to access controls, usage of admin privileges, changes in network ports, files and software that was added or removed, and commands executed recently, especially those from privileged accounts – all this information can be analyzed using your system logs.
The operating system is like the control center for your entire applications stack, and commands originating from here need to be investigated during an attack. The next place to look for vulnerabilities is the hardware devices in your system.
Hardware or infrastructure level
Some of the recent DDoS attacks, like the one that affected Dyn last year was a result of hackers gaining access to hardware devices that end users owned. They cracked the weak passwords of these devices and overloaded the server with requests. If end user devices are a major part of your system, they should be watched carefully as they can be easy targets for hackers. Looking at log data for how edge devices behave on the network can help spot these attacks. With the growth of the internet of things (IoT) and smart devices across all sectors, these types of attacks are becoming more common.
Additionally, if your servers or other hardware are also accessed by third-party vendors, or partner organizations, you need to keep an eye on how these organizations use your hardware and the data on it.
After the incident
Hopefully, all these sources of log data can help you as you troubleshoot and resolve attacks. After the incident, you’ll need to write a post-mortem. Even in this step, log data is critical to telling the entire story of the attack – its origin, progression, impact radius, resolution, and finally restoring the system back to normalcy.
As you can tell, log data is present at every stage of incident management and triaging. If you want to secure your system, pay attention to the logs you’re collecting. If you want to be equipped to deal with today’s large scale attacks, you’ll need to look to your log data. It’s no doubt logs are essential to SecOps, and understanding how to use them during an incident is an important skill to learn.