A couple of months ago, I came across some blog posts about detecting threats by analyzing process trees, that is, process parent-child relationships. The idea behind this method is to build process trees and look for rare patterns that might indicate malicious activity. To make this clearer, let’s look at an example:
You open a document you received and click “enable content”. Then, all of a sudden, you get compromised. When you opened the document and clicked “enable content”, here is what happened (in orange):
By creating the process trees and analyzing them, we can detect the anomalous process tree (in orange) and investigate whether it’s malicious or not. This method can help us uncover many attacks that would otherwise go undetected. However, the problem is that creating and analyzing process trees is not easy. One method I’ve found is to use PowerShell to analyze the Sysmon logs on the endpoint, but that doesn’t scale. Machine learning is an excellent approach, but it requires knowledge, experience, and significant investment. As of this writing, probably only a handful of organizations are able to leverage ML.
So, how can we make this magic happen without making a significant additional investment?
After spending quite some time on this puzzle, I solved it in Microsoft Defender for Endpoint (MDE/MDATP) with a KQL query that performs the process tree analysis. I’ve also developed similar queries for Sysmon logs in Azure Sentinel and Splunk. The solution should also work in other products that have SQL-style join capabilities, but I haven’t had a chance to test that.
Let’s see how the solution works
In short, we need to do the following:
- Develop a query to generate process trees.
- Use the long tail analysis method to find and display rare occurrences.
- Find which device each rare occurrence belongs to and get the details of the process tree, especially the command lines.
Generating the process tree
To create the process tree, the two process creation events we want to join must share a common value that is also unique, so that it can serve as the join key. If the key is not unique, the join produces wrong results and consumes so many resources that the query fails to run. In MDE DeviceProcessEvents, there is no single field that can be used as the key, but we can create one! We have the following fields in DeviceProcessEvents:
By combining these four fields, we have a key field that we can use to join events properly. Alternatively, we can use all four fields together in the join condition.
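To illustrate the idea of a combined key outside of KQL, here is a minimal Python sketch. It assumes the DeviceProcessEvents columns DeviceId, ProcessCreationTime, ProcessId, and FileName, together with their InitiatingProcess counterparts. The sample events, the helper names, and the case-insensitive file name comparison are my own assumptions for illustration, not part of the original query.

```python
from typing import NamedTuple


class ProcessKey(NamedTuple):
    device_id: str
    creation_time: str
    process_id: int
    file_name: str


def created_process_key(event: dict) -> ProcessKey:
    """Key of the process that this event created."""
    return ProcessKey(
        event["DeviceId"],
        event["ProcessCreationTime"],
        event["ProcessId"],
        event["FileName"].lower(),  # assumption: compare names case-insensitively
    )


def initiating_process_key(event: dict) -> ProcessKey:
    """Key of the process that initiated (is the parent in) this event."""
    return ProcessKey(
        event["DeviceId"],
        event["InitiatingProcessCreationTime"],
        event["InitiatingProcessId"],
        event["InitiatingProcessFileName"].lower(),
    )


# Hypothetical sample events (values are made up for illustration).
winword_creates_cmd = {
    "DeviceId": "dev-1",
    "InitiatingProcessCreationTime": "2021-01-01T10:00:00Z",
    "InitiatingProcessId": 100,
    "InitiatingProcessFileName": "WINWORD.EXE",
    "ProcessCreationTime": "2021-01-01T10:05:00Z",
    "ProcessId": 200,
    "FileName": "cmd.exe",
}
cmd_creates_powershell = {
    "DeviceId": "dev-1",
    "InitiatingProcessCreationTime": "2021-01-01T10:05:00Z",
    "InitiatingProcessId": 200,
    "InitiatingProcessFileName": "cmd.exe",
    "ProcessCreationTime": "2021-01-01T10:05:01Z",
    "ProcessId": 300,
    "FileName": "powershell.exe",
}
# Same PID and name, but a different cmd.exe started hours later (PID reuse).
later_cmd_creates_ping = {
    "DeviceId": "dev-1",
    "InitiatingProcessCreationTime": "2021-01-01T18:00:00Z",
    "InitiatingProcessId": 200,
    "InitiatingProcessFileName": "cmd.exe",
    "ProcessCreationTime": "2021-01-01T18:00:02Z",
    "ProcessId": 350,
    "FileName": "ping.exe",
}

parent = created_process_key(winword_creates_cmd)
matches = initiating_process_key(cmd_creates_powershell) == parent
reused_pid_matches = initiating_process_key(later_cmd_creates_ping) == parent
print(matches, reused_pid_matches)
```

The second comparison shows why the process ID alone is not enough: the later cmd.exe reuses the same PID, and only the creation time keeps it from being joined to the wrong parent.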
As we have a key field, we can join events iteratively to generate process trees. There are two options for generating a process tree:
- Start from a process and find its child, grandchild, etc.
- Start from a process and find its parent, grandparent, great-grandparent, etc.
The fun part begins here!
If you start generating process trees with just option one or option two, you will have problems. For example, if a process tree has four processes, you will end up with three different process trees, two of which are subsets of the main tree. Both options therefore cause duplication and performance issues, and the query doesn’t run as expected or doesn’t run at all.
You have to know where the process tree starts or ends to avoid duplication and performance issues. Knowing where the process tree ends is impossible because we don’t know how many processes a process tree has. We can make a guess, but even in this case, we will have process trees that are a subset of the main tree. However, if we know where the tree starts, we can go down to the x-th level even without knowing where the tree ends and have only one unique tree without having subset trees.
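The duplication problem can be shown with a small Python sketch. It is a simplification that assumes each process has at most one child (a linear chain), and the function and variable names are hypothetical.

```python
def chain_from(start: str, child_of: dict) -> list:
    """Follow parent -> child links from `start` until no child is found."""
    chain = [start]
    while chain[-1] in child_of:
        chain.append(child_of[chain[-1]])
    return chain


# A tree with four processes, expressed as parent -> child links.
child_of = {
    "winword.exe": "cmd.exe",
    "cmd.exe": "powershell.exe",
    "powershell.exe": "msbuild.exe",
}

# Starting from every process that has a child yields three trees,
# two of which are subsets of the main one.
trees_from_everywhere = [chain_from(process, child_of) for process in child_of]

# Starting only from the known root yields the single full tree.
trees_from_root = [chain_from("winword.exe", child_of)]

for tree in trees_from_everywhere:
    print(" > ".join(tree))
print(len(trees_from_everywhere), len(trees_from_root))
```

Fixing the starting point is what removes the subset trees; the depth can then be capped at whatever level is practical.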
How do we know where the tree starts?
Take a deep breath and answer this question: what is the main attack vector according to the statistics? Email! And email attacks involve a malicious document most of the time! This means that most attacks, probably around 85% or higher, target Office and PDF applications. If we can create process trees for Office and PDF applications, we can detect most attacks during their initial stages!
So, we don’t have to generate all the process trees. Generating only the required process trees tells us where each tree starts and improves query performance. I ran a quick analysis and found that only ~1% of the process events belonged to Office applications. This is a huge difference and most likely solves the performance issue.
As we now have everything we need, let’s generate process trees. To make it easier to understand, here is what you see in DeviceProcessEvents for the scenario above:
winword.exe --created--> cmd.exe
cmd.exe --created--> powershell.exe
powershell.exe --created--> msbuild.exe
After filtering for the processes created by winword.exe, we can join process events by using the fields DeviceId, InitiatingProcessCreationTime, InitiatingProcessId, and InitiatingProcessFileName of the child event as the key, matched against DeviceId, ProcessCreationTime, ProcessId, and FileName of the parent event. The join kind should be leftouter because we don’t know whether there is a subsequent process or not.
The result of the query would be as follows:
<DeviceId> | winword.exe | cmd.exe | powershell.exe | msbuild.exe
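To show how the iterative left outer join produces a row like the one above, here is a minimal in-memory Python sketch. It is not the KQL query itself: the function names, the sample events, the fixed depth of three levels below the root, and the lower-casing of file names are assumptions made for illustration.

```python
from collections import defaultdict


def created_key(e):
    # Identifies the process that the event created.
    return (e["DeviceId"], e["ProcessCreationTime"], e["ProcessId"], e["FileName"].lower())


def initiating_key(e):
    # Identifies the process that initiated the event.
    return (e["DeviceId"], e["InitiatingProcessCreationTime"],
            e["InitiatingProcessId"], e["InitiatingProcessFileName"].lower())


def build_tree_rows(events, roots=("winword.exe",), depth=3):
    """Return one row per tree: (DeviceId, root, child, grandchild, ...)."""
    children = defaultdict(list)
    for event in events:
        children[initiating_key(event)].append(event)

    # Level 1: only events whose initiating process is a root application.
    paths = [[e] for e in events if e["InitiatingProcessFileName"].lower() in roots]

    # Each further level is one left outer join: keep the path even if no child exists.
    for _ in range(depth - 1):
        extended = []
        for path in paths:
            last = path[-1]
            kids = children.get(created_key(last), []) if last is not None else []
            if kids:
                extended.extend(path + [kid] for kid in kids)
            else:
                extended.append(path + [None])
        paths = extended

    rows = []
    for path in paths:
        names = [path[0]["InitiatingProcessFileName"].lower()]
        names += [e["FileName"].lower() if e is not None else "" for e in path]
        rows.append((path[0]["DeviceId"], *names))
    return rows


def make_event(device, parent, parent_pid, parent_time, child, child_pid, child_time):
    # Hypothetical helper for building sample events.
    return {
        "DeviceId": device,
        "InitiatingProcessFileName": parent,
        "InitiatingProcessId": parent_pid,
        "InitiatingProcessCreationTime": parent_time,
        "FileName": child,
        "ProcessId": child_pid,
        "ProcessCreationTime": child_time,
    }


events = [
    make_event("dev-1", "winword.exe", 100, "t0", "cmd.exe", 200, "t1"),
    make_event("dev-1", "cmd.exe", 200, "t1", "powershell.exe", 300, "t2"),
    make_event("dev-1", "powershell.exe", 300, "t2", "msbuild.exe", 400, "t3"),
    make_event("dev-2", "winword.exe", 50, "t0", "splwow64.exe", 60, "t1"),
]

rows = build_tree_rows(events)
for row in rows:
    print(" | ".join(row))
```

Trees that are shorter than the chosen depth are padded with empty columns, which mirrors what the leftouter join kind does when no subsequent process exists.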
Long Tail Analysis
As we have all process trees in a table format, we can easily get the count of each process tree after removing the DeviceId field, and then filter for the rare ones:
| project-away DeviceId
| summarize count() by <Field1>, <Field2>, <Field3>, <Field4>
| where count_ < 10 // you can increase or decrease this threshold.
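The same counting step can be sketched in Python with a Counter. The rows below are made-up sample data in the (DeviceId, process, process, ...) shape described earlier, and the function name is hypothetical; the threshold of 10 comes from the snippet above.

```python
from collections import Counter


def rare_trees(rows, threshold=10):
    """Count each distinct tree (ignoring DeviceId) and keep the rare ones."""
    counts = Counter(row[1:] for row in rows)  # row[0] is the DeviceId
    return {tree: n for tree, n in counts.items() if n < threshold}


# Hypothetical sample: a common tree seen on 12 devices and one rare tree.
rows = [(f"dev-{i}", "winword.exe", "splwow64.exe", "", "") for i in range(12)]
rows.append(("dev-99", "winword.exe", "cmd.exe", "powershell.exe", "msbuild.exe"))

rare = rare_trees(rows)
for tree, count in rare.items():
    print(count, " | ".join(tree))
```

Dropping the DeviceId before counting matters: with it included, every tree would look unique per device and nothing would stand out as common.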
Finding the Devices and Getting Details
Now that we have the rare process trees as a table, we can join them back to the original table of process trees to get the details. Then we can review the results to see whether there is any suspicious or malicious activity and perform an in-depth analysis.
In a very large environment, the query generates around 500–1000 results over the last 2 days for “winword.exe”, “excel.exe”, “powerpnt.exe”, and “acrord32.exe”. Analyzing the results takes around 1–2 hours. If you like, you can use Excel to make the analysis easier. After spending some time, you can identify and exclude the recurring false positives and significantly reduce the number of results.
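As a rough sketch of the join-back step, the Python below keeps the full rows (device and command lines) for every tree that counts as rare. The row layout, the field names Tree and CommandLines, and the sample values are assumptions for illustration only.

```python
from collections import Counter


def rare_tree_details(tree_rows, threshold=10):
    """Return the full rows (device, command lines) for rare trees only."""
    counts = Counter(row["Tree"] for row in tree_rows)
    rare = {tree for tree, n in counts.items() if n < threshold}
    # The "join back": keep every original row whose tree is in the rare set.
    return [row for row in tree_rows if row["Tree"] in rare]


# Hypothetical sample data: 12 devices with a common tree, 1 with a rare tree.
tree_rows = [
    {
        "DeviceId": f"dev-{i}",
        "Tree": ("winword.exe", "splwow64.exe", "", ""),
        "CommandLines": ["splwow64.exe 8192"],
    }
    for i in range(12)
]
tree_rows.append(
    {
        "DeviceId": "dev-99",
        "Tree": ("winword.exe", "cmd.exe", "powershell.exe", "msbuild.exe"),
        "CommandLines": ["cmd.exe /c powershell.exe", "powershell.exe -nop", "msbuild.exe payload.xml"],
    }
)

details = rare_tree_details(tree_rows)
for row in details:
    print(row["DeviceId"], " | ".join(row["Tree"]))
    for command_line in row["CommandLines"]:
        print("   ", command_line)
```

The counting happens without the device, but the final output needs it again, which is why the rare trees are matched back against the original rows rather than reported on their own.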
We can use the same query and analyze process trees of IIS/Apache processes to detect web shells or intrusions of public-facing applications.
In my opinion, process tree analysis has huge potential for detecting threats in their initial stages. Moreover, it can also be used for detecting other tactics such as lateral movement, and for analyzing a single machine.
I’ve developed three queries: one for Microsoft Defender for Endpoint (MDATP/MDE), one for Azure Sentinel (for Sysmon logs), and one for Splunk (for Sysmon logs); I’m sharing them below. You can also find the queries in my GitHub repo. I also wanted to develop a query with EQL, but I’m not familiar with it (yet).