Threat Hunting with Data Science: Registry Run Keys

In data science, most of the time is spent cleaning and normalizing the data. Just as preparation and reconnaissance are the most important steps in red teaming and pentest activities, data preparation is the most important step in data science. In this post, I will explain how we can apply some data science basics to threat hunting and detect suspicious Registry Run keys at scale with KQL in Azure Sentinel/MDATP/MDE/M365D. The same method can also be applied in Splunk and other tools that can manipulate values based on regex matches (tip: the rex command in Splunk does the same job).

If there is a malicious item in the Registry persistence locations, it should be somewhat unique in the environment. (A malicious item can masquerade as a well-known item and invalidate this hypothesis; I'll provide a solution for that at the end of the post.)

When you try to apply statistical analysis to RegistryValueData, below is what you will probably get, depending on the size of your environment:

Result from 30d of data

If you look at the results themselves rather than just the row counts, you will see items like these:

"C:\Users\testuser1\AppData\Local\Temp\test.exe"
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise" --runOnce --installSessionId 25a56748-489d-4257-814e-fa884df0599
"%ProgramData%\Microsoft\Windows Defender\platform\4.18.2101.9-100\DefenderCSP.dll"
"C:\ProgramData\Package Cache\{29f85b7a-f685-45c3-a213-e306549e95a4}\Setup.exe"

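The variable parts of these values, such as the user name, the install session ID, the platform version and the package GUID, make otherwise identical items unique and make the result set difficult to analyze.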

If we can normalize these values, frequency analysis will probably give better results. To do that, we can use the replace function (rex in Splunk) to create a new field that holds the normalized value. Before doing that, we need to analyze the result set and find out what kinds of values cause the uniqueness. Then we need to prepare the regular expressions to be used for the normalization. Normalizing the values in the same data set gives the result below:

Result from 30d of data after normalization

80% reduction in the result set!
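The normalization step can be illustrated outside of KQL as well. Below is a minimal Python sketch of the idea, not the query from this post: the three patterns (GUIDs, per-user profile folders, dotted version numbers) and the placeholder names are assumptions chosen to match the example values shown earlier, and a real environment will need its own list.

```python
import re

# Illustrative rules only (assumptions, not the patterns used in the original query).
NORMALIZATION_RULES = [
    # GUIDs, with or without braces, e.g. {29f85b7a-f685-45c3-a213-e306549e95a4}
    (re.compile(r"\{?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\}?", re.I), "<GUID>"),
    # Per-user profile folders, e.g. C:\Users\testuser1\...
    (re.compile(r"(\\users\\)[^\\]+", re.I), r"\1<USER>"),
    # Dotted version numbers, e.g. 4.18.2101.9-100
    (re.compile(r"\d+(?:\.\d+){2,}(?:-\d+)?"), "<VERSION>"),
]


def normalize(value: str) -> str:
    """Lowercase a registry value and replace its variable parts with placeholders."""
    result = value.lower()
    for pattern, replacement in NORMALIZATION_RULES:
        result = pattern.sub(replacement, result)
    return result


if __name__ == "__main__":
    samples = [
        r'"C:\Users\testuser1\AppData\Local\Temp\test.exe"',
        r'"%ProgramData%\Microsoft\Windows Defender\platform\4.18.2101.9-100\DefenderCSP.dll"',
        r'"C:\ProgramData\Package Cache\{29f85b7a-f685-45c3-a213-e306549e95a4}\Setup.exe"',
    ]
    for sample in samples:
        print(normalize(sample))
```

After this step, the same Temp path under two different user profiles collapses into a single value, which is what makes the frequency counts meaningful.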

Finally, we can apply an anomaly condition, pull all the details of the unique items back from the dataset, and display only the ones from the last 1d:

Result without manual whitelisting (except one)

From 50,000 to 1,000!
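The stacking and filtering step can be sketched in Python as follows. This is an illustration under assumptions, not the original query: the event fields `timestamp` and `normalized` are hypothetical names, and the anomaly condition used here (a normalized value seen at most `max_count` times in the whole window) is only one possible choice of threshold.

```python
from collections import Counter
from datetime import datetime, timedelta


def rare_recent_events(events, now, max_count=1, recent=timedelta(days=1)):
    """Return events whose normalized value is rare in the full window
    and that occurred within the most recent period.

    events: list of dicts with 'timestamp' (datetime) and 'normalized' (str);
    both field names are hypothetical.
    """
    # Frequency analysis over the full window (e.g. 30 days of data).
    counts = Counter(event["normalized"] for event in events)
    cutoff = now - recent
    return [
        event
        for event in events
        if counts[event["normalized"]] <= max_count and event["timestamp"] >= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2021, 3, 1)
    events = [
        {"timestamp": now - timedelta(days=20), "normalized": "common.exe"},
        {"timestamp": now - timedelta(days=3), "normalized": "common.exe"},
        {"timestamp": now - timedelta(hours=4), "normalized": "rare.exe"},
    ]
    for event in rare_recent_events(events, now):
        print(event["normalized"], event["timestamp"])
```

Counting over the long window while reporting only the last day keeps the baseline stable and keeps the daily review list short.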

If we analyze the result set further, we would probably see some specific folders or conditions that still cause the uniqueness. We can further filter or normalize them to reduce the result set. We can also analyze the processes and find the ones that legitimately modify the Registry, and filter them out.

Software installations and deployments in the environment often modify the Registry Run items. We can generate a list of well-known/trusted processes and exclude them. Microsoft Defender has a built-in function that we can use for this purpose: invoke FileProfile() retrieves prevalence information for a SHA1 value. Using it to filter out the probable well-known processes (except some important ones, because they can be abused for this malicious purpose) leaves:

Just 4 results to investigate!
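The prevalence-based exclusion can be sketched in Python too. Everything here is a stand-in: `prevalence_by_sha1` is a hypothetical lookup table playing the role of the information FileProfile() returns, the threshold of 1000 is an arbitrary assumption, and the `always_keep` list is only an example of processes one might refuse to exclude.

```python
def filter_by_prevalence(
    events,
    prevalence_by_sha1,
    threshold=1000,
    always_keep=frozenset({"powershell.exe", "cmd.exe"}),
):
    """Drop events written by widely seen processes, except for processes
    that should always be reviewed.

    events: list of dicts with 'process_name' and 'process_sha1' (hypothetical names).
    prevalence_by_sha1: dict mapping a SHA1 to how widely the file is seen.
    """
    kept = []
    for event in events:
        name = event["process_name"].lower()
        # Unknown hashes are treated as prevalence 0, so they are never excluded.
        prevalence = prevalence_by_sha1.get(event["process_sha1"], 0)
        if name in always_keep or prevalence < threshold:
            kept.append(event)
    return kept


if __name__ == "__main__":
    events = [
        {"process_name": "msiexec.exe", "process_sha1": "aaa"},
        {"process_name": "PowerShell.exe", "process_sha1": "bbb"},
        {"process_name": "dropper.exe", "process_sha1": "ccc"},
    ]
    prevalence = {"aaa": 500000, "bbb": 900000}
    for event in filter_by_prevalence(events, prevalence):
        print(event["process_name"])
```

Treating unknown hashes as rare is a deliberate choice in this sketch: a file with no prevalence data is exactly the kind of item that should stay in the result set.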

If the registry value is masquerading as a well-known item, we can apply the same method to the RegistryValueData-InitiatingProcess pair. For example, if the Registry item Firefox.lnk is not added by the Firefox installation/update itself, it is highly suspicious.
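Stacking on the pair instead of the value alone can be sketched like this in Python. The field names `normalized` and `process_name` are hypothetical, and the threshold is again an assumption rather than a value from the original query.

```python
from collections import Counter


def rare_pairs(events, max_count=1):
    """Return (normalized value, initiating process) pairs seen at most max_count times.

    events: list of dicts with 'normalized' and 'process_name' (hypothetical names).
    """
    pairs = Counter(
        (event["normalized"], event["process_name"].lower()) for event in events
    )
    return [pair for pair, count in pairs.items() if count <= max_count]


if __name__ == "__main__":
    # A well-known value written many times by its own installer, once by something else.
    events = [{"normalized": "firefox.lnk", "process_name": "firefox.exe"}] * 40
    events.append({"normalized": "firefox.lnk", "process_name": "unknown.exe"})
    for value, process in rare_pairs(events):
        print(value, process)
```

The value on its own looks common here, but the pair with the unexpected process stands out, which is the point of stacking on both fields.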

As you can see, applying just a bit of data science can do wonders in threat hunting. Just by normalizing the data and performing frequency analysis (data stacking), it is possible to detect malicious activity that relies on a technique that is difficult to detect.

You can find the query in my GitHub repo. If you want to learn more about data stacking, FireEye has a decent post here.

Cyber Defense Professional. @Cyb3rMonk ( Threat Hunting | Active Defense | Cyber Deception | SOC | SIEM )
