AI Detectives and the Case of the Disguised Droppers

Written by deobfuscate | Published 2025/04/22
Tech Story Tags: malware-analysis | code-deobfuscation | large-language-models | cybersecurity | deobfuscation-techniques | cyber-threats | threat-intelligence | obfuscated-malware

TL;DR: Using 2,000 real Emotet dropper scripts, the experiment tests LLMs’ ability to deobfuscate malware and extract threat intel at scale.

Authors:

(1) Constantinos Patsakis, Department of Informatics, University of Piraeus, 80 Karaoli & Dimitriou str., 18534 Piraeus, Greece and Information Management Systems Institute of Athena Research Centre, Greece;

(2) Fran Casino, Information Management Systems Institute of Athena Research Centre, Greece and Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili;

(3) Nikolaos Lykousas, Data Centric, Romania.

Table of Links

Abstract and 1 Introduction

2 Related work

2.1 Malware analysis and countermeasures

2.2 LLMs in cybersecurity

3 Problem setting

4 Setting up the experiment and the dataset

5 Experimental results and discussion

6 Integration with existing pipelines

7 Conclusions, Acknowledgements, and References

4 Setting up the experiment and the dataset

Many practitioners have used LLMs to summarise code and report that the models understand it quite accurately. Understanding code, even messy code, is one task; understanding code that has been deliberately crafted to bypass antivirus (AV) checks and to prevent the reader from understanding what it does is an entirely different one. Therefore, to assess the capabilities of LLMs in deobfuscating malicious code, our experiments must contain enough samples and a ground truth. The objectivity and scalability of the experiments are crucially important, and each introduces different constraints. While these requirements could be partially addressed by obfuscating our own samples, doing so would significantly reduce the realism of the experiments. Moreover, LLMs have specific limits on the information they can process, and unpacking a whole packed executable is beyond their capacity, not only because of the input limits but also because of the complexity of the evaluations that have to be performed. For instance, modern malware often encrypts its payload, yet decrypting a string is far beyond the capacity and scope of an LLM.

To address the above challenges, we opted to use the obfuscated PowerShell payloads that serve as droppers of the Emotet malware. Emotet is a notorious group that operates under the malware-as-a-service model. The group runs malicious spam (malspam) email campaigns that distribute malicious MS Office documents, which act as droppers to download and execute the binary of the malware. Thus, the trojanised MS Office document uses VBA to download an executable from a set of predetermined URLs, usually compromised WordPress webpages, and then executes it. To achieve this, the group uses Living Off The Land Binaries, Scripts and Libraries (LOLBAS). In practice, Microsoft has integrated an additional safeguard for executable files into its operating systems: to execute a file, the user has to provide direct consent through the graphical interface. The response can be stored and reused in future sessions to avoid friction and user fatigue. However, this does not apply to executables that Microsoft has digitally signed. Thus, tools and programs that bear its signature, e.g., Regsvr32.exe and Winword.exe, may launch any program without the user’s consent or notification. Malware like Emotet exploits LOLBAS to download and execute malicious payloads, so everything is performed in the background. The modus operandi of Emotet is illustrated in Figure 1. To make things even more complex, the dropper uses an obfuscated PowerShell script encoded in base64. Figure 2 illustrates a sample of obfuscated and deobfuscated PowerShell scripts. As observed, beyond many strings being broken into shorter ones, all variables have random names, and there are several string replacements and unused variables.
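To give a feel for the simplest of these tricks, the following minimal Python sketch reverses string splitting only. It is an illustration, not the authors' tooling: the sample line, the variable name `$Xq7`, and the helper `join_split_strings` are hypothetical, and the sketch assumes plain single-quoted literals joined with `+`, without escaped quotes, format operators, or string replacements.

```python
import re

# Two adjacent single-quoted literals joined by '+', e.g. 'ht'+'tp'.
# Assumption: literals contain no escaped quotes ('').
_CONCAT = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")


def join_split_strings(script: str) -> str:
    """Merge 'a'+'b' into 'ab' repeatedly until nothing is left to merge."""
    previous = None
    while previous != script:
        previous = script
        script = _CONCAT.sub(
            lambda m: "'" + m.group(1) + m.group(2) + "'", script
        )
    return script


# Hypothetical obfuscated line; example.test is a reserved test domain.
sample = "$Xq7 = 'ht'+'tp:'+'//exa'+'mple.test/a'"
print(join_split_strings(sample))
```

Real droppers layer several techniques on top of each other, so a regex pass like this recovers only a fraction of the original script; it is that gap that makes the task a meaningful test for LLMs.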

Once a host is infected, Emotet tries to discover sensitive information (e.g., credentials), find other hosts on the network, hijack email conversations to infect the victim’s contacts, and connect the host to the botnet. Then, Emotet typically executes another piece of malware, e.g., Qakbot, Dridex, Ryuk, or TrickBot, as part of the malware-as-a-service scheme. Therefore, Emotet capitalises on its botnet by getting a share of the ransom or by sharing the resources of the compromised hosts with other threat actors. An international effort coordinated by Europol and Eurojust disrupted Emotet at the beginning of 2021, took down its infrastructure, and later disinfected compromised hosts. However, Emotet resurfaced several months afterwards and has not stopped its activity since, recurrently pushing new campaigns. The modus operandi remains more or less the same, but the range of droppers has been extended: beyond MS Word documents, there are now Excel and OneNote files, too.

To conduct an experiment that matches the above criteria, we based our methodology on the pipeline we developed in [26] to analyse Emotet’s malicious documents. The pipeline is depicted in Figure 3. We used a Linux VM and ViperMonkey[3] to extract and deobfuscate the VBA code. As a result, we collected the obfuscated PowerShell code, which was base64 encoded. After decoding it, we parsed it through Python with PWSH, Microsoft’s implementation of PowerShell for Linux systems. The above environment proved very efficient and scalable, preventing leaks and bypasses. Indeed, it took one day on an Intel i7 PC with 16GB of RAM to analyse more than 30,000 unique malicious documents.
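The base64 decoding step can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline from [26]: it assumes the payload follows the convention of PowerShell's `-EncodedCommand` parameter, which expects the base64 encoding of a UTF-16LE string, and the helper name `decode_encoded_command` and the harmless sample command are hypothetical.

```python
import base64


def decode_encoded_command(b64_payload: str) -> str:
    """Decode a base64 payload into PowerShell source text.

    Assumption: the payload is base64 over UTF-16LE text, as expected by
    PowerShell's -EncodedCommand parameter.
    """
    raw = base64.b64decode(b64_payload)
    return raw.decode("utf-16-le")


# Round trip with a harmless command instead of a real dropper.
original = "Write-Output 'hello'"
encoded = base64.b64encode(original.encode("utf-16-le")).decode("ascii")
print(decode_encoded_command(encoded))
```

The decoded text is still obfuscated at this point; decoding only removes the transport encoding, not the string splitting, random names, or replacements described earlier.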

Due to budget constraints for querying the paid APIs, we used 2,000 random obfuscated PowerShell scripts and the URLs these scripts contacted to download Emotet’s binary. The functionality of these scripts is very straightforward. In practice, each script creates a file with a random name in a folder with another random name, stores in it a binary that it downloads from the Internet, and executes it. To this end, it has an array of several URLs that refer to compromised domains, usually WordPress sites, where the perpetrators host their binaries. For persistence, the list contains, on average, 6.6 URLs from different domains. The script iterates over this list and tries to download the content from each URL. Once one of them returns a stream of significant length, the dropper stores it and executes it. Therefore, even if some URLs are taken down, the script proceeds to the next one until a URL is available. This way, by not revealing all of its dropper sites simultaneously, the Emotet group reduced the chance of having all of them taken down. Our dataset consists of 2,000 unique documents and their corresponding PowerShell scripts. Thus, we have 2,000 obfuscated PowerShell scripts, which, when deobfuscated, refer to 2,869 unique URLs belonging to 2,512 unique domains. In terms of scale, our samples constitute approximately 5% of Emotet’s campaign.
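Counts such as the unique URLs and domains above can be derived from the deobfuscated scripts with a short script. The sketch below is one possible way to do so, not the authors' method: the regular expression, the helper names, and the two sample scripts are hypothetical, and "domain" is approximated here by the URL's hostname.

```python
import re
from urllib.parse import urlsplit

# Assumption: URLs end at whitespace, quotes, or common list delimiters.
_URL = re.compile(r"https?://[^\s'\"@,;)]+", re.IGNORECASE)


def extract_urls(deobfuscated_script: str) -> set:
    """Return the distinct http(s) URLs found in one deobfuscated script."""
    return set(_URL.findall(deobfuscated_script))


def unique_domains(urls) -> set:
    """Return the distinct hostnames of the given URLs."""
    hosts = (urlsplit(u).hostname for u in urls)
    return {h for h in hosts if h}


# Hypothetical deobfuscated scripts using reserved test domains.
scripts = [
    "$u = 'http://a.example.test/wp-content/x/@https://b.example.test/y/'.Split('@')",
    "$u = 'http://a.example.test/wp-content/x/','http://c.example.test/z/'",
]
all_urls = set().union(*(extract_urls(s) for s in scripts))
print(len(all_urls), len(unique_domains(all_urls)))
```

Because the same URL can appear in more than one dropper, collecting URLs into a set before counting keeps repeated entries from inflating the totals.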

This paper is available on arXiv under the CC BY-NC-SA 4.0 Deed (Attribution-NonCommercial-ShareAlike 4.0 International) license.


[3] https://github.com/decalage2/ViperMonkey


Published by HackerNoon on 2025/04/22