One year ago I decided to invest in static Malware Analysis automation by setting up a full-stack environment able to grab samples from common opensources and to process them by using Yara rules. I mainly decided to offer to cybersecurity community a public dataset of pre-processed analyses and let them be “searchable” offering PAPI and a simple visualization UI available HERE. Today after one year is time to check how the project grown and see its running balance.
How it works
Malware Hunter is a python powered project driven by three main components: collectors, processors and public API. The collector takes from public available sources samples and place them in a local queue waiting to be processed. The processors are multiple single python processes running on a distributed environment which pulling samples from the common queue, process them and saves back to a mongodb instance the whole processed result-set. Unfortunately I cannot store all the analyzed samples due to the storage price which would rise quite quickly, so I manly collect only “reports”: a.k.a what the processor has analyzed. Public API (if you want access to them, please write me) are used to query the mongodb and for submitting new samples. Finally a simple user interface is provided for getting some statistics over time. A periodic task looks for specific matched Yara rules which stands for APT and populates a specific view available HERE. In other words if it finds a match on a well-known Yara rule built to catch Advanced Persistent Threats, it collects the calculated hash and through a dedicated API visualizes stats and matching signatures of all potential APT found. Everything runs without human interaction, and this is good and bad as well. It is good since it can scale up very quickly (depending on the acquired power on the VPS) but on the other hand by having no human interaction, depending on the implementation of Yara rules, the system can have many false positives. BUT what is interesting to me is having a base set of samples from which starting on. Without such a kind of tool, how would you start finding specific threats ? In that specific case I can start from the one matched the specific signature related to the specific threat I want to follow, rather than starting randomly.
After almost one year of fully automated static analyzed samples through Yara rules, Malware Hunter analyzed more than one Million samples, distributed in the following way.
It looks like on April 2019 the engine extracted and analyzed a small set of samples if compared to the general trend, while on late August / first of September it analyzed more than 250k samples. It is interesting to see a significant increase of analyses at the “end of the year” if compared to the analyses performed at the beginning of the same year. While malware collectors collect the same sources over the past year, the engine analyzes only specific file types (such as for example PE and Office files) and assuming the sample sources had not working breaks, it can mean that:
- Malware flow is subject to cyclical trends depending on a multiple topics including political influences and fiscal years.
Observing the most snapped Yara rules it is nice to check that the most analyzed samples were executable files. Many of them (almost 400k) hid a PE file compressed and/or encrypted into themselves.
Many Yara matches highlight an high presence of anti-debugging techniques, for example: DebuggerTiming_Ticks, DebuggerPatterns_SEH_Inits, Debugger_Checks and isDebuggerPresent, and so on and so forth. If considered together with Create_Process, Embedded_PE and Win_File_Operations bring the analyst to think that modern malware is heavily obfuscated and weaponized against debuggers. From signatures such as: keyloggers and screenshots it’s clear that most of the nowadays malware is recording our keyboard activities and wants to spy on us by getting periodical screenshots. The presence of HTTP and TCP rules underline the way new malware keep getting online either for downloading shellcodes (signature shellcode) and to ask to be controlled from a C2 system (such as a sever). Many samples look like they open-up a local communication port which often hides a local proxy for encrypt communication between the malware and its command and control. Crafted Mutex are very frequent for Malware developers, they are used to delay or to manage the multi infection processes.
Another interesting observation comes fron the way Equation Group Toolset matches.
The Equation Group, classified as an advanced persistent threat, is a highly sophisticated threat actor suspected of being tied to the Tailored Access Operations (TAO) unit of the United StatesNational Security Agency (NSA).Kaspersky Labs describes them as one of the most sophisticated cyber attack groups in the world and “the most advanced … we have seen”, operating alongside but always from a position of superiority with the creators of Stuxnet and Flame.From Wikipedia
Many EquationGroup_toolset signatures matched during the most characterized detection time frame (at the beginning and at the ending of the year) alerting us that those well-known (August 2016) tools are still up and running and heavily reused over samples. ShadowBrokers released the code on August 2016 and from that time many piece of malware adopted it, still nowadays looks like be actual in many executable samples.
From the slow but interesting page “potential APT detection” (available HERE) we have “live” stats (updated every 24h) on APT matches over the 1 Million analyzed samples. Dragonfly (As Known As Energetic Bear) is what the Malware Hunter mostly matched. According to MalPedia DragonFly is a Russian group that collects intelligence on the energy industry, followed by Regin. According to Kaspersky Lab’s findings, the Regin APT campaign targets telecom operators, government institutions, multi-national political bodies, financial and research institutions and individuals involved in advanced mathematics and cryptography. The attackers seem to be primarily interested in gathering intelligence and facilitating other types of attacks.
Many Ursnif/Gozi were detected during the past year. Ursnif/Gozi is a quite (in)famous banking trojan targeting UK/Italy mostly, and attribute to the cybercrime group TA-505 from TrendMicro in late 2018 by spotting common evidences between Ursnif/Gozi and TA-505 banking trojans such as Dridex and the loader Emotet. Interesting to note that quite old rules related to Putter Panda hit in some samples (for example:
...). Putter Panda is a Chinese threat group that has been attributed to Unit 61486 of the 12th Bureau of the PLA’s 3rd General Staff Department (GSD). The analysis on the found results might go further, but if you are interesting in getting into some details please do not hesitate to contact me, or to use the search field on that page.
While the scrapers and the workers run in remote and domestic PCs, the PAPI server holds both: Public Application Program Interface and the searching scripts (the ones used to match and to alert for specific API matches). The following graphs show the VP usage at a glance.
I hope you enjoy that tool, as usual feel free to search on samples, to use them to classify your TI and if you need PAPI let me know. I am planning to let it run unless the cost will increase too much for me.