Hello everybody, today is about speed improvements and new malware samples in malwarestats.org. If you followed the MalwareStats.org genesys you might remeber the early stage development where took between 8 to 10 minutes to visualize statistics over 43k Malware Analysis. Today it runs much better alost 15 seconds to visualize 76.2K Malware Analysis (ok, I know.. it really depends on Network speed and Computation power… but tested on the same machine you might experience a hug performance gap).
Let me just remind you what MalwareStats.org is about:
“The continued growth in number and in complexity of malware is a well established fact. Malwares are no longer simple pieces of code that rely on unsuspecting users to spread and thrive. They can change, adapt and hide themselves from analysts, using very sophisticated techniques. Static analysis is complex and time consuming, and it could be difficult to deduce every possible malicious behaviour, yet it is often very effective because it hinders the capability of malware to detect the analysis environment. The purpose of MalwareStats.org is to provide valuable assistance to the phase of static analysis, supporting analysts in their exploration of code features, by letting them make more focused, statistically motivated and structured decisions.”
We are facing a “Big Data” problem. Thousands of samples produce Hundred Thousands of results, which end up to be Giga Bytes of well structured Text. And.. yes, I want to make general tatistics so far (general !== from “time frame defined”) so I am not interested on filtering data (well..I know I will end up putting a time filter on the main page.. but not today!). My main goal is to answer in the quickest way to such a questions: ” What are the most used packers ?” or “What are the most used evasion techniques?” or again “What are the most used API or Anti-Debbugging Techniques?” and so on and so forth. Obviusly I want to give such statistics by using a simple and intuitive web interface. You might wonder why those questions are so important for me !? Well, because they really drive my decisions during a romantic Malware analysis.
The following image shows the today stats on MalwareStats.org:
The new and simple algorithm (which is not the best I can create and it is not remarkable in any point but it made a huge improvement) which made possible the huge visualization improvement from the last two versions is available here. The following image shows the principal code function responsible to build the output, before passing it to google graphs.
Simple Visualization Algorithm
As you might agree with me the entire code should be protected (which is not protected on undefinition, null pointers, etc..) and even improved in speed introducing multiple web workers. If you like to be involved in that project just drop me an email, any suggestion is welcomed as well. Enjoy the new results !