Machine learning (ML) is a powerful approach to detecting malware. It is widely adopted by both the technical community and the scientific community, but with two different perspectives: performance vs. robustness. The technical community tries to improve ML performance in order to increase usability at scale, while the scientific community focuses on robustness, meaning how easy it would be to attack an ML detection engine. Today I’d like to focus our attention on the second perspective, pointing out how to attack ML detection engines.

We might start by classifying machine learning attacks into three main sets:

  1. Direct Gradient-Based Attack. The attacker needs to know the ML model: its structure and its weights. With that knowledge he can make direct queries to the model and figure out the best way to evade it.
  2. Score Model Attack. This attack set is based on scoring systems. The attacker knows neither the machine learning model nor its weights, but he has direct access to the detection engine, so he can probe the model. The model returns a score and, based on that score, the attacker can work out how to minimise it by feeding in specifically crafted inputs.
  3. Binary Black-Box Attack. The attacker knows nothing about the machine learning model or its weights, and nothing about the scoring system either, but he has unlimited access to probe the model, which only answers with a binary verdict. (A minimal sketch of these three query interfaces follows the list.)
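To make these three postures concrete, here is a minimal sketch of the query interface each attacker effectively has. The class names, the linear model and the decision threshold are hypothetical, purely for illustration.

```python
import numpy as np

class WhiteBoxModel:
    """Direct gradient-based setting: structure and weights are fully visible."""
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def score_and_gradient(self, x):
        score = float(np.dot(self.weights, x) + self.bias)
        return score, self.weights            # gradient of a linear score is its weights

class ScoreOnlyDetector:
    """Score model setting: only a numeric maliciousness score comes back."""
    def __init__(self, model):
        self._model = model                   # internals hidden from the attacker

    def score(self, x):
        return self._model.score_and_gradient(x)[0]

class BinaryBlackBoxDetector:
    """Binary black-box setting: only a malware / not-malware verdict comes back."""
    def __init__(self, model, threshold=0.0):
        self._model, self._threshold = model, threshold

    def is_malware(self, x):
        return self._model.score_and_gradient(x)[0] > self._threshold
```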
Direct Gradient-Based Attack
A direct gradient-based attack can be implemented in at least two ways. The first, and most common, is to apply small changes to the original sample in order to reduce the given score. The changes must stay within a specific domain, for example valid Windows PE files or valid PDF files, and so forth. The changes must be small and should be generated so as to minimise a scoring function derived from the weights (which are known in a direct gradient-based attack). The second way is to connect the targeted model (the model under attack) to a generator model in a generative adversarial network (GAN). Unlike the previous approach, the GAN generator learns how to generate a completely new sample, derived from a given seed, that minimises the scoring function.
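As an illustration of the first approach, here is a minimal sketch of one gradient-guided perturbation step against a known linear model. The feature values, the epsilon step size and the clipping to a [0, 1] range are assumptions made for the example; a real attack would also need to map the perturbed feature vector back onto a valid PE or PDF file.

```python
import numpy as np

def gradient_evasion_step(x, weights, bias, epsilon=0.1):
    """One small step against the gradient of the (known) linear maliciousness score."""
    score = np.dot(weights, x) + bias            # current score of the sample
    # For a linear model the gradient of the score w.r.t. x is simply `weights`.
    x_adv = x - epsilon * np.sign(weights)       # nudge each feature to lower the score
    return np.clip(x_adv, 0.0, 1.0), score       # keep features in a valid [0, 1] domain

# Hypothetical usage: five normalised static features of a malware sample.
weights = np.array([0.8, -0.2, 0.5, 0.1, 0.9])
x = np.array([1.0, 0.0, 1.0, 0.2, 0.7])
x_adv, original_score = gradient_evasion_step(x, weights, bias=-0.3)
print(original_score, np.dot(weights, x_adv) - 0.3)   # the score drops after the perturbation
```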
I. Goodfellow et al., in their work “Explaining and Harnessing Adversarial Examples” (here), showed how small changes to a given sample X, targeted at minimising the resulting score, are effective for ML evasion. Another great work is by K. Grosse et al., titled “Adversarial Perturbations Against Deep Neural Networks for Malware Classification” (here). The authors attacked a deep learning Android malware model, trained on the DREBIN Android malware dataset, by applying an imperceptible perturbation to the feature vector. They obtained very interesting results, with evasion rates ranging from 50% to 84%. I. Goodfellow et al., in their work titled “Generative Adversarial Nets” (here), developed a GAN able to iterate a series of adversarial rounds to generate samples that were classified as “ham” by the targeted model but really were not. The following image shows how generative adversarial nets are trained by simultaneously updating the discriminative distribution (D, blue, dashed line) so that it discriminates between samples from the data-generating distribution px (black, dotted line) and those of the generative distribution pg (G, green, solid line).

Image from: “Generative Adversarial Nets”
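To give a feel for the second approach, here is a minimal sketch of one adversarial training round in the spirit of Goodfellow’s GAN, written with PyTorch. The network sizes, the 64-dimensional noise seed and the hyperparameters are arbitrary assumptions; in the malware setting the generator output would still need to be converted back into a valid sample.

```python
import torch
import torch.nn as nn

noise_dim, feature_dim = 64, 128                     # arbitrary sizes for illustration

generator = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                          nn.Linear(256, feature_dim), nn.Sigmoid())
discriminator = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU(),
                              nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def adversarial_round(real_batch):
    """One round: D learns to tell real from generated, G learns to fool D."""
    batch = real_batch.size(0)

    # --- update the discriminator on real samples vs. detached fakes ---
    fake = generator(torch.randn(batch, noise_dim)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(batch, 1)) + \
             bce(discriminator(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- update the generator so D labels its output as real ("ham") ---
    fake = generator(torch.randn(batch, noise_dim))
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```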

Score Model Attack

The attacker’s posture in this attack set could be described as “myopic”. The attacker does not know exactly how the ML model works and has no idea how the weights change inside the ML algorithm, but he has the chance to test his samples and get back a score, so he is able to measure the effect of each input perturbation.
W. Xu, Y. Qi and D. Evans, in their work titled “Automatically Evading Classifiers” (here), implemented a “fitness function” which assigns a fitness score to each generated variant. A variant with a positive fitness score is evasive: the fitness captures the idea that the targeted model classifies the current sample as benign while the sample still retains its malicious behaviour. Once a sample gets a high fitness score it is used as a seed in a more general genetic algorithm, which manipulates the seed in order to produce different species. To ensure that those mutations preserve the malicious behaviour of the original seed, the authors used an oracle; in this case, Cuckoo Sandbox.
Image from: “Automatically evading classifiers”
After one week of execution, the genetic algorithm found more than 15k evasive variants from roughly 500 malicious seeds, reaching a 100% evasion rate against the PDFrate classifier.
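A rough sketch of that fitness-driven search might look like the following, where mutate, classifier_score and oracle_is_malicious are hypothetical placeholders for the PDF manipulations, the classifier score and the Cuckoo Sandbox check used by the authors.

```python
import random

def fitness(variant, classifier_score, oracle_is_malicious):
    """Positive fitness = the classifier scores the variant as benign while
    the behavioural oracle confirms the malicious behaviour is preserved."""
    if not oracle_is_malicious(variant):           # behaviour broken: discard
        return float("-inf")
    return -classifier_score(variant)              # lower score => higher fitness

def genetic_search(seeds, mutate, classifier_score, oracle_is_malicious,
                   generations=100, population_size=50):
    population = list(seeds)
    evasive = []
    for _ in range(generations):
        # Mutate random parents to build the next generation of variants.
        children = [mutate(random.choice(population)) for _ in range(population_size)]
        scored = [(fitness(c, classifier_score, oracle_is_malicious), c) for c in children]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        evasive += [c for f, c in scored if f > 0]                    # evasive variants found
        survivors = [c for f, c in scored if f > float("-inf")]       # behaviour preserved
        population = survivors[:population_size // 2] or population   # fittest half seeds the next round
    return evasive
```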

Binary Black-Box Attacks

Binary black-box attacks are the most general ones, since the attacker knows nothing about the model in use and the anti-malware engine simply answers True or False (it is malware or it is not). In 2017, W. Hu and Y. Tan did great work described in “Generating Adversarial Malware Examples for Malware Classification” (here). The authors developed MalGAN, an adversarial malware generator able to produce valid PE malware that evades static black-box PE malware engines. The idea behind MalGAN is simple. First the attacker maps the black-box outputs by submitting specific, known samples (malware and good PE files). After this mapping phase the attacker builds a model that behaves like the black-box model: a simple substitute model trained to reproduce the behaviour of the targeted one. The substitute model is then used as the target model in a gradient-computation GAN to produce evasive malware. The authors reported 100% efficacy in bypassing the target model. H. S. Anderson et al., in “Evading Machine Learning Malware Detection” (here), adopted a reinforcement learning approach. The following image shows the Markov decision process formulation of the malware evasion reinforcement learning problem.

Image from: “Evading Machine Learning Malware Detection”
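Stepping back to MalGAN for a moment, its first step, fitting a local substitute model to the black-box verdicts, can be sketched roughly as follows; the black_box_is_malware callable and the small network are assumptions, not the authors’ exact architecture.

```python
import torch
import torch.nn as nn

def train_substitute(samples, black_box_is_malware, epochs=10):
    """Fit a local, differentiable stand-in for the black-box detector by
    learning to reproduce its binary verdicts on attacker-chosen samples."""
    labels = torch.tensor([[1.0] if black_box_is_malware(s) else [0.0] for s in samples])
    features = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s in samples])

    substitute = nn.Sequential(nn.Linear(features.size(1), 64), nn.ReLU(),
                               nn.Linear(64, 1), nn.Sigmoid())
    optimiser = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()

    for _ in range(epochs):
        optimiser.zero_grad()
        loss = loss_fn(substitute(features), labels)
        loss.backward()
        optimiser.step()
    # The substitute is differentiable, so a GAN generator can now be trained
    # against it with ordinary gradient computations, as MalGAN does.
    return substitute
```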


The agent is the function that manipulates the sample depending on the environment state. Both the reward and the state are used as input by the agent in order to decide the next actions. The agent learns from the reward, which depends on the state it reaches: for example, the reward could be higher if the reached state is close to the desired one, or lower otherwise. The authors use a Q-learning technique so that an action yielding a poor immediate reward can still be valued for the payoff it brings in the medium to long term.

“In our framework, the actions space A consists of a set of modifications to the PE file that (a) don’t break the PE file format, and (b) don’t alter the intended functionality of the malware sample. The reward function is measured by the anti-malware engine, which is converted to a reward: 0 if the modified malware sample is judged to be benign, and 1 if it is deemed to be malicious. The reward and state are then fed back into the agent.”
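A heavily simplified sketch of such an agent-environment loop is shown below, using tabular Q-learning. The action names and the mutate_pe, extract_state and detector_says_malware helpers are hypothetical placeholders, and for clarity this sketch rewards the agent with 1 once the engine is evaded; none of it is the authors’ exact setup.

```python
import random
from collections import defaultdict

ACTIONS = ["append_overlay", "add_section", "rename_section", "pack_sample"]  # illustrative names

def q_learning_evasion(seed_pe, mutate_pe, extract_state, detector_says_malware,
                       episodes=200, max_steps=10, alpha=0.5, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning over PE manipulations: reward 0 while the engine
    still flags the sample, reward 1 once it is judged benign (evasion)."""
    q = defaultdict(float)                                   # Q[(state, action)] -> value
    evasive = []
    for _ in range(episodes):
        sample, state = seed_pe, extract_state(seed_pe)      # state must be hashable
        for _ in range(max_steps):
            if random.random() < epsilon:                    # explore
                action = random.choice(ACTIONS)
            else:                                            # exploit current estimates
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            sample = mutate_pe(sample, action)
            next_state = extract_state(sample)
            reward = 0.0 if detector_says_malware(sample) else 1.0
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if reward == 1.0:                                # engine evaded: keep the variant
                evasive.append(sample)
                break
    return evasive
```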

Final Considerations

Machine learning, and more generally artificial intelligence, is useful for detecting cyber attacks but unfortunately, as widely shown in this post, it is not enough on its own. Attackers can use the very same techniques, such as adversarial machine learning, to evade machine learning detectors. Cyber security analysts will still play a fundamental role in cyber security science and technology for many years to come. A technology that promises to assure cyber security protection without human interaction is not going to work.