Why Machine Learning Is a Critical Defense Against Malware


Dr. Sven Krasser, CrowdStrike® chief scientist, has called machine learning (ML) the first line of defense against modern threats. That statement, which he made a couple of years ago, is even truer today. This truth is reflected in the CrowdStrike Services Cyber Intrusion Casebook, where CrowdStrike’s incident responders report that commodity malware is often a precursor to a disruptive attack.

In addition, the 2019 Global Threat Report notes that during the past year, the CrowdStrike Intelligence team observed an increase in malware-based attacks and a corresponding drop in stealthier malware-free intrusions. On its face, this may seem like good news. However, the details reveal a troubling situation: 40 percent of incidents in 2018 were the result of malicious software that went undetected by traditional antivirus (AV). This means that organizations relying on legacy solutions are alarmingly vulnerable to malware-based threats.

Why Is Malware Getting Past Traditional AV?

Traditional AV relies heavily on signatures, or virus definition files, to identify and block malware. This means that before a new malware variant can be blocked, it must first be discovered, then a signature for it must be created, and finally, that signature must be deployed to the endpoints. This process opens a time gap between the initial use of the malware and the availability of a signature to block it. That gap gives attackers enough time to launch an attack or steal credentials they can use later.

Because today’s attackers expect to find some type of AV protection on their target, they are constantly crafting new techniques to bypass anti-malware protection. A common and easy technique is to modify, or morph, known malware into a zero-day variant for which no matching signature exists. To achieve this, attackers use tools such as packers, which evade detection by constantly changing or obfuscating the malware’s true nature. In fact, this is so easy to do that, according to av-test.org, a staggering 390,000 new pieces of malware appear every day, making it impossible for signature-based technologies to keep up.
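To make the signature gap concrete, here is a minimal Python sketch. It assumes the simplest possible kind of signature, an exact SHA-256 hash of the file; real AV signatures are usually richer (byte patterns, heuristics), and the sample bytes and the names `signature_db` and `signature_match` are hypothetical, chosen only for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a malware sample that has already been analyzed.
known_sample = b"MZ\x90\x00 hypothetical malicious payload"

# Assumption: the signature database is just a set of known-bad file hashes.
signature_db = {sha256_hex(known_sample)}


def signature_match(data: bytes) -> bool:
    """Flag a file only if its exact hash is already in the database."""
    return sha256_hex(data) in signature_db


# A "morphed" variant: the same payload with a single padding byte appended.
morphed_sample = known_sample + b"\x00"

print(signature_match(known_sample))    # True: the known file is caught
print(signature_match(morphed_sample))  # False: the trivial variant slips through
```

Even a one-byte change produces a different hash, so the variant goes unflagged until someone discovers it and ships a new signature.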

ML Is a Better Defense Against Malware

This massive volume of new malware is why an effective anti-malware solution must not only detect known malware, but also prevent unknown or zero-day malware. This is where ML is most valuable: It can serve as an effective tool against both.

ML works because it can identify malicious intent based solely on the attributes of a file, without prior knowledge of it, without signatures and without needing to execute the file to observe its behavior. When well designed, ML can be an extremely effective weapon against malware. The CrowdStrike ML engine, for example, was able to block Shamoon 2, WannaCry and NotPetya out of the box, without any updates. The engine, which has been regularly submitted for independent public testing, achieved 99.5 percent detection rates.
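As a rough illustration of what "attributes of a file" can mean, the Python sketch below computes a few static features from raw bytes without executing anything. The feature set (size, byte entropy, printable-character ratio) is an assumption chosen for simplicity and is not the feature set of the CrowdStrike engine or any other product. A real model would be trained on many such features across large labeled collections of files.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def static_features(data: bytes) -> dict:
    """Hypothetical feature vector computed purely from file contents."""
    printable = sum(1 for b in data if 32 <= b < 127)
    return {
        "size": len(data),
        "entropy": byte_entropy(data),
        "printable_ratio": printable / len(data) if data else 0.0,
    }


# Packed or encrypted content tends toward high entropy; plain text does not.
uniform_bytes = bytes(range(256))
plain_text = b"hello hello hello hello"

print(static_features(uniform_bytes))
print(static_features(plain_text))
```

Because features like these describe what a file looks like rather than which exact file it is, a trained classifier can score a variant it has never seen before.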

Look for the Right ML to Protect Your Endpoints

The ability to stop unknown malware makes machine learning a minimum requirement for any effective endpoint solution. Therefore, products that rely on legacy signature-based techniques alone, whether they use their own AV engine or OEM someone else’s, should be ruled out automatically, even if they claim to be “next-generation.” Those products provide the same incomplete malware protection as traditional signature-based engines.

Where Is ML Located?

Another factor to watch for is the location of the ML engine. The need for ML is so strong that the majority of vendors now claim that they use machine learning. The key here is to find out where the engine resides. If ML is only in the cloud, the endpoint won’t be protected when offline, opening another gap in protection. This is why a machine learning engine needs to reside on the endpoint itself to offer full protection.
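The difference between cloud-only and on-endpoint ML can be sketched in a few lines of Python. Everything here is hypothetical: `local_model` and `cloud_lookup` are placeholder callables standing in for an on-endpoint model and a cloud service, and the 0.5 threshold is an arbitrary assumption. The point is only that a local model still yields a verdict when the cloud is unreachable.

```python
from typing import Callable, Optional

Features = dict
Scorer = Callable[[Features], float]


def classify(
    features: Features,
    local_model: Scorer,
    cloud_lookup: Optional[Scorer] = None,
    threshold: float = 0.5,
) -> bool:
    """Return True if the file should be blocked."""
    # The on-endpoint model always runs, so there is always a verdict.
    score = local_model(features)
    if cloud_lookup is not None:
        try:
            # When online, a cloud score can only raise the suspicion level.
            score = max(score, cloud_lookup(features))
        except ConnectionError:
            # Offline: fall back to the local verdict instead of failing open.
            pass
    return score >= threshold


def offline_cloud(features: Features) -> float:
    """Simulates an endpoint that has no network connectivity."""
    raise ConnectionError("endpoint is offline")


def suspicious_local_model(features: Features) -> float:
    """Placeholder for a trained on-endpoint model."""
    return 0.9


print(classify({"entropy": 7.9}, suspicious_local_model, offline_cloud))  # True
```

A cloud-only design has no equivalent of `local_model`, so the same offline condition would leave the file unscored.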

How Is ML Trained?

Last but not least, not all machine learning models are created equal. A poorly trained model will produce incorrect predictions, generate a flurry of false positives, and as a result, undermine protection efficiency. Inquiring about false positives and running tests might be a good way to get a sense of an ML engine’s efficiency.
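When running such tests, two numbers are worth computing side by side: the detection rate on known-malicious files and the false positive rate on known-clean files. The Python sketch below shows the arithmetic. The labels and predictions are made-up test data, not results from any real product.

```python
def evaluate(labels: list, predictions: list) -> dict:
    """Compute detection rate and false positive rate.

    labels[i] is True if file i is actually malicious;
    predictions[i] is True if the engine flagged file i.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # malware caught
    fn = sum(1 for y, p in pairs if y and not p)      # malware missed
    fp = sum(1 for y, p in pairs if not y and p)      # clean file flagged
    tn = sum(1 for y, p in pairs if not y and not p)  # clean file passed
    return {
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up test set: 4 malicious files followed by 8 clean files.
labels = [True] * 4 + [False] * 8
predictions = [True, True, True, False] + [True] + [False] * 7

print(evaluate(labels, predictions))
# {'detection_rate': 0.75, 'false_positive_rate': 0.125}
```

A high detection rate means little if the false positive rate is also high, because analysts end up chasing alerts on clean files.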

ML Is an Indicator of Security Effectiveness

An organization’s ability to protect against malware is a good indicator of the effectiveness of its entire security strategy. If its systems can be compromised by commodity malware, then what could a more sophisticated attacker do? Even though ML is essential, no one should rely on ML alone to protect endpoints. It is necessary to implement a comprehensive endpoint security solution that includes ML but also combines complementary technologies, such as exploit prevention and behavioral analysis. This combination provides the ability to protect against all types of attacks, whether malware is used or not.
