A Primer on Machine Learning in Endpoint Security

Although machine learning (ML) isn’t a new concept (it actually dates back to the origins of the computer itself), it has received more attention lately as cybersecurity vendors increasingly extol its efficacy against unknown malware for which no signature exists. Today, many endpoint security offerings claim to include ML-based technology, but are all ML engines the same? And if they aren’t, how can customers make informed decisions about what constitutes effective ML? There are guidelines that can help you make the right decision, but first, it’s important to understand what ML is and why it has become such an important topic of conversation among vendors, analysts and customers.

What Is Machine Learning?

Machine learning is a subset of the broader field of artificial intelligence. At its most basic, it teaches a machine how to answer a question or make a decision on its own. Traditional programming requires giving a machine explicit instructions for answering a specific question; machine learning, by contrast, does not need to be shown every imaginable case before it can decide the correct answer. That is why ML is so significant: it can teach a machine to predict the answer to a question it has never encountered before, replacing processes that would otherwise require arduous and protracted human analysis. ML learns by being fed many relevant examples in the form of a dataset, and the more good examples it gets, the better it can learn.
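To make "learning from examples" concrete, here is a minimal sketch of supervised learning using a nearest-centroid classifier, one of the simplest learning methods (not the technique any particular product uses). The two features and all of the numbers are hypothetical stand-ins for file traits. The point is that the model is never given a rule; it derives one from labeled examples and then answers for an input it has never seen.

```python
from statistics import mean

# Hypothetical toy training set: each example is (features, label).
# The features are made-up numbers standing in for file traits,
# e.g. (section entropy, count of suspicious API imports).
TRAINING_DATA = [
    ((7.8, 12.0), "malicious"),
    ((7.5, 9.0), "malicious"),
    ((7.9, 15.0), "malicious"),
    ((4.2, 1.0), "benign"),
    ((5.0, 0.0), "benign"),
    ((4.8, 2.0), "benign"),
]

def train_centroids(data):
    """Learn one average feature vector (centroid) per label."""
    by_label = {}
    for features, label in data:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance).

    Simplification: features are not normalized here; real systems
    usually scale features so no single one dominates the distance.
    """
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))

centroids = train_centroids(TRAINING_DATA)
# Feature vectors that never appeared in the training data:
print(predict(centroids, (7.2, 10.0)))  # malicious
print(predict(centroids, (4.5, 1.0)))   # benign
```

Adding more labeled examples refines the centroids, which is a small-scale version of "the more examples it gets, the better it can learn."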

Types of Machine Learning

The ML process feeds datasets (input) to a machine so that it can learn to produce the best possible answers (output). There are three main types of ML, which differ in the kind of data they learn from and how they learn from it. The first type is supervised learning, where the input is labeled and the machine knows the expected outcome. This type has many use cases, including speech recognition and disease diagnosis. The second type is unsupervised learning, where the input is unlabeled and the machine must draw conclusions based solely on patterns and structure in the data. You find this type behind the classification of movie genres on Netflix, for instance. The third type is reinforcement learning, where the machine interacts with its environment to achieve a certain goal. Like unsupervised learning, it does not learn from labeled examples; instead, it learns from feedback, in the form of rewards or penalties, on the actions it takes. It’s typically used in applications such as robotics for manufacturing.
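The practical difference between supervised and unsupervised learning is whether labels are present. As a minimal sketch of the unsupervised case, here is k-means clustering with k = 2 on made-up, unlabeled points: the algorithm is never told which points belong together, yet it separates them purely by similarity. The resulting cluster IDs (0 and 1) carry no meaning until a person or a downstream process interprets them.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Group unlabeled points into k clusters; returns (assignments, centroids)."""
    # Simplification: seed centroids with the first k points. Real
    # implementations typically use random or k-means++ initialization.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids

# Hypothetical unlabeled data: two obvious groups, but no labels are provided.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
assignments, centroids = kmeans(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```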

Finding the Right Machine Learning for Malware Detection

Supervised learning is the type of ML best suited to endpoint security, because malware detection lends itself to labeled examples: every training sample can be marked as malicious or clean. With supervised ML, the machine must be fed large amounts of relevant data so it can learn. There should be enough data for the results to be meaningful, and the model’s output must be tuned to strike the right balance between true positives (malware correctly detected) and false positives (clean files wrongly flagged). Through many rounds of training and testing, parameters can be fine-tuned to produce the most accurate ML model possible for detecting unknown malware. But not all ML engines are the same, particularly when applied to an endpoint protection platform (EPP). While many EPP vendors claim to have ML capabilities, the results they deliver can vary widely. Here are some important features to look for in effective ML for EPP:
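One concrete part of "adjusting output" is choosing the score threshold above which a file is flagged. The sketch below assumes a model that emits a score between 0 and 1 and a small, entirely hypothetical labeled test set. It measures the detection rate (true-positive rate) and the false-positive rate at a few thresholds, which shows the trade-off directly: lowering the threshold catches more malware but flags more clean files.

```python
# Hypothetical held-out test set: (model score, true label), 1 = malware, 0 = clean.
TEST_SET = [
    (0.95, 1), (0.90, 1), (0.80, 1), (0.65, 1), (0.40, 1),
    (0.70, 0), (0.45, 0), (0.30, 0), (0.20, 0), (0.05, 0),
]

def rates_at(threshold, scored):
    """Return (true-positive rate, false-positive rate) when flagging score >= threshold."""
    positives = [s for s, y in scored if y == 1]
    negatives = [s for s, y in scored if y == 0]
    tpr = sum(s >= threshold for s in positives) / len(positives)
    fpr = sum(s >= threshold for s in negatives) / len(negatives)
    return tpr, fpr

for threshold in (0.3, 0.5, 0.75):
    tpr, fpr = rates_at(threshold, TEST_SET)
    print(f"threshold={threshold:.2f}  detection rate={tpr:.0%}  false-positive rate={fpr:.0%}")
```

On this toy data, a threshold of 0.3 detects every malicious sample but flags 60% of clean files, while 0.75 produces no false positives but misses 40% of the malware. Repeating this measurement after each round of training is one simple way to pick an operating point.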

Massive datasets: Most endpoint security ML is based on supervised learning, so the question becomes, “How good is that supervised learning?” First, to be effective, ML must have enough relevant data with which to work. Second, it must be able to run sufficient rounds of training quickly and efficiently. ML that lacks enough data, or that cannot be retrained often enough, will deliver weaker results.

More than a yes or no answer: Most endpoint security products that claim to have ML-based technology deliver a simple yes or no answer when scanning a file: “yes,” the file is malware, or “no,” it isn’t. However, you need more information to make the most effective use of your IT resources. If a “yes” flags a low-severity issue, you may not want to devote valuable resources to it immediately. On the other hand, knowing the severity of a threat allows you to prioritize and act immediately on the most dangerous ones, preventing a larger problem from developing.
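As a minimal sketch of going beyond a binary verdict, the code below maps a model's confidence score to a severity tier that a team could use for triage. The tier names and cutoffs are hypothetical, and confidence is only one input: a real product would likely also weigh threat context (for example, what the file does and where it was found) when assigning severity.

```python
from enum import Enum

class Severity(Enum):
    INFORMATIONAL = "informational"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

# Hypothetical cutoffs, checked from highest to lowest.
SEVERITY_CUTOFFS = [
    (0.95, Severity.CRITICAL),
    (0.85, Severity.HIGH),
    (0.70, Severity.MEDIUM),
    (0.50, Severity.LOW),
]

def triage(score):
    """Turn a model confidence score in [0, 1] into a severity tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    for cutoff, severity in SEVERITY_CUTOFFS:
        if score >= cutoff:
            return severity
    return Severity.INFORMATIONAL

for score in (0.97, 0.6, 0.3):
    print(score, triage(score).value)
```

A queue sorted by these tiers lets analysts handle critical detections first and batch the low-severity ones.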

Detecting unknown malware with fewer false positives: Anti-malware tools that rely on signatures must be updated frequently to remain effective. A signatureless ML engine, however, can “generalize”: instead of memorizing a set of specific malware file signatures, it learns the higher-level traits that distinguish malicious files, so it does not need to be fed new datasets every day to recognize new samples. Analyzing these traits to decide whether a file is malicious is a far superior approach for detecting today’s targeted, unknown malware. It enables ML to find the unknown malware other solutions miss without generating a slew of false positives, which can drain valuable IT resources and lead to alert fatigue.
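The contrast between memorizing signatures and generalizing from traits can be shown in a few lines. In this sketch, the "signature" is a SHA-256 hash of a known sample, and the "traits" are a hypothetical list of Windows API names often associated with process injection. The sample bytes are made up. Changing a single byte defeats the hash lookup, while the traits survive. The trait score here is a deliberately crude stand-in for the features an ML model would learn from, not an ML model itself.

```python
import hashlib

# Hypothetical "known-bad" sample and its signature (a SHA-256 hash).
KNOWN_SAMPLE = b"MZ VirtualAllocEx WriteProcessMemory CreateRemoteThread payload-v1"
SIGNATURES = {hashlib.sha256(KNOWN_SAMPLE).hexdigest()}

# Hypothetical higher-level traits: API names commonly seen in process injection.
SUSPICIOUS_APIS = (b"VirtualAllocEx", b"WriteProcessMemory", b"CreateRemoteThread")

def signature_match(sample: bytes) -> bool:
    """Exact-match detection: only flags byte-for-byte identical files."""
    return hashlib.sha256(sample).hexdigest() in SIGNATURES

def trait_score(sample: bytes) -> float:
    """Fraction of suspicious traits present (a crude stand-in for learned features)."""
    return sum(api in sample for api in SUSPICIOUS_APIS) / len(SUSPICIOUS_APIS)

# An attacker changes one byte ("v1" -> "v2") to produce a new, unseen variant.
variant = KNOWN_SAMPLE.replace(b"payload-v1", b"payload-v2")

print(signature_match(KNOWN_SAMPLE))  # True
print(signature_match(variant))       # False: the hash changed
print(trait_score(variant))           # 1.0: the traits did not
```

Relying on a single hand-picked trait list would itself be brittle; an ML model instead learns which combinations of many such traits separate malicious from clean files, which is what lets it generalize while keeping false positives low.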

Even the Best Machine Learning Can’t Work Alone

Finding the right ML is important, but you shouldn’t rely on ML alone to protect your endpoints. If an adversary targeting your organization is persistent enough, they will eventually get a piece of malware past even the best ML-based defenses. Comprehensive endpoint protection, which includes ML but also offers exploit prevention and behavioral analysis, should therefore be an integral part of the solution you choose. The ability to detect subtle signs of an attack by analyzing event behaviors and determining an attacker’s intent is also critical, because it provides protection regardless of the malware or exploit used in the attack.
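Behavioral analysis looks at what processes do rather than at what a file looks like. Here is a minimal sketch of one behavioral rule: flag a script interpreter launched by an Office application, a common sign of a malicious document. The event schema, process lists and sample events are all hypothetical and simplified. This illustrates the general idea only; it is not how CrowdStrike’s indicator-of-attack (IOA) detection is implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessEvent:
    """Hypothetical, simplified process-creation event."""
    pid: int
    parent_pid: int
    image: str         # executable name
    command_line: str

# Hypothetical watch lists for this one rule.
OFFICE_APPS = {"winword.exe", "excel.exe", "powerpnt.exe"}
SCRIPT_HOSTS = {"powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe", "mshta.exe"}

def office_spawns_script_host(events):
    """Flag script interpreters launched by Office apps: a behavior, not a file signature."""
    by_pid = {e.pid: e for e in events}
    alerts = []
    for event in events:
        parent = by_pid.get(event.parent_pid)
        if (parent is not None
                and parent.image.lower() in OFFICE_APPS
                and event.image.lower() in SCRIPT_HOSTS):
            alerts.append((parent.image, event.image, event.command_line))
    return alerts

events = [
    ProcessEvent(100, 1, "explorer.exe", "explorer.exe"),
    ProcessEvent(200, 100, "WINWORD.EXE", "WINWORD.EXE invoice.docx"),
    ProcessEvent(300, 200, "powershell.exe", "powershell.exe -NoProfile -File stage2.ps1"),
    ProcessEvent(400, 100, "powershell.exe", "powershell.exe Get-Date"),
]
print(office_spawns_script_host(events))
```

Only the PowerShell process started by Word is flagged; the same interpreter launched from Explorer is not. Because the rule keys on the parent-child relationship, it fires regardless of which malware file or exploit delivered the document.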

CrowdStrike’s Approach to Machine Learning

CrowdStrike Falcon® integrates signatureless ML into Falcon endpoint protection, powered by the CrowdStrike Threat Graph™, to deliver unmatched effectiveness:

  • CrowdStrike Threat Graph processes more data per day than any other endpoint security company: 40 billion events per day at the time of this writing.
  • As a cloud-native solution built on a graph database, CrowdStrike® leverages massive computing power, enabling it to constantly retrain its models without having to schedule or queue rounds of training, which results in superior machine learning efficacy.
  • Superior ML technology means fewer false positives and the ability to detect and mitigate unknown malware faster.
  • CrowdStrike’s is the first ML engine to be incorporated into VirusTotal, which benefits the security community and also ensures CrowdStrike’s ML accountability.
  • CrowdStrike ML continues to evolve with multiple engines available now, and new ML engines being tested for future releases.
  • The CrowdStrike Falcon platform goes beyond ML to include exclusive features such as detection of indicators of attack (IOAs), exploit mitigation, and managed threat hunting, with cloud-native delivery that protects your endpoints whether they’re online or offline.

To learn more, download the white paper “The Rise of Machine Learning in Cybersecurity.”
