Inside CrowdStrike’s Science-Backed Approach to Building Expert SOC Agents

A deep dive into the process of building, training, testing, and refining AI agents capable of operating in a modern SOC.

Security teams are at a critical inflection point. AI-enabled adversaries now operate at machine speed, automating phases of the kill chain and scaling attacks faster than human-only workflows can respond. Yet most SOCs still depend on manual triage and investigation processes that cannot keep pace. 

This has fueled an explosion of SOC agents across the cybersecurity landscape. The appeal is obvious: Triage and investigation are among the most intuitive and consequential tasks for AI to transform because the first decisions in the lifetime of a detection can shape the entire trajectory of the response. Just as in medicine, inaccurate triage can lead to misallocated resources, delayed intervention, or signals going unnoticed during critical response windows. 

With so many SOC agents entering the market, organizations are struggling to determine which agents have the proven accuracy to meet the demands of today’s high‑stakes SOC environment. 

This blog breaks down why only a science-backed approach to agent training, testing, and refinement can deliver agents worthy of operating in the SOC — and how this principled methodology underpins CrowdStrike® Charlotte AI™ Detection Triage and Response agents. 

Should All SOC Agents Be Trusted? 

For most, the answer is no. Many “agents” are little more than simple automated workflows that call off-the-shelf large language models (LLMs) — systems that mimic expertise rather than embody it. They may look impressive in demos, but without expert grounding, scientific benchmarking, or rigorous feedback loops, their accuracy collapses under real-world conditions. They’re inconsistent, unpredictable, and risky. 

Security teams don’t need agents that sound confident. They need agents that meet the bar required for an agentic SOC: Agents trained on real analyst judgment and capable of analyst-grade decision-making, not just surface-level pattern recognition. They need agents that are transparently tested, measured, continuously refined, and governed by strict guardrails for safe, consistent orchestration. 

The Six Pillars of Building Mission-Ready Agents 

To make an agent worthy of operating in the SOC, six criteria must be met: training on expert-annotated data, measurable and transparent benchmarking, continuous monitoring and feedback loops, a purpose-built architecture for enterprise scale, stringent guardrails, and adversarial robustness.

Diagram showing the six criteria for building enterprise-grade security agents

Pillar #1: A Rich Corpus of Human-refined Data  

Training an agent to make security decisions requires a fundamentally different approach than training machine-learning models that classify malware or score vulnerabilities. Cultivating analyst-grade decision-making requires expert judgment. To teach an agent why a decision was made, not merely what happened, requires human-annotated data that captures how analysts interpret context, evaluate subtle signals, and analyze adversary tradecraft. This depth of insight cannot be scraped, synthesized, or generated by an LLM.

CrowdStrike’s Expert-Data Advantage: CrowdStrike is the only security company with over a decade of decisions made by Falcon Complete Next-Gen MDR, our globally scaled managed detection and response team, widely regarded by customers and analysts as the most elite defenders in the world. Every triage outcome, investigative pivot, and interpretation of adversary behavior enriches and expands our corpus of expert-refined data. 

We continue to generate new, high-fidelity training data every day. Falcon Complete analysts review Charlotte AI’s outputs as they use its agents in their daily workflows. These expert-AI interactions flow into a growing corpus of training data. All of this is anchored by the CrowdStrike Falcon® platform, which recently delivered a flawless 100% performance in the 2025 MITRE Engenuity ATT&CK® Evaluations. 

In combination with the Falcon platform’s trillions of correlated cross-domain events and world-class threat intelligence, this dataset becomes something no other vendor can replicate: a living model of how the world’s most proficient analysts operate in the face of real intrusions. This enables Charlotte AI’s agents to operate with analyst-grade depth and consistency.

Pillar #2: A Science-backed Approach to Benchmark Agent Performance 

In security, an agent that cannot be measured cannot be trusted. To run safely in production, teams must know how often an agent is right, under what conditions, and how its performance shifts as models or adversaries evolve. Yet most “SOC agents” today offer no such visibility — their vendors cannot demonstrate accuracy, validate behavior, or show how these systems perform in real-world environments. Without objective benchmarks and repeatable evaluation methods, there is no way to know how an agent is behaving in production.  

CrowdStrike’s Data Science Advantage: CrowdStrike’s expert agents are built on hard science, anchored in reproducible benchmarks and rigorous evaluation. For instance, Charlotte AI’s Detection Triage agent and Agentic Response agent are tested against the decisions of Falcon Complete, scored for accuracy, and continuously validated as new models emerge. This creates a transparent, explainable benchmark that customers can rely on, even as adversaries evolve and the model landscape shifts.
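To make this concrete, a benchmark like the one described above reduces to scoring agent verdicts against expert ground truth, detection by detection. The sketch below is a minimal, hypothetical illustration of that scoring step; the function and verdict labels are assumptions for illustration, not CrowdStrike's actual evaluation code.

```python
# Minimal sketch: score an agent's triage verdicts against expert analyst
# decisions, keyed by detection ID. Field names are illustrative.
from collections import Counter

def benchmark(agent_verdicts: dict, expert_verdicts: dict) -> dict:
    """Return agreement rate between agent and expert verdicts."""
    assert agent_verdicts.keys() == expert_verdicts.keys()
    outcomes = Counter()
    for det_id, agent in agent_verdicts.items():
        outcomes["match" if agent == expert_verdicts[det_id] else "mismatch"] += 1
    total = sum(outcomes.values())
    return {"accuracy": outcomes["match"] / total, "n": total}

agent = {"det-1": "true_positive", "det-2": "false_positive", "det-3": "true_positive"}
expert = {"det-1": "true_positive", "det-2": "false_positive", "det-3": "false_positive"}
print(benchmark(agent, expert))  # accuracy 2/3 over 3 detections
```

Re-running the same benchmark each time a new model version ships is what makes the evaluation reproducible: the ground truth stays fixed while the agent under test changes.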

Pillar #3: Reinforcement Learning that Drives Continuous Improvement 

Building a production-grade agent requires not just great data and great models, but ongoing investment in human-validated improvement. Without continuous evaluation, reinforcement, and correction, an agent’s accuracy will inevitably drift. Accuracy is maintained through a systematic feedback loop in which experts review agent decisions, identify errors or blind spots, and feed those insights back into the training corpus. With every iteration, accuracy increases, performance stabilizes, and confidence grows.

CrowdStrike’s Feedback Advantage: CrowdStrike operates a large-scale, expert-driven feedback loop for Charlotte AI’s Detection Triage and Agentic Response Agents. Falcon Complete analysts continuously review, validate, and score agent decisions during real intrusions, creating the high-quality reinforcement data needed to correct performance, detect drift, and ensure agents evolve alongside adversary tradecraft.

This unique feedback cycle compounds over time. As Charlotte AI offloads work, analysts respond faster and reallocate attention to higher-value detections, generating more expert-labeled data to fuel the next round of training. The result is an accelerating accuracy flywheel: Agents improve, analysts become more efficient, and each cycle produces richer data to strengthen future iterations. No startup, legacy vendor, or even model provider can replicate this exact “hill-climbing” methodology, because no other vendor has the elite managed services organization, surgical focus on cybersecurity, massive scale, and deeply integrated platform telemetry that CrowdStrike brings. It is a structural advantage that ensures Charlotte AI’s agents maintain accuracy, reliability, and safety as threats and models evolve.

Breaking the Sensitivity-Noise Tradeoff: A transformative implication of this feedback loop is that detection engineering itself becomes more powerful over time. Historically, SOC teams have been constrained by the sensitivity-noise tradeoff: increase detection sensitivity (catch more threats but create more noise or false positives) or increase precision (reduce noise but risk missing subtle threats). 

But in the agentic SOC, if triage can be performed with reliable accuracy at near-unlimited scale, that constraint disappears. Security teams can safely increase detection sensitivity and surface weaker behavioral signals, knowing SOC agents will triage the noise. This replaces the precision-recall balance with a continuum of resource investment, dramatically expanding defensive coverage and raising catch rates — a transformation simply not possible without accurate, continuously improving agentic triage.
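The arithmetic behind this claim is worth spelling out. The sketch below uses entirely hypothetical numbers to show the mechanism: when an accurate triage agent sits between detections and analysts, raising sensitivity multiplies raw noise far more than it multiplies the human escalation queue.

```python
# Illustrative arithmetic for the sensitivity-noise tradeoff. All numbers
# are hypothetical, chosen only to demonstrate the mechanism.

def escalation_load(true_threats: int, false_positives: int, agent_accuracy: float) -> float:
    """Detections reaching human analysts after agentic triage.

    The agent correctly escalates true threats and correctly closes
    false positives with probability `agent_accuracy`.
    """
    escalated_tps = true_threats * agent_accuracy            # threats correctly surfaced
    escalated_fps = false_positives * (1 - agent_accuracy)   # noise that slips through
    return escalated_tps + escalated_fps

# Low sensitivity: 80 true threats caught, 1,000 false positives raised.
# High sensitivity: 100 true threats caught, 10,000 false positives raised.
low = escalation_load(80, 1_000, 0.98)
high = escalation_load(100, 10_000, 0.98)
print(low, high)  # the human queue grows ~3x even though raw noise grew 10x
```

Under these assumed numbers, a tenfold increase in raw alert noise translates into only about a threefold increase in what humans actually review, while 20 additional true threats are caught.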

Pillar #4: An Architecture Built for Enterprise Scale and Performance 

For AI agents to be effective in production, they require a foundation engineered for accuracy, latency, and cost-effectiveness as usage grows — all while meeting regulatory and compliance requirements. This requires a flexible, customizable approach where each individual agent is engineered for its specific role and designed with the entire system in mind, rather than being needlessly uniform or rigid. 

For instance, an architecture where each agent is powered by a frontier LLM might be misaligned with the nature, scale, and frequency of tasks in a SOC. This would be prohibitively expensive to operate at volume, too slow for time-sensitive workflows, and prone to bottlenecks that can jeopardize reliability for core business operations. Agentic security demands a foundation that can support thousands of concurrent, diverse tasks without sacrificing performance, resilience, or governance. 

CrowdStrike’s Architectural Advantage: Charlotte AI’s architecture is a dynamic, heterogeneous system where the design of each agent is optimized for the job it needs to do. Instead of standardizing on a single model provider or class of models, Charlotte AI’s architecture allows CrowdStrike’s data science team to employ the best-suited technology for each agent (LLMs, machine learning models, rules, etc.). 

For example, for some agents, we’ll use small language models (SLMs), which provide low-latency, cost-optimized building blocks trained for high-volume, well-defined security tasks. In other cases, when agents need to perform complex, cross-context reasoning, we may use more resource-intensive and tuned frontier models. This modular design gives us the power to continuously reassess and optimize every agent with the best technologies as they emerge. The result is an agentic security architecture that is high-performing, massively scalable, and built to keep evolving.
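A heterogeneous architecture like this amounts to per-task model routing. The sketch below is a deliberately simplified, hypothetical illustration of that design choice; the task names, model tiers, and routing table are assumptions, not Charlotte AI's actual routing logic.

```python
# Hypothetical sketch of per-task model selection in a heterogeneous
# agent architecture. Task names and tiers are illustrative only.

MODEL_TIERS = {
    "slm": "small language model (low latency, cost-optimized)",
    "ml": "classical ML model or rules engine",
    "frontier": "frontier LLM (complex cross-context reasoning)",
}

TASK_ROUTES = {
    "entity_extraction": "slm",
    "detection_scoring": "ml",
    "cross_domain_investigation": "frontier",
}

def route(task: str) -> str:
    """Pick the cheapest model class suited to the task; default to an SLM."""
    return TASK_ROUTES.get(task, "slm")

print(route("cross_domain_investigation"))  # frontier
print(route("entity_extraction"))           # slm
```

The point of the pattern is that the routing table, not the agents themselves, is what changes when a better-suited model class emerges, which is what makes continuous reassessment cheap.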

Pillar #5: Guardrails for Safe, Responsible AI Adoption  

In the SOC, an agent that takes the wrong action, escalates the wrong threat, or misinterprets context can create operational risk. Effective agentic security requires predictability, transparency, and human oversight. Agents must operate with clear guardrails so analysts understand what the agent did, why it did it, and how to override or refine its behavior. 

CrowdStrike’s Governance Advantage: Charlotte AI is engineered with built-in guardrails that keep humans firmly in control. Every action an agent takes is bound by role-based access controls, bounded autonomy policies that enable analysts to define what gets automated and when, and a fully auditable record of its decisions. Each recommendation includes transparent, source-linked explanations so analysts can validate its logic and make informed decisions.

This governance model ensures that human oversight is always preserved. Analysts decide where autonomy is allowed, when approvals are required, and how far an agent can go. These safeguards make Charlotte AI both powerful and suitable for production environments, enabling organizations to adopt agentic security with confidence.
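At its core, bounded autonomy of this kind is a policy check that sits between an agent's proposed action and its execution. The sketch below is a minimal, hypothetical illustration of that pattern; the policy fields and action names are invented for this example and do not reflect the Falcon platform's actual schema.

```python
# Minimal sketch of a bounded-autonomy check: a customer-defined policy
# decides whether a proposed agent action runs autonomously or waits for
# analyst approval. Action names and fields are hypothetical.

POLICY = {
    "close_false_positive": {"autonomous": True},
    "escalate_detection": {"autonomous": True},
    "contain_host": {"autonomous": False},  # always requires human approval
}

def authorize(action: str, approved_by_analyst: bool = False) -> str:
    rule = POLICY.get(action, {"autonomous": False})  # unknown actions need approval
    if rule["autonomous"] or approved_by_analyst:
        return "execute"
    return "pending_approval"

print(authorize("close_false_positive"))   # execute
print(authorize("contain_host"))           # pending_approval
print(authorize("contain_host", True))     # execute
```

Because the policy is data rather than code, analysts can widen or narrow the autonomy boundary without touching the agent itself, and every decision that passes through the check can be written to an audit log.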

Pillar #6: Adversarial Robustness in the Agentic Wild West

In cybersecurity, AI agents don’t operate in a lab. They face intelligent, adaptive adversaries constantly attempting to probe and attack them. Polished demos cannot predict how agents behave under scrutiny. In this agentic wild west, static defenses will quickly decay, prone to subversion from prompt injection, data poisoning, and evasion. Robust agentic security requires a different mindset: designing for adversarial pressure from Day One, continuously monitoring real-world behavior, and hardening agents as part of an ongoing operational security practice.

CrowdStrike’s Hardened Agent Advantage: Charlotte AI is built with this adversarial reality in mind. Our agents are battle-tested against real-world tradecraft. CrowdStrike’s data science, engineering, and threat-hunting teams work together to stress-test agent behavior, tighten decision boundaries, and add layered safeguards when we see new evasion techniques in the wild. In today's adversarial environment, robustness isn’t a one-time box you tick at launch — it’s a continuous, data-driven hardening process. With an unrivaled understanding of the adversary, CrowdStrike is uniquely positioned to keep agents resilient against real-world attacks.

How Charlotte AI’s Expert Agents Analyze Detections with >98% Accuracy

Charlotte AI’s Detection Triage and Agentic Response Agents show what’s possible when expert data, scientific benchmarking, continuous feedback, platform integration, and governance converge. Together, they deliver analyst-grade decision-making with near-perfect verdicts, helping analysts reclaim time, respond with more consistency, and prioritize critical threats. 

Charlotte AI’s Detection Triage Agent

Triage is one of the most crucial, and inconsistent, activities in security operations. Analyst experience, training gaps, and alert fatigue all contribute to variability — and in security, inconsistency begets risk. An incorrect triage assessment can bury a true threat in a backlog, while an unnecessary escalation can pull analysts away from genuine threats.

Charlotte AI’s Detection Triage Agent delivers consistent, high-fidelity triage at machine speed. Within moments of a detection being generated, Charlotte AI’s Detection Triage Agent automatically gathers all relevant Falcon platform telemetry, processes related context, and provides a verdict, confidence level, prioritization score, and recommended next step (whether to close out or escalate for review). It pairs every decision with a detailed explanation of its judgment, giving analysts immediate clarity and reducing documentation overhead.
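The output described above (verdict, confidence, priority, recommended next step, and an explanation) can be pictured as a small structured record. The sketch below is a hypothetical shape for such a result; the field names and values are illustrative assumptions, not the Falcon platform's actual schema.

```python
# Hypothetical shape of a triage result as described above.
# Field names and example values are illustrative only.
from dataclasses import dataclass

@dataclass
class TriageResult:
    detection_id: str
    verdict: str          # e.g. "true_positive" or "false_positive"
    confidence: float     # 0.0 - 1.0
    priority: int         # 1 (highest) through 5 (lowest)
    recommendation: str   # "escalate" or "close"
    explanation: str      # rationale the analyst can validate

result = TriageResult(
    detection_id="det-42",
    verdict="true_positive",
    confidence=0.97,
    priority=1,
    recommendation="escalate",
    explanation="Behavior consistent with credential dumping on a domain controller.",
)
print(result.recommendation)  # escalate
```

Packaging the explanation alongside the verdict is what lets analysts audit each decision instead of trusting a bare label.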

With over 98% decision accuracy across endpoint, identity, and cloud detections,1 Charlotte AI helps ensure no critical signal gets buried — while saving teams at least 5 minutes per detection2 and eliminating analyst time spent on noise.

Automating triage sets the course for a faster and more streamlined response: ensuring every detection gets reviewed, producing higher-fidelity queues, reducing analyst fatigue, and mitigating the risk of human inconsistency. This frees elite analysts to reallocate their attention to complex, high-value, and novel threats — transforming the SOC from reactive alert-processing to proactive defense.

Figure 1. Charlotte AI performs detection triage analysis on a newly-issued endpoint detection, delivering a true positive verdict, an escalation recommendation, and an explanation.

Charlotte AI Agentic Response 

Once a detection is escalated, users can choose to invoke Charlotte AI Agentic Response, which drives investigations forward with the same expert judgment used for initial triage — delivering an additional ≥10 human-equivalent minutes saved per credit spent.3 It identifies what matters, generates the questions a seasoned analyst would ask, assembles answers using a customer’s Falcon platform modules, and synthesizes findings into a clear, intuitive summary.

Now Available: Agentic Response Collaboration: Every organization operates differently, with unique policies, context, and institutional knowledge that no single data set can anticipate. Moreover, as tradecraft evolves to span multiple domains, effective investigations require cross-domain intelligence and expertise. That’s why we’ve developed and released Agentic Response Collaboration, a new capability that turns the Agentic Response canvas into a centralized command plane for human-AI and multi-agent collaboration during investigations. 

Analysts and authorized partner integrations can now inject context, add questions, correct or refine answers, and reprioritize lines of inquiry, enriching investigations with their organizational expertise. This fusion of human insight and autonomous intelligence produces investigations that are more accurate, more dynamic, and far more efficient. Analysts remain firmly in control as Charlotte AI accelerates analysis, ensuring the final outcome reflects both world-class intelligence and the unique realities of the customer’s environment.

Figure 2. Charlotte AI enables security analysts to guide agentic investigations as they unfold, enabling analysts to prioritize or deprioritize questions.
Figure 3. With Agentic Response Collaboration, security analysts can also add or answer questions in the investigation canvas.

Why This Matters Now

The leaders of the agentic SOC era won’t be defined by who ships the most agents but by who can build agents that can operate accurately, consistently, and safely in real-world conditions. Enterprise-grade agents must demonstrate analyst-grade judgment, adapt to evolving tradecraft, and scale as needed. Achieving this standard requires transparent benchmarking, continuous human feedback, a purpose-built platform, rigorous guardrails, adversarial hardening, and the world-class data required to train expert decision-making. 

Charlotte AI’s Detection Triage and Agentic Response Agents embody this standard. They deliver expert-level verdicts and measurable accuracy at machine speed, all grounded in the richest corpus of analyst judgment in cybersecurity. They do not simply accelerate analysts — they elevate them, transforming how teams detect, investigate, and respond to threats under real-world pressure. 

Additional Resources

1 Accuracy rating is a measure of Charlotte AI triage decisions that match the expert decisions from the CrowdStrike Falcon Complete Next-Gen MDR team.

2 Time savings represents the time an analyst would have spent triaging detections and can now reallocate to other skilled work while Charlotte triages the detections. Individual results may vary based on factors such as total alert volume.

3 Time savings is an estimate based on Agentic Response’s ability to automate tasks that would otherwise require more than 10 minutes of manual effort by a human analyst. This should not be interpreted as a guarantee that this will lead to a 10-minute reduction in the total investigation time or mean time to respond (MTTR).