What is cloud incident response?
Cloud incident response (cloud IR) is the process you follow when a cybersecurity incident occurs in your cloud environment. Cloud IR follows the same phases as any other incident response (Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity, or post-mortem), but the details differ from on-premises response, and they differ again from one cloud platform to another (AWS, Azure, GCP, Oracle Cloud, and so on). A team of specialist responders with the right tools can make the difference in getting clear, definitive answers and the decision support you need to recover.
How is cloud IR different from traditional endpoint IR?
1. No physical access to servers
In exchange for having the cloud service provider manage data center and network operations, handle patching (for PaaS), and write and maintain the software (for SaaS), you give up things incident responders often want: physical access to servers to connect to a console, the ability to capture forensic images of hard drives, and the ability to install special diagnostic utilities.
In most cases with cloud environments, you can’t take out-of-band memory dumps, log onto the physical console, or receive actual hard drives from a cloud provider. And that’s OK. You have to work within the access parameters of the cloud provider — such as taking logical storage snapshots, collecting and analyzing the logs the cloud provider makes available, and so on.
2. Accelerated cloud life cycles
The dynamic combination of technologies in modern cloud estates (a mix of containers, functions-as-a-service, hypervisors, and virtual machines running microservices, with workloads spinning up and down to meet the changing demands of an organization) makes cloud IR a very different undertaking from the classic capture-an-image, collect-the-logs incident on a typical endpoint.
Without adequate planning and internal communication, forensic artifacts are likely to be lost. Workloads found to be “defective” are typically discarded to save costs, whether by automated processes meant to keep cloud workloads healthy or by well-meaning operations teams that rapidly release new versions of a platform to address a vulnerability or exploit. If you don’t know where to look, and you haven’t planned to collect logs and retain relevant snapshots, you may find that you don’t have enough information to answer the questions your organization has to ask to contain and recover from a potential breach.
3. “Rogue” operations are almost the norm
New cloud environments can be established by any business unit with access to a credit card, and are often unknown to the central information security organization. The security organization usually knows about some core critical operations in the cloud, but adversaries very commonly land their attacks on components of your cloud estate that are peripheral to your awareness: subsidiaries acquired recently or long ago, systems and applications that were slated for retirement, or systems that were supposed to receive security control upgrades in a later phase of a security plan that didn’t come to fruition in time. These peripheral operations may not be considered core to the organization’s mission, but they can lead to breaches whose cost greatly exceeds the time and budget allocated to them beforehand.
4. Identity is the new perimeter
Unlike a traditional data center where the perimeter is defined by the network using firewalls, identity has become the true perimeter in a cloud environment. This drastically changes our containment and remediation strategies when facing a cloud data breach. It’s no longer practical to block bad IP addresses, just as it’s no longer possible to assume all authorized users are coming from a particular network.
When adversaries discover user accounts that lack MFA requirements, or API keys (commonly stolen from workstations, source control systems, and CI/CD hosts, among others), they can do damage from anywhere on the internet. Adversaries also target accounts synchronized between Active Directory (AD) and Azure Active Directory (now Microsoft Entra ID), meaning an on-premises compromise can easily become a cloud breach as well.
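As a rough illustration of how such exposed identities might be surfaced, the sketch below scans a small CSV shaped like an AWS IAM credential report. It assumes only a subset of that report's columns (user, password_enabled, mfa_active, access_key_1_active, access_key_2_active); the sample rows and the function name risky_principals are made up for illustration, and this is not a description of any vendor's tooling.

```python
import csv
import io

# Illustrative sample shaped like an AWS IAM credential report (subset of columns).
# The users and values are invented for this example.
SAMPLE_REPORT = """user,password_enabled,mfa_active,access_key_1_active,access_key_2_active
alice,true,true,false,false
bob,true,false,true,false
ci-deploy,false,false,true,true
"""


def risky_principals(report_csv):
    """Return (user, reasons) pairs for accounts that deserve a closer look."""
    findings = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        reasons = []
        # A console password with no MFA can be used from anywhere once stolen.
        if row["password_enabled"] == "true" and row["mfa_active"] != "true":
            reasons.append("console password without MFA")
        # Every active API key is a separate credential that can be stolen.
        active_keys = sum(
            row[column] == "true"
            for column in ("access_key_1_active", "access_key_2_active")
        )
        if active_keys:
            reasons.append(f"{active_keys} active API key(s)")
        if reasons:
            findings.append((row["user"], reasons))
    return findings


for user, reasons in risky_principals(SAMPLE_REPORT):
    print(user, "->", "; ".join(reasons))
```

A list like this is only a starting point for triage. An active key is not a finding by itself, and other cloud platforms expose the equivalent data in different formats.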
5. Skilled staff are in even shorter supply
Just as technical staff with senior experience in the cloud are in short supply, those with cloud security engineering experience are even harder to find and retain, and those with cloud security incident response experience even more so. For organizations with a presence in more than one cloud platform, this challenge is multiplied.
Understanding cloud log sources
One of the biggest differences between traditional IR on an endpoint and cloud IR is the log sources. Logs are always important in a breach investigation, but the “ground truth” of what is normal and authorized in a cloud environment is harder to construct without logs.
Event logs enable us to piece together the malicious actions executed by a threat actor during a forensic investigation and help us contain and eject that threat actor. But cloud log sources do not align to endpoint log sources, and they require different collection tools and techniques. In many cases, cloud logs are not enabled by default or have been purposely disabled, and they can be complex to retrieve without the right tools.
IR firms that lack dedicated cloud IR specialists, and with them an understanding of these cloud services and the tools built on them, will struggle to collect the data needed for an efficient and effective investigation.
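One way to check whether logging was purposely disabled is to search the audit trail for changes to the audit trail itself. The sketch below assumes AWS CloudTrail-style records (the fields eventTime, eventName, and userIdentity.arn) and treats the event names StopLogging, DeleteTrail, UpdateTrail, and PutEventSelectors as logging changes. The sample records and the function name logging_changes are hypothetical.

```python
import json

# Event names that, in AWS CloudTrail, record changes to the audit trail itself.
LOGGING_CHANGE_EVENTS = {"StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors"}

# Hypothetical sample records using CloudTrail-style field names.
SAMPLE_EVENTS = json.loads("""
[
  {"eventTime": "2024-01-10T08:00:00Z", "eventName": "ConsoleLogin",
   "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
  {"eventTime": "2024-01-10T08:07:00Z", "eventName": "StopLogging",
   "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
  {"eventTime": "2024-01-10T08:03:00Z", "eventName": "PutEventSelectors",
   "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}},
  {"eventTime": "2024-01-10T08:09:00Z", "eventName": "ListBuckets",
   "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"}}
]
""")


def logging_changes(events):
    """Return (time, event name, actor) for events that alter logging, oldest first."""
    hits = [
        (
            event["eventTime"],
            event["eventName"],
            event.get("userIdentity", {}).get("arn", "unknown"),
        )
        for event in events
        if event.get("eventName") in LOGGING_CHANGE_EVENTS
    ]
    # Timestamps in this uniform UTC format sort correctly as plain strings.
    return sorted(hits)


for when, name, actor in logging_changes(SAMPLE_EVENTS):
    print(when, name, actor)
```

A hit here is not proof of malicious activity, since administrators change trail settings for legitimate reasons, but each one is worth tying to a change ticket or a known operator.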
In SaaS cases, we always collect authentication logs (sign-in logs) and a targeted set of operations for all users, as well as another set of operations for all users of interest. In infrastructure (PaaS) cases, we collect cloud service plane logs as well as any relevant data plane logs, VPC/VNet flow logs, and any relevant specialty logs (function-as-a-service invocation logs, load balancer or CDN logs, etc.).
In both types of investigations, we will collect performance metric data if it’s relevant (especially when more direct evidence sources are lacking) and cloud threat detection alerts for valuable context. Configuration snapshots are not log sources per se but can assist with interpreting the events recorded in the logs.
Furthermore, once you have those logs, interpretation of log entries (or their absence) is not always straightforward. Seasoned cloud investigators know what log trails exist in the different cloud environments (AWS, Azure, GCP) and how to quickly and efficiently collect and interpret the data from them. Log analysis is a key part of cloud compromise assessment.
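To make the point about absent log entries concrete, the sketch below looks for unusually long silences in a list of event timestamps. It assumes timestamps in the UTC format used by CloudTrail-style records, and the one-hour threshold, the sample data, and the function names are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event times for one principal, in CloudTrail-style UTC format.
SAMPLE_TIMES = [
    "2024-01-10T08:00:00Z",
    "2024-01-10T08:05:00Z",
    "2024-01-10T14:30:00Z",
    "2024-01-10T08:07:00Z",
    "2024-01-10T14:31:00Z",
]


def parse_time(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive events are more than max_gap apart."""
    ordered = sorted(parse_time(value) for value in timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


for start, end in find_gaps(SAMPLE_TIMES, max_gap=timedelta(hours=1)):
    print("no events between", start.isoformat(), "and", end.isoformat())
```

A gap may simply be a quiet period, so it only becomes meaningful next to other evidence, such as a logging change just before it or activity visible in another log source during the same window.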
As organizations try to balance the speed of delivering new digital user experiences to modernize their business processes with the security needs and costs of event logging in their cloud platform, the temptation to minimize or completely disable logging in the cloud environment exists. This becomes problematic in the face of a cloud data breach.
Containing an active cloud threat
Unlike a breach on an endpoint device, a cloud data breach cannot be contained by simply unplugging the compromised machine and disconnecting it from the network. Because an adversary can move quickly across your cloud environment, containment is considerably more difficult in the cloud than it is with an endpoint breach.
Containing an incident in cloud infrastructure includes identifying all security principals compromised and/or added by the adversary, including users, compromised roles (such as via federated sessions or compromised identity stores), and service accounts. In many cases, the cloud provider supports more than one credential source for a security principal, allowing an adversary to impersonate a user or service account without interfering with the original, authorized purpose for that account — thereby hindering detection.
Some examples of this include multiple API keys in addition to a password for a user, multiple credential sources for a service account, and multiple MFA devices for a single user. All these must be carefully tracked and eliminated while ensuring adequate monitoring to detect any attempts by the adversary to reestablish persistence.
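The tracking described above can be pictured as a revocation checklist. The sketch below assumes a hypothetical, already-collected inventory of credentials per principal (the Credential record, its fields, and the sample entries are all invented) and lists every credential attached to a compromised principal, putting those created after the suspected start of the intrusion first.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Credential:
    principal: str
    kind: str          # e.g. "password", "api_key", "mfa_device"
    identifier: str
    created: datetime


# Hypothetical inventory; in practice this comes from the cloud provider's APIs.
INVENTORY = [
    Credential("bob", "password", "console", datetime(2022, 3, 1)),
    Credential("bob", "api_key", "key-old", datetime(2023, 6, 15)),
    Credential("bob", "api_key", "key-new", datetime(2024, 1, 10, 9, 0)),
    Credential("bob", "mfa_device", "phone-1", datetime(2022, 3, 1)),
    Credential("bob", "mfa_device", "unknown-token", datetime(2024, 1, 10, 9, 5)),
    Credential("alice", "password", "console", datetime(2021, 9, 9)),
]


def revocation_checklist(credentials, compromised, intrusion_start):
    """List every credential of the compromised principals.

    Each entry is (principal, kind, identifier, added_during_intrusion).
    Credentials created at or after intrusion_start are listed first.
    """
    checklist = [
        (cred.principal, cred.kind, cred.identifier, cred.created >= intrusion_start)
        for cred in credentials
        if cred.principal in compromised
    ]
    return sorted(checklist, key=lambda item: (not item[3], item[0], item[1]))


for entry in revocation_checklist(INVENTORY, {"bob"}, datetime(2024, 1, 10, 8, 0)):
    print(entry)
```

Credentials that predate the intrusion stay on the list on purpose: the adversary may have stolen them rather than created them, so they still need to be rotated.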
Containment scope doesn’t end there. Automation and deployment frameworks in the cloud platforms, and any user-defined code which can act as a security principal, including functions-as-a-service, containers, and hosts, can be used by the adversary to re-establish persistence in the environment.
Network backdoors including rogue VPC peering points and simple modification to network security group rules can expose resources to continued adversary control. And cloud-based infrastructure with excessive outbound permissions, with policies such as allowing access to all IP ranges belonging to a trusted cloud provider, can be abused by adversaries to blend in with authorized traffic to that cloud provider. A comprehensive review of all possible pivot points, persistence mechanisms, and outliers in configured access is required to ensure thorough containment and subsequent ejection.
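Part of that review can be sketched as a scan for overly broad network rules. The example below assumes a simplified, hypothetical rule format (group, direction, cidr, port) rather than any provider's real schema, and flags rules whose address range is larger than a chosen threshold unless they are on an explicit list of expected public services.

```python
import ipaddress

# Hypothetical, simplified rules; real security group exports look different.
SAMPLE_RULES = [
    {"group": "web", "direction": "inbound", "cidr": "0.0.0.0/0", "port": 443},
    {"group": "db", "direction": "inbound", "cidr": "10.0.0.0/16", "port": 5432},
    {"group": "db", "direction": "inbound", "cidr": "0.0.0.0/0", "port": 22},
    # A made-up broad outbound range standing in for "all of a provider's IPs".
    {"group": "batch", "direction": "outbound", "cidr": "52.0.0.0/6", "port": 443},
]

# (group, port) pairs that are meant to be reachable from anywhere.
EXPECTED_PUBLIC = {("web", 443)}


def overly_broad(rules, max_addresses=65536, expected_public=EXPECTED_PUBLIC):
    """Return rules covering more than max_addresses that are not expected to be public."""
    findings = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"])
        if network.num_addresses <= max_addresses:
            continue
        if rule["direction"] == "inbound" and (rule["group"], rule["port"]) in expected_public:
            continue
        findings.append(rule)
    return findings


for rule in overly_broad(SAMPLE_RULES):
    print(rule["group"], rule["direction"], rule["cidr"], rule["port"])
```

The threshold and the allow list are judgment calls for each environment. The point is to produce a short list of outliers for a human to compare against the rules that existed before the intrusion.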
You probably know that a Cloud Security Posture Management (CSPM) solution is an important defense prior to the occurrence of a breach, because a CSPM helps identify Indicators of Misconfiguration (IOMs) that contribute to the risk of a breach, and of course remediating those IOMs can actually prevent a breach. But very often, a CSPM is an important part of breach containment as well, both because it can help identify (in combination with endpoint security solutions) an active breach at the interface between endpoint and cloud, and because adversaries can continue to attempt to breach the cloud environment using these misconfigurations.
In other words, while the visibility and containment efforts are ongoing, your organization can leverage the urgency of the breach to remediate existing misconfigurations and any new ones that were introduced by the adversary. Sharing access to the CSPM console with your peers in cloud platform operations and empowering them to see the progress they make can considerably shorten the path to securing the cloud environment.
CrowdStrike’s Incident Response for Cloud service
Incident response for a cloud data breach requires a different set of skills, experience, and tools than traditional IR for an on-premises attack. Without dedicated cloud IR specialists who have the right tooling for each cloud service provider platform (AWS, Azure/O365, GCP), you will experience long delays in getting critical answers, which will slow down your response and recovery. Organizations without the necessary skills to respond effectively have trouble containing active threats and ejecting them from the network. They often lose critical evidence and are unable to determine exactly what happened during the attack and what data may have been compromised.
CrowdStrike’s dedicated team of cloud IR specialists is called in to perform investigations on cloud data breaches, many of which are the result of misconfigured cloud security settings. We recommend that organizations fortify their cloud security posture before a breach occurs rather than deal with the aftermath of a cloud data breach.
In the event that you are experiencing a cloud data breach, you need to engage an IR firm that has a specialized team of cloud incident responders and investigators. Firms that deliver traditional IR services on an endpoint breach will not be effective when it comes to responding to a cloud breach.
CrowdStrike has a dedicated team of cloud IR specialists with knowledge of the AWS, Azure/O365, and GCP cloud environments. We have built our own suite of Cloud Data Collector tools, which we use to accelerate the forensic investigation and gain visibility into the malicious actions that have been executed by a threat actor.
CrowdStrike prides itself on being a leader in cloud incident response and brings control, stability, and organization to what can become a chaotic event. Learn how CrowdStrike can help you respond to a cloud data breach faster and more effectively:
Incident Response for Cloud
The CrowdStrike Services Cloud Incident Response team is composed of specialists drawn from industry, government and military dedicated to providing expert cloud incident investigation, containment and remediation services. These experts are capable of supporting comprehensive multi-cloud and on-premises investigations.