How to Configure Falcon Data Replicator

Introduction

Falcon Data Replicator is a service that provides raw CrowdStrike events in JSON format via Amazon Web Services Simple Storage Service (S3). You can ingest these events into your local data warehouse or data layer. Having this data locally available allows for further investigation or export into tools such as Splunk.

How does Falcon Data Replicator work?

As part of Falcon Data Replicator, each customer receives an Amazon Web Services (AWS) S3 bucket (one for each CID that you have access to) and an AWS Simple Queue Service (SQS) queue. Falcon Data Replicator uses Amazon S3 to facilitate a data “handoff” between CrowdStrike and you.

The process is as follows:

  • The Data Replicator provides all CrowdStrike events that are visible via the Event Search page as well as DetectionSummaryEvents, UserActivityAuditEvents, and AuthActivityAuditEvents, which are documented in the Falcon Streaming API Reference.
  • Every few minutes, a new batch of compressed data is written to a file in your S3 bucket for you to ingest into your system. The amount of data sent per batch will vary based on the number of events sent by the sensor. With every new batch, one or more new files are written. The maximum size per file is 25MB.
  • After the data is written to the S3 bucket, it is ready to be consumed. The data in your S3 bucket is organized into directories. In the root of your bucket there is a single directory called “data”, where each full batch of files is written every few minutes. Every batch is assigned a unique UUID, which is used as the subdirectory name for that batch. Inside this directory is a single _SUCCESS file indicating batch completion, along with one or more gzipped files containing the actual data of that batch. Batches use the naming scheme part-00000.gz, part-00001.gz, etc. An example S3 bucket looks like the following:

[Screenshot: example S3 bucket directory structure]

  • In the unlikely event that a batch fails during the write process, no _SUCCESS file will be written, and the data will be automatically cleaned up as soon as possible. Until that time, however, any batch that is missing a _SUCCESS file should be ignored.

Important: Your S3 bucket has a default policy that automatically deletes all data after 7 days. You must fetch the data within 7 days to avoid data loss.

  • Finally, in addition to an S3 bucket, each customer is also given an Amazon SQS queue, which receives a new message from CrowdStrike every time a batch of files is successfully written to their S3 bucket. The SQS queue addresses the fact that S3 is eventually consistent, and it also allows a consumer of Falcon Data Replicator to use a much more efficient event-driven model. Each SQS message corresponds to a single batch of files in your S3 bucket and contains a JSON body with the information needed to ingest that batch of files. An example message looks like the following:
{
  "cid": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "timestamp": 1492726639137,
  "fileCount": 4,
  "totalSize": 349986220,
  "bucket": "cs-prod-cannon-xxxxxxxxxxxxxxxx",
  "pathPrefix": "data/f0714ca5-3689-448d-b5cc-582a6f7a56b1",
  "files": [
    {
      "path": "data/f0714ca5-3689-448d-b5cc-582a6f7a56b1/part-00000.gz",
      "size": 90506436,
      "checksum": "69fe068dd7d115ebdc21ed4181b4cd79"
    },
    {
      "path": "data/f0714ca5-3689-448d-b5cc-582a6f7a56b1/part-00001.gz",
      "size": 86467595,
      "checksum": "7d0185c02e0d50f8b8584729be64318b"
    },
    {
      "path": "data/f0714ca5-3689-448d-b5cc-582a6f7a56b1/part-00002.gz",
      "size": 83893709,
      "checksum": "7c36641d7bb3e1bb4526ddc4c1655017"
    },
    {
      "path": "data/f0714ca5-3689-448d-b5cc-582a6f7a56b1/part-00003.gz",
      "size": 89118480,
      "checksum": "d0f566f37295e46f28c75f71ddce9422"
    }
  ]
}

These fields are defined as follows:

Field | Type | Description
cid | String | Your unique CrowdStrike ID.
timestamp | int | Epoch timestamp (in milliseconds) of when the batch was written.
fileCount | int | How many part-nnnnn.gz files are contained in this batch.
totalSize | int | Total combined size in bytes of all part-nnnnn.gz files in this batch.
bucket | String | The name of your Falcon Data Replicator S3 bucket.
pathPrefix | String | The S3 prefix where this batch of files is located.
files | Array of Objects | An array containing info on each part-nnnnn.gz file in this batch.
path | String | S3 path for the given file.
size | int | Size of the given file in bytes.
checksum | String | Checksum of the given file.
One of the nice features of SQS is that it provides a message lifecycle that allows for easy fault tolerance and horizontal scalability for its consumers. For more details, see the “Amazon SQS Message Lifecycle” section of the SQS details page.

Using Falcon Data Replicator

Between the SQS queue and S3 bucket, the flow for using Falcon Data Replicator becomes very simple:

  1. Consume from the SQS queue.
  2. Upon receiving a new message, ingest the files that the message references from S3 into your system. Because of the SQS queue, you will never need to directly poll the S3 bucket looking for new files, an operation that can be cost prohibitive.

An example consumer written in Python is available on the Falcon UI > Downloads page; it can be used directly with modifications, or treated as a reference for writing your own consumer.
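If you write your own consumer, its core is just a small SQS polling loop. The sketch below uses boto3; the queue URL, region, and the process_batch callback are placeholders to replace with your own values and ingest logic, not names taken from CrowdStrike's sample:

import boto3

QUEUE_URL = "https://sqs.us-west-1.amazonaws.com/.../your-queue"  # SQS URL provided by CrowdStrike Support
sqs = boto3.client("sqs", region_name="us-west-1")  # credentials are also supplied by Support

def consume_forever(process_batch):
    # Long-poll the queue and hand each message body to process_batch.
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)  # long polling avoids busy-waiting
        for msg in resp.get("Messages", []):
            process_batch(msg["Body"])  # step 2: ingest the referenced files from S3
            # Delete only after a successful ingest so SQS will redeliver the message on failure.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

Deleting the message only after the batch has been ingested is what gives you the fault tolerance described in the SQS message lifecycle above: if your consumer crashes mid-batch, the message becomes visible again and another worker can pick it up.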

I used the two sample scripts and modified them with the pertinent information. The “data replicator config” script specifies the AWS Key, AWS Secret, Queue URL, and the output path. On average, every Windows and Mac sensor generates roughly 2.5MB of compressed data per day. Thus, if you had 10,000 sensors in production, you would receive roughly 25GB of compressed data per day via Falcon Data Replicator. If you have a large mix of Linux sensors, expect the volume to be 3-4x higher. Consider these guidelines when choosing an output path.

[Screenshot: data replicator config script]
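In essence, that config boils down to a handful of settings. A minimal sketch is shown below; the variable names are illustrative and not necessarily the exact ones used in CrowdStrike's script:

# Hypothetical stand-ins for the settings in the "data replicator config" script.
AWS_KEY = "XXXXXXXXXXXXXXXXXXXX"          # access key ID provided by CrowdStrike Support
AWS_SECRET = "XXXXXXXXXXXXXXXXXXXXXXXX"   # secret access key provided by CrowdStrike Support
QUEUE_URL = "https://sqs.us-west-1.amazonaws.com/.../your-queue"  # your SQS URL
OUTPUT_PATH = "/var/falcon/fdr"           # local directory sized for your daily data volume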

The “data replicator sample consumer” script defines where and what should be downloaded, using the information from the config script.

[Screenshot: data replicator sample consumer script]
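As a rough illustration of what the consumer does with each message (a sketch only, not the vendor script; it assumes the hypothetical settings shown above), every referenced file is downloaded from the named bucket into the output path:

import json
import os
import boto3

s3 = boto3.client("s3")  # in practice, configured with the same credentials as the SQS client

def process_batch(body, output_path="/var/falcon/fdr"):
    # Download every file referenced by one Falcon Data Replicator message.
    msg = json.loads(body)
    for f in msg["files"]:
        local_file = os.path.join(output_path, f["path"].replace("/", "_"))
        s3.download_file(msg["bucket"], f["path"], local_file)
        # Optionally verify the size advertised in the message.
        if os.path.getsize(local_file) != f["size"]:
            raise IOError("size mismatch for %s" % f["path"])

A function like this could serve as the process_batch callback in the polling loop sketched earlier.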

 

Important: If you have multiple internal teams who want to build consumers for Falcon Data Replicator, you should download the data locally and share it from your local storage. We do not support multiple consumers talking to the same SQS queue.

Running the Script

The scripts are written in Python, so the server running the Falcon Data Replicator consumer needs Python and pip installed, along with the boto3 module. If boto3 is not installed, you'll receive this error:

[Screenshot: install boto3]

I initially failed to install the pip module, so running the command “pip install boto3” failed.

[Screenshot: installing the pip module for Python]

After correctly installing the pip module (and then boto3), the script ran and data started filling the output path specified in the “data replicator config” script.

Obtaining Credentials

To use Falcon Data Replicator, CrowdStrike Support will first need to provide you with:

  • AWS Credentials: To obtain your AWS credentials, you will need to generate a GPG key pair in ASCII format. Send the public part of the GPG key to support@crowdstrike.com. We will encrypt the API key with your public key and send it back to you; only you can decrypt it using your private GPG key.
  • SQS URL: CrowdStrike Support will provide you an SQS URL.

Conclusion

Falcon Data Replicator allows customers to store their data for extended periods of time, feed the data to third-party forensics tools, and use it for additional analysis. For additional help, please see the support portal or reach out to a CrowdStrike representative at CrowdStrike.com.
