
Integration Exploration: Getting Started with Falcon LogScale and Bucket Storage on AWS S3

If you run CrowdStrike Falcon® LogScale, previously known as Humio, locally or on-premises, one of your first steps is to configure local storage so that LogScale has a persistent data store where it can send logs. If you’re running LogScale as a cluster setup, then you’ll have some data replication as a function of how LogScale manages the data. However, even with that replication, you’ll probably still want something outside of your local infrastructure for resiliency.

That’s where bucket storage on AWS S3 comes in.

LogScale can use AWS S3 as a backing store for ingested logs. It uses LogScale’s native file format to make an encrypted copy of the logs in an S3 bucket where it can still read and search — even if the local copies no longer exist!

We should note that using bucket storage instead of persistent disk storage for LogScale is not our typically recommended approach for production environments. The main use case for bucket storage is working in a cloud environment with constraints on network traffic and bandwidth, such that you can’t afford to write to persistent disks. If persistent disks for your LogScale cluster are not an option, then bucket storage is an excellent alternative. You can read the documentation for more information on running LogScale in the cloud with bucket storage while still using persistent disks for Kafka (which Kafka requires).

In this guide, we’ll walk through how to set up bucket storage on AWS S3 so that you can understand the value of this approach and learn how to implement it.

Bucket storage on AWS S3 is only available if you’ve deployed LogScale yourself. Also, keep in mind that you will need a license to deploy LogScale on-premises. Licenses typically take two business days to activate. You can register for a free trial here.

Are you ready? Let’s jump in.

Deploying LogScale

If you want to deploy LogScale yourself, you have three primary options:

  1. The Single Node Setup is great if you’re looking to kick the tires on LogScale and see what it has to offer. With this approach, you have the flexibility of accessing LogScale on your laptop, a VM or a cloud instance without much effort.
  2. The Cluster Setup is a more advanced approach but still attainable if you’re comfortable installing packages or running containers. If you’re running in a VM-centric environment or a server rack, this is probably the method you’ll use, especially if you’re looking for a production-level deployment.
  3. The Container Deployment is exactly what it sounds like! LogScale gives you the option to install on Docker or Kubernetes. This is a good option for teams that have already adopted containers and/or orchestration platforms and understand the nuances that come along with that sort of architecture.

For our demo, we’ll use a local deployment of LogScale running on a local Kubernetes cluster. For an excellent step-by-step walkthrough of setting up Kubernetes and LogScale, follow this guide.
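If you don’t already have a local cluster handy, one minimal sketch for spinning one up uses kind (this assumes you have kind and kubectl installed; the cluster name and namespaces below are just examples, and the linked guide covers the full setup):

$ kind create cluster --name logscale-demo
$ kubectl create namespace logscale
$ kubectl create namespace kafka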

Verifying our LogScale deployment

With our Kubernetes cluster and LogScale running, it’s time for a sanity check. We’ll go through a few commands to validate the pieces we need are running.

Check the core LogScale cluster pods

$ kubectl get pods -n logscale
NAME                           READY   STATUS    RESTARTS   AGE
logscale-cluster-core-dafuvj   2/2     Running   0          13m

Check the Kafka, Zookeeper and Entity Operator pods

$ kubectl get pods -n kafka
NAME                                              READY STATUS   RESTARTS  AGE
logscale-cluster-entity-operator-57fc6485c9-vh5vv 3/3   Running  0         70m
logscale-cluster-kafka-0                          1/1   Running  0         70m
logscale-cluster-zookeeper-0                      1/1   Running  0         71m
strimzi-cluster-operator-86864b86d5-7zz4s         1/1   Running  0         72m

If all of these are in a running state, you’re in good shape!
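If you’d rather not eyeball the pod list, a quick sketch using kubectl wait can confirm readiness for you (the namespaces match the ones used above; adjust the timeout to taste):

$ kubectl wait --for=condition=Ready pods --all -n logscale --timeout=300s
$ kubectl wait --for=condition=Ready pods --all -n kafka --timeout=300s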

Check the LogScale interface

One more thing we’ll want to check is access to the LogScale interface. The easiest way to do this is to run a port forward.

$ kubectl -n logscale port-forward svc/logscale-cluster 8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
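With the port forward in place, you can also do a quick check from the command line. LogScale (formerly Humio) has exposed a lightweight status endpoint in the versions we’ve worked with; assuming that holds for yours, a request like this should return a small JSON document with the status and version:

$ curl -s http://localhost:8080/api/v1/status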

Then, we navigate to http://localhost:8080 in our browser and log in (username: developer, password: password). The username is standard, but the single-user password is defined as part of the environment variables for the humiocluster custom resource. You can change this if desired:

$ kubectl get humiocluster logscale-cluster -n logscale \
-o jsonpath="{.spec.environmentVariables[4]}" | jq .
{
  "name": "SINGLE_USER_PASSWORD",
  "value": "password"
}
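If you do want to change the password, one hedged approach is a JSON patch against the same custom resource. This sketch assumes SINGLE_USER_PASSWORD is still the fifth entry (index 4) in spec.environmentVariables, as shown above, and uses a placeholder value; verify the index in your own cluster first, and note that the operator will roll the pods after the change:

$ kubectl patch humiocluster logscale-cluster -n logscale --type=json \
  -p '[{"op": "replace", "path": "/spec/environmentVariables/4/value", "value": "a-stronger-password"}]'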

Now that we’re confident our LogScale instance is up and running, let’s take a look at storage.

Currently, we have local storage collecting and retaining all of our logs. However, we want to configure AWS S3 Bucket Storage to retain a copy of our logs. This is helpful in case our cluster suddenly crashes or we don’t have enough nodes with replicated data.

Note: Don’t confuse bucket storage with S3 Archiving. S3 Archiving also makes a copy of the logs, but not in LogScale’s native format, so the archived copies are no longer searchable from LogScale.

Configuring for AWS S3 Bucket Storage

Now, we’re ready to configure LogScale for bucket storage on AWS S3. Our first step is to ensure we have proper permissions at AWS.

Set up AWS permissions

Since we’re running locally, we’ll need to create an AWS IAM user with proper permissions to access our AWS S3 Bucket. We won’t cover every detail for creating the IAM user, but here are the basic steps:

  1. Log into your AWS account and navigate to IAM.
  2. Create a policy with S3 bucket read/write access.
  3. Create an IAM group and attach that policy.
  4. Create an IAM user and make that user a part of the group. (Alternatively, you can inline the policy for a single user, bypassing the use of an IAM group entirely.)

We’ve also created an S3 bucket (arbitrarily) named logscale-beachbucket that we can configure for use with LogScale. A sample policy for bucket access can be found in the LogScale docs. Our policy looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::logscale-beachbucket"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::logscale-beachbucket/*"
      ]
    }
  ]
}
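If you prefer the AWS CLI over the console, a rough sketch of the bucket and policy setup might look like the following. The bucket, policy and group names are examples (S3 bucket names must be globally unique), and the policy document above is assumed to be saved locally as logscale-s3-policy.json:

$ aws s3 mb s3://logscale-beachbucket --region us-east-1
$ aws iam create-policy --policy-name LogScaleBucketAccess \
  --policy-document file://logscale-s3-policy.json
$ aws iam create-group --group-name logscale-storage
$ aws iam attach-group-policy --group-name logscale-storage \
  --policy-arn arn:aws:iam::<your-account-id>:policy/LogScaleBucketAccess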

After creating the IAM user, we obtain an Access Key ID and a Secret Access Key. We’ll use these momentarily to authenticate our LogScale cluster for access to our S3 bucket.
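Continuing the CLI sketch, creating the user and its access key could look like this (again, the names are examples; create-access-key prints the Secret Access Key only once, so store it somewhere safe):

$ aws iam create-user --user-name logscale-s3
$ aws iam add-user-to-group --user-name logscale-s3 --group-name logscale-storage
$ aws iam create-access-key --user-name logscale-s3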

Update humiocluster for AWS authentication

With the baseline set up, we’ll configure our LogScale cluster! First, we need to tell LogScale how to authenticate to our AWS S3 bucket. We do this with the following environment variables, which correspond to the Access Key ID and Secret Access Key we copied down when creating the IAM user:

  • S3_STORAGE_ACCESSKEY
  • S3_STORAGE_SECRETKEY

We also need to tell LogScale where to put the data and how to encrypt it. We do that with the following environment variables:

  • S3_STORAGE_BUCKET
  • S3_STORAGE_REGION
  • S3_STORAGE_ENCRYPTION_KEY
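As we read the LogScale docs, S3_STORAGE_ENCRYPTION_KEY can be any secret string, from which LogScale derives the key used to encrypt the files it writes to the bucket. Assuming that holds for your version, one simple way to generate a strong random value (if openssl is available) is:

$ openssl rand -base64 32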

In a virtual machine installation of LogScale or a similar local install, we would use INI files to configure these values. Since we’re running on Kubernetes, we need to make changes to the humiocluster custom resource. The easiest way to do this is by editing the custom resource.

First, we run the following command:

$ kubectl edit humiocluster logscale-cluster -n logscale

Then, we update the environment variables so that our custom resource looks like this (notice the values associated with S3):

apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: logscale-cluster
  namespace: logscale
  # <snip>
spec:
  # <snip>
  environmentVariables:
  - name: HUMIO_MEMORY_OPTS
    value: -Xss2m -Xms1g -Xmx2g -XX:MaxDirectMemorySize=1g
  - name: ZOOKEEPER_URL
    value: logscale-cluster-zookeeper-client.kafka.svc.cluster.local:2181
  - name: KAFKA_SERVERS
    value: logscale-cluster-kafka-brokers.kafka.svc.cluster.local:9092
  - name: AUTHENTICATION_METHOD
    value: single-user
  - name: SINGLE_USER_PASSWORD
    value: password
  - name: S3_STORAGE_ACCESSKEY
    value: <redacted>
  - name: S3_STORAGE_SECRETKEY
    value: <redacted>
  - name: S3_STORAGE_BUCKET
    value: logscale-beachbucket
  - name: S3_STORAGE_REGION
    value: us-east-1
  - name: S3_STORAGE_ENCRYPTION_KEY
    value: <redacted>

Once you save your edits, new pods will get rolled out. Soon, you’ll begin to see objects added to the bucket.

As data ingestion continues, you’ll see LogScale create more objects in the bucket, and AWS will begin to record and show the amount of storage used.
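One way to confirm this from the command line is to list the bucket contents with the AWS CLI (assuming your local credentials have read access to the bucket):

$ aws s3 ls s3://logscale-beachbucket --recursive --human-readable --summarize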

We’re up and running! LogScale has been successfully configured for bucket storage with AWS S3.

Configuring for your specific environment

Perhaps you don’t want to use S3 buckets called logscale-beachbucket or mytestbucket in production. LogScale allows you to configure a fresh bucket. The documentation states the following:

Existing files already written to any previous bucket will not get written to the new bucket. LogScale will continue to delete files from the old buckets that match the file names that LogScale would put there.

Here are the steps you would take to configure LogScale to use a new S3 bucket:

  1. Create a new bucket in S3.
  2. Update the IAM policy to allow read/write to the new S3 bucket.
  3. Edit the humiocluster custom resource, updating S3_STORAGE_BUCKET and S3_STORAGE_REGION in spec.environmentVariables to point to the bucket you created (see the sketch after this list).
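If you prefer not to open an editor for step 3, here is a hedged JSON-patch sketch. It assumes the environment variable ordering shown earlier (S3_STORAGE_BUCKET at index 7 and S3_STORAGE_REGION at index 8) and uses an example bucket name and region; verify the indices in your own custom resource before patching:

$ kubectl patch humiocluster logscale-cluster -n logscale --type=json \
  -p '[{"op": "replace", "path": "/spec/environmentVariables/7/value", "value": "my-production-bucket"},
       {"op": "replace", "path": "/spec/environmentVariables/8/value", "value": "us-west-2"}]'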

Once you save the edits to the humiocluster custom resource, new pods will be rolled out, and you’ll begin to see data collected in the new bucket!

Conclusion

We’ve covered a lot in this guide, so let’s quickly recap what we’ve done:

  1. We set up our base installation of LogScale.
  2. We verified that everything was running by checking pod statuses and validating the LogScale interface in a browser.
  3. We went through AWS IAM steps so that we could obtain an Access Key ID and a Secret Access Key for read/write access to our newly created S3 bucket.
  4. We put it all together and configured LogScale to use our S3 bucket for storage.
  5. We verified that LogScale was creating new objects in our S3 bucket.
  6. We walked through how you could switch to a new bucket when you’re ready to move out of sandbox mode.

Congratulations on setting up AWS S3 bucket storage for LogScale! Nicely done!

Happy logging!
