Integration Exploration: Getting Started with Falcon LogScale and Bucket Storage on AWS S3

If you run CrowdStrike Falcon® LogScale, previously known as Humio, locally or on-premises, one of your first steps is to configure local storage so that LogScale has a persistent data store where it can send logs. If you’re running LogScale as a cluster setup, then you’ll have some data replication as a function of how LogScale manages the data. However, even with that replication, you’ll probably still want something outside of your local infrastructure for resiliency.
That’s where bucket storage on AWS S3 comes in.
LogScale can use AWS S3 as a backing store for ingested logs. It makes an encrypted copy of the logs, in LogScale’s native file format, in an S3 bucket that LogScale can still read and search, even if the local copies no longer exist!
We should note that using bucket storage instead of persistent disk storage for LogScale is not the approach we typically recommend for production environments. The main use case for bucket storage is a cloud environment with constraints on network traffic and bandwidth, such that you can’t afford to write to persistent disks. If persistent disks for your LogScale clusters are not an option, then bucket storage is an excellent alternative. You can read the documentation for more information on running in the cloud with bucket storage for LogScale while keeping persistent disks for Kafka (which Kafka requires).
In this guide, we’ll walk through how to set up bucket storage on AWS S3 so that you can understand the value of this approach and learn how to implement it.
Bucket storage on AWS S3 is only available if you’ve deployed LogScale yourself. Also, keep in mind that you will need a license to deploy LogScale on-premises. Licenses typically take two business days to activate. You can register for a free trial here.
Are you ready? Let’s jump in.
Deploying LogScale
If you want to deploy LogScale yourself, you have three primary options:
- The Single Node Setup is great if you’re looking to kick the tires on LogScale and see what it has to offer. With this approach, you have the flexibility of accessing LogScale on your laptop, a VM or a cloud instance without much effort.
- The Cluster Setup is a more advanced approach but still attainable if you’re comfortable installing packages or running containers. If you’re running in a VM-centric environment or a server rack, this is probably the method you’ll use, especially if you’re looking for a production-level deployment.
- The Container Deployment is exactly what it sounds like! LogScale gives you the option to install on Docker or Kubernetes. This is a good option for teams that have already adopted containers and/or orchestration platforms and understand the nuances that come along with that sort of architecture.
For our demo, we’ll use a local deployment of LogScale running on a local Kubernetes cluster. For an excellent step-by-step walkthrough of the Kubernetes and LogScale setup, follow this guide.
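To give a rough idea of what that setup involves, here is a minimal sketch of installing the humio-operator with Helm. The repository URL and chart names are assumptions based on the humio-operator project and may differ for your version, and prerequisites such as cert-manager and a Kafka cluster (for example, via Strimzi) are not shown; the guide above covers the full details.

```shell
# Sketch: install the humio-operator with Helm (verify repo/chart names against the guide)
helm repo add humio-operator https://humio.github.io/humio-operator
helm repo update

# Install the operator into its own namespace
helm install humio-operator humio-operator/humio-operator \
  --namespace logscale --create-namespace
```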
Verifying our LogScale deployment
With our Kubernetes cluster and LogScale running, it’s time for a sanity check. We’ll go through a few commands to validate that the pieces we need are running.
Check the core LogScale cluster pods
```shell
$ kubectl get pods -n logscale
NAME                           READY   STATUS    RESTARTS   AGE
logscale-cluster-core-dafuvj   2/2     Running   0          13m
```
Check the Kafka, Zookeeper and Entity Operator pods
```shell
$ kubectl get pods -n kafka
NAME                                                READY   STATUS    RESTARTS   AGE
logscale-cluster-entity-operator-57fc6485c9-vh5vv   3/3     Running   0          70m
logscale-cluster-kafka-0                            1/1     Running   0          70m
logscale-cluster-zookeeper-0                        1/1     Running   0          71m
strimzi-cluster-operator-86864b86d5-7zz4s           1/1     Running   0          72m
```
If all of these are in a running state, you’re in good shape!
Check the LogScale interface
One more thing we’ll want to check is access to the LogScale interface. The easiest way to do this is to run a port forward.
```shell
$ kubectl -n logscale port-forward svc/logscale-cluster 8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
```
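With the port forward in place, we can also do a quick API-level check. This assumes the default plain-HTTP listener on port 8080 and that your LogScale version exposes the /api/v1/status endpoint; treat it as an optional sanity check rather than a required step.

```shell
# Quick health check through the port forward
curl -s http://localhost:8080/api/v1/status
# Expect a small JSON payload with the cluster status and version
```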
Then, we navigate to http://localhost:8080 in our browser and log in (username: developer, password: password). The username is standard, but the single-user password is defined as part of the environment variables for the humiocluster custom resource. You can change this if desired:
```shell
$ kubectl get humiocluster logscale-cluster -n logscale \
    -o jsonpath="{.spec.environmentVariables[4]}" | jq .
{
  "name": "SINGLE_USER_PASSWORD",
  "value": "password"
}
```
Now that we’re confident our LogScale instance is up and running, let’s take a look at storage.
Currently, we have local storage collecting and retaining all of our logs. However, we want to configure AWS S3 Bucket Storage to retain a copy of our logs. This is helpful in case our cluster suddenly crashes or we don’t have enough nodes with replicated data.
Note: Don’t confuse bucket storage with S3 Archiving, which also copies logs to S3, but not in LogScale’s native format, so those copies aren’t searchable by LogScale.
Configuring for AWS S3 Bucket Storage
Now, we’re ready to configure LogScale for bucket storage on AWS S3. Our first step is to ensure we have proper permissions at AWS.
Set up AWS permissions
Since we’re running locally, we’ll need to create an AWS IAM user with proper permissions to access our AWS S3 Bucket. We won’t cover every detail for creating the IAM user, but here are the basic steps:
- Log into your AWS account and navigate to IAM.
- Create a policy with S3 bucket read/write access.
- Create an IAM group and attach that policy.
- Create an IAM user and make that user a part of the group. (Alternatively, you can inline the policy for a single user, bypassing the use of an IAM group entirely.)
We’ve also created an S3 bucket (arbitrarily) named logscale-beachbucket that we can configure for use with LogScale. A sample policy for bucket access can be found in the LogScale docs. Our policy looks like this:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::logscale-beachbucket" ] }, { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::logscale-beachbucket/*" ] } ] }
After creating the IAM user, we obtain an Access Key ID and a Secret Access Key. We’ll use these momentarily to authenticate our LogScale cluster for access to our S3 bucket.
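If you prefer the AWS CLI to the console, the same setup looks roughly like the sketch below. The bucket name and region match the examples in this guide, but the policy, group and user names are placeholders we made up for illustration; adjust them, the policy file path and the account ID for your environment.

```shell
# Sketch: create the bucket and IAM pieces with the AWS CLI (names are placeholders)
aws s3api create-bucket --bucket logscale-beachbucket --region us-east-1

# Create the policy from the JSON document above (saved locally as logscale-s3-policy.json)
aws iam create-policy --policy-name logscale-s3-access \
  --policy-document file://logscale-s3-policy.json

# Group, user and membership
aws iam create-group --group-name logscale-s3
aws iam attach-group-policy --group-name logscale-s3 \
  --policy-arn arn:aws:iam::<account-id>:policy/logscale-s3-access
aws iam create-user --user-name logscale-s3-user
aws iam add-user-to-group --group-name logscale-s3 --user-name logscale-s3-user

# Generate the Access Key ID / Secret Access Key pair
aws iam create-access-key --user-name logscale-s3-user
```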
Update humiocluster for AWS authentication
With the baseline set up, we’ll configure our LogScale cluster! First, we need to tell LogScale how to authenticate to our AWS S3 bucket. We do this with the following environment variables, which correspond to the Access Key ID and Secret Access Key we copied down when creating the IAM user:
- S3_STORAGE_ACCESSKEY
- S3_STORAGE_SECRETKEY
We also need to tell LogScale where to put the data and how to encrypt it. We do that with the following environment variables:
- S3_STORAGE_BUCKET
- S3_STORAGE_REGION
- S3_STORAGE_ENCRYPTION_KEY
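The encryption key is a secret string of your choosing; check the LogScale documentation for the recommended length and format. Assuming any sufficiently random string is acceptable, one simple way to generate a value is with openssl:

```shell
# Generate a random value to use as S3_STORAGE_ENCRYPTION_KEY
openssl rand -base64 64
```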
In a virtual machine installation of LogScale or a similar local install, we would use INI files to configure these values. Since we’re running on Kubernetes, we need to make changes to the humiocluster custom resource. The easiest way to do this is to edit the custom resource directly.
First, we run the following command:
```shell
$ kubectl edit humiocluster logscale-cluster -n logscale
```
Then, we update the environment variables so that our humiocluster custom resource looks like this (notice the values associated with S3):
```yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: logscale-cluster
  namespace: logscale
# <snip>
spec:
  # <snip>
  environmentVariables:
    - name: HUMIO_MEMORY_OPTS
      value: -Xss2m -Xms1g -Xmx2g -XX:MaxDirectMemorySize=1g
    - name: ZOOKEEPER_URL
      value: logscale-cluster-zookeeper-client.kafka.svc.cluster.local:2181
    - name: KAFKA_SERVERS
      value: logscale-cluster-kafka-brokers.kafka.svc.cluster.local:9092
    - name: AUTHENTICATION_METHOD
      value: single-user
    - name: SINGLE_USER_PASSWORD
      value: password
    - name: S3_STORAGE_ACCESSKEY
      value: <redacted>
    - name: S3_STORAGE_SECRETKEY
      value: <redacted>
    - name: S3_STORAGE_BUCKET
      value: logscale-beachbucket
    - name: S3_STORAGE_REGION
      value: us-east-1
    - name: S3_STORAGE_ENCRYPTION_KEY
      value: <redacted>
```
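One aside before saving: inlining the access key, secret key and encryption key in plain text is fine for a demo, but you may prefer to keep them in a Kubernetes Secret. Assuming your humio-operator version exposes the standard Kubernetes EnvVar schema for environmentVariables (worth confirming in its documentation), a secretKeyRef should work. A minimal sketch, using a Secret name we made up:

```shell
# Store the S3 credentials in a Secret instead of inlining them
kubectl -n logscale create secret generic logscale-s3-credentials \
  --from-literal=accesskey=<redacted> \
  --from-literal=secretkey=<redacted>
```

```yaml
# In spec.environmentVariables of the humiocluster custom resource
- name: S3_STORAGE_ACCESSKEY
  valueFrom:
    secretKeyRef:
      name: logscale-s3-credentials
      key: accesskey
- name: S3_STORAGE_SECRETKEY
  valueFrom:
    secretKeyRef:
      name: logscale-s3-credentials
      key: secretkey
```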
Once you save your edits, new pods will be rolled out. Soon, you’ll begin to see objects appear in the bucket in the AWS S3 console.
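If you’d rather check from the command line than the console (assuming the AWS CLI is configured with credentials that can read the bucket), you can list what LogScale has written so far:

```shell
# List the objects LogScale has written, with a storage summary
aws s3 ls s3://logscale-beachbucket --recursive --human-readable --summarize
```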
As data ingest continues, you’ll see LogScale create more objects in the bucket, and AWS will begin to record and show the amount of storage used.
We’re up and running! LogScale has been successfully configured for bucket storage with AWS S3.
Configuring for your specific environment
Perhaps you don’t want to use S3 buckets called logscale-beachbucket or mytestbucket in production. LogScale allows you to configure a fresh bucket. The documentation states the following:
Existing files already written to any previous bucket will not get written to the new bucket. LogScale will continue to delete files from the old buckets that match the file names that LogScale would put there.
Here are the steps you would take to configure LogScale to use a new S3 bucket:
- Create a new bucket in S3.
- Update the IAM policy to allow read/write to the new S3 bucket (see the example policy snippet after this list).
- Edit the humiocluster custom resource, updating S3_STORAGE_BUCKET and S3_STORAGE_REGION in spec.environmentVariables to point to the bucket you created.
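For the IAM policy update in step 2, you can add the new bucket’s ARNs alongside the old ones. Using a hypothetical production bucket named logscale-prodbucket, the two statements from our earlier policy might look something like this:

```json
{
  "Effect": "Allow",
  "Action": ["s3:ListBucket"],
  "Resource": [
    "arn:aws:s3:::logscale-beachbucket",
    "arn:aws:s3:::logscale-prodbucket"
  ]
},
{
  "Effect": "Allow",
  "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
  "Resource": [
    "arn:aws:s3:::logscale-beachbucket/*",
    "arn:aws:s3:::logscale-prodbucket/*"
  ]
}
```

Keeping the old bucket in the policy matters here because, per the documentation quoted above, LogScale will continue to delete matching files from the previous bucket.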
Once you save the edits to the humiocluster custom resource, new pods will be rolled out, and you’ll begin to see data collected in the new bucket!
Conclusion
We’ve covered a lot in this guide, so let’s quickly recap what we’ve done:
- We set up our base installation of LogScale.
- We verified that everything was running by checking pod statuses and validating the LogScale interface in a browser.
- We went through AWS IAM steps so that we could obtain an Access Key ID and a Secret Access Key for read/write access to our newly created S3 bucket.
- We put it all together and configured LogScale to use our S3 bucket for storage.
- We verified that LogScale was creating new objects in our S3 bucket.
- We walked through how you could switch to a new bucket when you’re ready to move out of sandbox mode.
Congratulations on setting up AWS S3 bucket storage for LogScale! Nicely done! Some common next steps after getting set up with AWS S3 Bucket Storage include:
- Looking into performance tuning options
- Setting up S3 archiving
- Improving operational resilience with a high availability configuration
Happy logging!