Importing Logs from FluentD into Falcon LogScale

One of the primary debugging tools for a cloud-native developer is the simple, humble log file. Some malfunctioning systems can be impossible to debug without them.

FluentD is an open source log collector that provides unified log processing across multiple emitter systems, either on the same logical device or separate devices. For example, on a single k8s pod, logs are emitted from the pod itself, the container running in the pod and the microservice running in the container. FluentD can collect all of these logs, format them to a common schema, filter unnecessary data and ship this unified stream elsewhere for additional processing.

CrowdStrike Falcon® LogScale (formerly known as Humio) is a centralized log management and observability solution that brings sanity to the log and event data emanating from disparate systems. Consider all the logs that come from different k8s clusters, databases, microservices, message brokers and other infrastructure running in your clouds. LogScale can ingest logs from multiple FluentD instances (and other sources) and further process them for a wider view of your cloud infrastructure.

In this guide, we’ll walk through how to set up FluentD for collecting logs on your system. We’ll also demonstrate how to configure FluentD to ship those logs to LogScale for organization and analysis.

Getting started with FluentD

FluentD is used to collect logs of a particular system into a single stream of data that can be filtered and shipped upstream. These aggregated logs can have many possible final destinations:

  • A data lake
  • An archive database
  • Another instance of FluentD
  • Another log aggregator, such as LogScale

FluentD enables a unified logging experience across all your microservices. It lets you manipulate the log schema, dynamically adding new fields onto logs while filtering out unnecessary ones.

FluentD is also highly extensible, with a robust set of plugins available for a variety of use cases. Plugins include the k8s metadata filter plugin, which allows for automatic annotation of k8s pod and cluster information for your logs. This is a very common pattern for k8s log aggregation, as it helps organize and filter the firehose of data hitting your log management system.
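
As an illustration, once the plugin gem (fluent-plugin-kubernetes_metadata_filter) is installed, enabling it can be as simple as the following configuration sketch, which assumes your container logs are tagged with a kubernetes.* prefix:

# Enrich events tagged kubernetes.* with pod, namespace and
# container metadata fetched from the Kubernetes API
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>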

How to set up FluentD

Before installing FluentD, your system must meet certain criteria.

The FluentD documentation site lists several requirements that must be in place for your environment:

  • Set up NTP
  • Increase the maximum number of file descriptors
  • Optimize the network kernel parameters

Please follow the instructions from the FluentD documentation before continuing. These are general specifications for Linux-based systems using the systemd management tool. You may need to adjust these instructions to meet your environment.
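
As a rough sketch, on many Linux hosts those tweaks end up as entries like the following; the exact values here are illustrative, so defer to the FluentD documentation for your environment:

# /etc/security/limits.conf -- raise the maximum number of file descriptors
root soft nofile 65536
root hard nofile 65536
*    soft nofile 65536
*    hard nofile 65536

# /etc/sysctl.conf -- example network kernel parameters
net.core.somaxconn = 1024
net.ipv4.ip_local_port_range = 10240 65535

# Apply the sysctl changes (the limits take effect on next login or reboot)
sysctl -p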

After you’ve gone through the instructions above, you’re ready to install FluentD. Unless otherwise stated, assume the following commands are run with superuser permissions, either via sudo or while logged in as root.

Installing FluentD

FluentD has packages for the major Linux distributions and also options for installing via a Ruby gem or directly from the source repo.

For OpenSuSE, there is a package available that installs FluentD as a Ruby gem, named ruby3.1-rubygem-fluentd.

For Debian and Red Hat-based systems, the package installs a tool with a slightly different name: td-agent. This is effectively the same thing as fluentd, but under a legacy name.

In this guide, td-agent may be used interchangeably with fluentd.
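
For example, on a Debian-based system, installation might look like the first command below, while a plain Ruby environment can use the gem directly. The install script name here is taken from the FluentD download page and may differ for your release:

# Debian/Ubuntu: run the td-agent install script for your release
curl -fsSL https://toolbelt.treasuredata.com/sh/install-debian-bullseye-td-agent4.sh | sh

# Any system with Ruby: install FluentD as a gem
gem install fluentd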

Setting up the FluentD configuration file

Some packages for FluentD will come with a default configuration file. However, some customization is usually necessary to capture the specific logs or files you want in your system.

The configuration file is typically found at /etc/td-agent/td-agent.conf, but not all installation methods will supply this file. Therefore, it may be up to you to create a configuration file.

For a very simple configuration that will add a listener on HTTP port 8888, add this to the td-agent.conf file:

<source>
  @type http
  @id http_input

  port 8888
</source>
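
If td-agent is already running under systemd, restart the service so it picks up the new configuration:

systemctl restart td-agent.service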

Running FluentD

If you’re using systemd to manage your running services, then it is likely that the installation process has already initialized and started fluentd (or td-agent) on your system. You can check by running:

systemctl status td-agent.service

Adjusting permissions

On Debian-based machines, the permissions for configuration files and directories needed to run FluentD may require some additional setup after installation. Double-check that the configuration files and any fluentd and td-agent folders under /var all have their group set to the td-agent group, and that the group has read and execute permissions on those files and directories.

With Debian 11.3, for example, /var/fluentd and /var/log/td-agent were the two folders identified as needing permission changes.

To set these permissions, you may need to use the chgrp or chmod utilities. Your commands may look something like this:

chgrp td-agent /var/fluentd
chmod 775 /var/log/td-agent
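
A quick way to double-check the result is to list the directories and confirm the group and mode:

ls -ld /var/fluentd /var/log/td-agent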

Startup

If you’re not using systemd to manage your FluentD installation, then you can start it up manually via this command:

fluentd -c <path/to/config/file> &

Of course, replace <path/to/config/file> with the path to the td-agent.conf file you created above.
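
Before starting it for real, you can also have FluentD validate the configuration without launching any plugins; recent versions support a --dry-run flag:

fluentd -c <path/to/config/file> --dry-run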

If there are no error messages or if the status returned from systemd is active, then you can assume FluentD is up and running on your systems. It’s now waiting for events that match the conditions in your configuration file before taking action.

To test the listener configured for port 8888, you can use the following command:

curl -X POST -d 'json={"json":"test!"}' http://localhost:8888/debug.test

Next, check the FluentD logs to ensure the log was ingested:

tail -n 1 /var/log/td-agent/td-agent.log

You should see something like the following (with a current timestamp, of course):

2022-06-19 20:19:05.929908014 -0600 debug.test: {"json":"test!"}

FluentD is up and running!

Getting started with Falcon LogScale

Creating a LogScale account

The first step is to register an account with Falcon LogScale. There are tiers ranging from free trials to unlimited log ingestion. Choose the level that suits the needs of your organization. For this guide, you can get by with the Falcon LogScale Community Edition. Note: it may take up to two days to activate your new LogScale account.

Once you’ve created an account and logged in, navigate to the “Repository” list as shown in the image below:

Set up a LogScale repository

A “repository” in LogScale is the base layer for organizing data. You can add users, dashboards and logs to a repository. In the image above, we created a repository named “Tutorial” for this demo.

After you have created a new repository, you will need an ingest token to send logs from FluentD to your Falcon LogScale repository. The ingest token authenticates the log shipper, verifying that it is authorized to send logs to your Falcon LogScale repository.

When you navigate to your repository for the first time, you’ll be asked to create an ingest token. If you don’t create an ingest token at that time, you can always go to the Settings tab, then navigate to Ingest Tokens.

Integrating FluentD with Falcon LogScale

At this point in the guide, you should have:

  • A FluentD instance working on a local machine or container
  • A LogScale account
  • A LogScale repository and ingest token

The next step is to start shipping logs from FluentD to LogScale. LogScale accepts data over the Splunk HTTP Event Collector (HEC) protocol, so we can use the Splunk Enterprise plugin for FluentD. To install a FluentD plugin, run one of the following:

  • If you’re using a Debian-based system with td-agent, then run:

td-agent-gem install fluent-plugin-splunk-enterprise

  • If you are using FluentD directly (a source install or non-Debian package), then run:

fluent-gem install fluent-plugin-splunk-enterprise

These two commands are equivalent, and only one needs to be run; the right one for you depends on your FluentD installation type.
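
To confirm the plugin was installed, you can list the installed gems and filter for it (substitute fluent-gem if you installed FluentD directly):

td-agent-gem list | grep fluent-plugin-splunk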

After the plugin installation completes, configure FluentD to use it. In the following example FluentD configuration file (td-agent.conf), the plugin is set to tail the Apache access logs and ship them to LogScale:

## read apache logs with tag=apache.access
<source>
  @type tail
  path /var/log/apache2/access_log
  pos_file /home/my-username/Documents/fluent/data.pos
  tag apache.access
  path_key filename
  <parse>
    @type none
  </parse>
</source>

## Mutating event filter
## Add a hostname field to apache.access tagged events
<filter apache.access>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>
# Ship logs to LogScale that are tagged apache.access
<match apache.access>
  @type                 splunk_hec
  host                  cloud.community.humio.com
  port                  443
  token                 <your-ingest-token>
  use_ssl               true
  ca_file               /etc/ssl/ca-bundle.pem
  <buffer>
    @type               memory
    flush_mode          interval
    flush_interval      2
    flush_thread_count  2
    overflow_action     block
    retry_forever       true
  </buffer>
</match>

For your use, you’ll need to specify the location for pos_file, which is the file where FluentD records how far into the log it has read. You’ll also need to set token to the ingest token for your LogScale repository.

Also, you may need to change the ca_file field to use a different certificate. If you don’t have any certificates on your machine or image, then you may need to install a package like ca-certificates or create one using OpenSSL. The certificates should reside in the /etc folder but might be in /etc/ssl, /etc/pki, or some other custom location depending on your installation. Lastly, the host field may be different depending on your LogScale customer tier.
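
If logs don’t show up, it can help to verify the ingest token and network path independently of FluentD. Since the plugin speaks the HEC protocol, you can send a test event straight to LogScale’s HEC endpoint with curl. This is a sketch that assumes the Community Edition host used above:

# Send a single test event to LogScale over HEC
curl https://cloud.community.humio.com/api/v1/ingest/hec \
  -H "Authorization: Bearer <your-ingest-token>" \
  -H "Content-Type: application/json" \
  -d '{"event": "hello from curl"}'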

Now, assuming that Apache is running on your machine, use your browser to navigate to a web page on that server or run curl http://localhost/ to fetch the index page. In your LogScale repository (for our demo, that’s the “Tutorial” repository), you’ll soon see the log generated by this action. It should look like this:

Here, we have a single log line highlighted in blue. Below it, we have a breakdown of the different fields that were present in this log entry as it was ingested by LogScale. We also see the hostname field that we added with the record transformer in our configuration. This field is not present in a standard Apache access log; FluentD added the field as it was processing the Apache logs.

Conclusion

In this how-to guide, we demonstrated how to configure FluentD to ingest logs from a single system and ship those logs to Falcon LogScale. 

In our example, FluentD tailed and transformed an Apache access log. We set up a LogScale repository and obtained an ingest token. Then, we provided the ingest token in our FluentD configuration, setting up FluentD to process our Apache logs and ship them to our LogScale repository.

This is where your journey of using FluentD begins. From here, you can use FluentD to shape and filter your system’s logs, and then ship them to LogScale for management and monitoring!
