What is Centralized Logging?

Arfan Sharif - September 20, 2022

Centralized logging is the process of collecting logs from networks, infrastructure, and applications into a single location for storage and analysis. This can provide administrators with a consolidated view of all activity across the network, making it easier to identify and troubleshoot issues.

In this article, we’ll explore the value of centralized logging architecture, how it works, and how you can create a centralized logging workflow.

Why Are Logs Necessary?

Logs provide an audit trail of system activities, events, or changes in an IT system. They can help troubleshoot system functionality issues, performance problems, or security incidents. System logs are used to determine when changes were made to the system and who made them. Additionally, logs are often necessary for regulatory requirements.

Traditional Single-Service Logging Systems

Traditional logging systems were exclusively focused on the individual machines they were installed on. This was considered sufficient back when single, stand-alone servers delivered whole services. But in an age of multi-tiered microservices and network interconnectivity, if you’re only looking at a single data source, then you’re not getting the full picture.

For example, if your database is running slow, analyzing the database’s slow query logs may not tell the whole story. Most systems are tiered, with a front end, middleware, and databases. You may also have to look at the logs generated by the underlying server and storage subsystem, as well as name resolution and network performance.

The Need for Log Collection Across Distributed Components

That’s why the only way to understand what’s really going on in a distributed system is to collect all the logged events across the network in real time. This includes logs from:

  • Server infrastructure
  • Storage
  • Database
  • API gateways
  • Load balancers
  • Firewalls
  • … and others.

When troubleshooting a problem, you often need to correlate events from multiple log sources, which is why you must capture logs from every component that contributes to making your applications work.

Manually logging in each day to multiple infrastructure components (perhaps dozens of them) just to read hundreds or thousands of lines of logs is burdensome to the point of being impractical. This approach is a time sink, and it almost guarantees that you will miss important events at one point or another. However, you still can’t afford to ignore those logs.

The only practical solution, therefore, is to have a single pane of glass—a solution that can automatically connect to all your systems and collect their logs in real time, and then present them in an attractive, easy-to-understand interface. Let’s consider some of the advantages of centralized logging.

Benefits of Centralized Logging

Handling Multiple Log Formats

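Multi-tier systems can generate logs in different formats. For example, Linux systems use rsyslog or journald, while Windows has Event Logs. Meanwhile, other systems like databases, firewalls, and SAN systems might use their own proprietary formats. A centralized logging system can ingest these different formats and normalize them so that events from every source can be searched and analyzed together.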
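
As a rough illustration of what handling multiple formats can involve, the sketch below maps a syslog-style text line and a JSON event onto one common record shape. The sample line layout, the JSON keys (time, host, service, msg), the output field names, and the regular expression are all assumptions made for this example, not the format of any particular product.

```python
import json
import re

# Assumed pattern for a traditional syslog-style line, e.g.
# "Sep 20 10:15:02 web01 sshd[812]: Failed password for root"
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<app>[^\[:]+)(?:\[\d+\])?: (?P<message>.*)$"
)


def normalize(raw: str) -> dict:
    """Map a raw log line onto a common record shape (hypothetical schema)."""
    raw = raw.strip()
    if raw.startswith("{"):
        # JSON-structured event: pick out the fields of interest.
        event = json.loads(raw)
        return {
            "timestamp": event.get("time"),
            "host": event.get("host"),
            "app": event.get("service"),
            "message": event.get("msg"),
        }
    match = SYSLOG_RE.match(raw)
    if match:
        return match.groupdict()
    # Unknown format: keep the raw text so nothing is lost.
    return {"timestamp": None, "host": None, "app": None, "message": raw}
```

Once every source is reduced to the same fields, a single query can cover all of them.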

Efficient Central Storage

It’s not uncommon to see systems accumulate terabytes of logs. This can quickly overrun storage devices, threaten system integrity, and add significant performance overhead to applications.

A modern log management solution can filter the logs it ingests and apply compression to make efficient use of storage and retention capacity. All ingested logs are stored in a central location, allowing your servers to rotate out their local copies to conserve storage space. Centralized logging systems usually store logs in a proprietary, compressed format while also letting you configure how long you want to keep the logs.
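
To make the storage idea concrete, here is a minimal sketch of compressing a batch of log lines and checking it against a retention period. It uses gzip from the Python standard library as a stand-in; real platforms typically use their own storage formats, and the function names and the retention check are assumptions for illustration.

```python
import gzip
from datetime import datetime, timedelta


def compress_batch(lines: list[str]) -> bytes:
    """Compress a batch of log lines into a single gzip blob."""
    return gzip.compress("\n".join(lines).encode("utf-8"))


def is_expired(stored_at: datetime, now: datetime, retention_days: int) -> bool:
    """Return True when a batch is older than the configured retention period."""
    return now - stored_at > timedelta(days=retention_days)
```

Log lines tend to be repetitive, so batches like this usually compress well. Expired batches can then be deleted by a periodic cleanup job.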

Faster Search Querying Across All Logs

A centralized logging system also gives you the ability to search through thousands of lines of events for specific information, or to extract information and summarize it quickly and efficiently. With ingest latency minimized, queries can return results in under a second.

Event Correlation

Event correlation is the ability to connect two or more related events to identify issues and track down root causes. Modern centralized logging systems can enrich logged events with extra information and use artificial intelligence to correlate seemingly disparate pieces of information. Such platforms can show trends and anomalies in dashboards and charts, and can be configured to alert you when significant trends, anomalies, and event correlations are identified.
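
The simplest form of correlation is grouping events from different sources that share an identifier. The sketch below does exactly that, and nothing more sophisticated. The field names (request_id, source, ts) are hypothetical, and this is plain key matching rather than the AI-assisted correlation described above.

```python
from collections import defaultdict


def correlate(events: list[dict], key: str = "request_id") -> dict:
    """Group events from different sources that share the same key value."""
    groups = defaultdict(list)
    for event in events:
        if key in event:
            groups[event[key]].append(event)
    # Keep only groups that span more than one source, ordered by time.
    return {
        value: sorted(group, key=lambda e: e["ts"])
        for value, group in groups.items()
        if len({e["source"] for e in group}) > 1
    }
```

For example, a load balancer timeout and a database slow query that carry the same request ID end up in one group, which points toward the root cause.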

The reports and dashboards from a centralized logging system can also serve secondary purposes like budgeting, capacity planning, and performance benchmarking.

Security Analysis

Sophisticated centralized log management is also pivotal for effective cybersecurity controls through security information and event management (SIEM) and security orchestration, automation, and response (SOAR). An additional benefit is a reduction in the costs associated with SIEM and SOAR solutions: modern log management can operate in tandem with them and provide longer retention and storage at a fraction of the cost.

How Does Centralized Logging Work?

Both distributed and centralized logging processes are composed of four steps:

  1. Collection
  2. Processing
  3. Indexing
  4. Visualization

Let’s cover each of these steps in more detail.

Collection

The first step for log analysis is to collect all the logs, centralizing them in a secure location to be accessed and analyzed when needed.

This step requires integrating the source systems with the logging application. Integration can be achieved by an agent running on the source server. The agent reads the server logs and sends them to the centralized logging platform on a specific port.
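
The core loop of such an agent can be sketched in a few lines: remember how far into the log file you have read, pick up anything appended since, and hand each complete line to a sender. The function name and the send callback are assumptions for this example. A real agent would also handle log rotation, batching, retries, and the network transport.

```python
from typing import Callable


def ship_new_lines(path: str, offset: int, send: Callable[[str], None]) -> int:
    """Read lines appended to `path` since `offset`, pass each to `send`,
    and return the new offset to resume from on the next call."""
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            send(raw.decode("utf-8", errors="replace").rstrip("\n"))
            offset += len(raw)
    return offset
```

Calling this function on a timer, with send bound to a socket or HTTP client, gives the basic shape of a log shipper.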

Integration can also be achieved through native methods. For example, a server’s Syslog daemon might be set up to send log data directly to the logging server. For cloud-hosted services, integration may involve connecting to the cloud service’s logging environment (such as AWS CloudWatch or Azure Monitor) and reading the events.
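
An application can also send its own events over the syslog protocol without an agent. The sketch below uses the SysLogHandler class from the Python standard library, sending over UDP. The logger name, the message format, and the host and port you pass in are placeholders, not values from any specific logging platform.

```python
import logging
import logging.handlers


def build_syslog_logger(host: str, port: int) -> logging.Logger:
    """Return a logger that forwards records to a syslog endpoint over UDP."""
    logger = logging.getLogger("central-logging-demo")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("demo-app: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

UDP delivery is fire-and-forget, so a setup that cannot afford to lose events would typically use TCP or an agent with buffering instead.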

Processing

This next step—processing—transforms all the collected raw log data into a more usable format. Transformation can include parsing the logged events and extracting specific fields of interest like date/time, source IP, user name, application name, and event message.

Processing may also include steps for enriching the logging information. For example, a log with IP addresses can be enriched by looking up their geolocations and adding that information to the event stream. Similarly, timestamps of disparate logs may be converted to a common time zone.
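
The two enrichment steps mentioned here, geolocation lookup and time zone normalization, might look like the following sketch. The lookup table is a hypothetical stand-in for a real geolocation database, the IP addresses come from documentation ranges, and the event field names are assumptions.

```python
from datetime import datetime, timezone

# Hypothetical lookup table standing in for a real geolocation database.
GEO_BY_IP = {"203.0.113.7": "AU", "198.51.100.23": "DE"}


def enrich(event: dict) -> dict:
    """Return a copy of the event with a UTC timestamp and a geo field."""
    enriched = dict(event)
    # Parse an ISO 8601 timestamp that carries a UTC offset, then convert to UTC.
    # (A timestamp without an offset would be treated as local time.)
    local = datetime.fromisoformat(event["timestamp"])
    enriched["timestamp"] = local.astimezone(timezone.utc).isoformat()
    enriched["geo"] = GEO_BY_IP.get(event.get("source_ip"), "unknown")
    return enriched
```

With every timestamp in UTC, events from servers in different regions can be ordered on a single timeline.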

Another processing step may filter out unnecessary records while compressing the rest.

Indexing

The indexing step is an internal process in which the logging platform indexes all the logged events after they have been processed. This is much like creating a full-text or database index. This index can then be used to optimize the speed of searching through logs.

Once logged events are indexed, they are ready for searching and filtering.
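
One common way to build such an index is an inverted index, which maps each word to the events that contain it. The sketch below is a toy version under that assumption. Production platforms use far more elaborate structures, and some avoid a traditional index altogether.

```python
import re
from collections import defaultdict


def build_index(events: list[str]) -> dict:
    """Map each lowercase token to the set of event positions containing it."""
    index = defaultdict(set)
    for position, text in enumerate(events):
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(position)
    return index


def search(index: dict, query: str) -> set:
    """Return positions of events that contain every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

A search now looks up a few sets and intersects them instead of scanning every stored line.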

Visualization

In this step, the centralized logging system updates and refreshes all the built-in and custom charts and dashboards, thereby providing a fresh picture of the system.

Although the four steps described here are distinct, they may occur simultaneously and in real time. This means while you are looking at a trend graph created from events ingested a minute ago, the system could be collecting, processing, and indexing new data as it streams in, updating your visualization charts dynamically.
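
Behind a trend graph there is usually a simple aggregation. As a minimal sketch, the function below counts events per minute, producing the kind of series a chart would plot. The function name and the ISO 8601 timestamp format are assumptions for this example.

```python
from collections import Counter
from datetime import datetime


def events_per_minute(timestamps: list[str]) -> list[tuple[str, int]]:
    """Count events per minute, sorted by time, ready to plot as a trend line."""
    buckets = Counter(
        datetime.fromisoformat(ts).strftime("%Y-%m-%dT%H:%M") for ts in timestamps
    )
    return sorted(buckets.items())
```

Re-running the aggregation as new events are indexed is what keeps a dashboard current.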

Best Practices for Centralized Logging

Although this is a topic in itself, the following are some best practices to incorporate when adopting a centralized logging system for your enterprise.

Involve Your Teams

First, involve the teams and people who will be using the platform. This can be your developers, DevOps team, SecOps team, and so on.

Determine What to Collect

You need to decide which logs are most important for your monitoring purposes and which events you want to capture from them. For example, you may be interested in only connection refusal events from your firewall logs. In addition, consider the following details:

  • How long do you want to retain your logs?
  • What common time zone will you use for normalizing timestamps?
  • What common fields of information do you want to have available?
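
It can help to write these decisions down as an explicit policy. The sketch below shows one possible shape for that. Every name and default value in it (90 days, UTC, the field list, the firewall keyword) is a hypothetical example rather than a recommendation.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionPolicy:
    """Hypothetical settings answering the planning questions above."""
    retention_days: int = 90
    timezone: str = "UTC"
    required_fields: tuple = ("timestamp", "host", "app", "message")
    # Per-source keywords; only events containing one of them are collected.
    keep_keywords: dict = field(default_factory=lambda: {"firewall": ["refused"]})

    def should_collect(self, source: str, message: str) -> bool:
        keywords = self.keep_keywords.get(source)
        if keywords is None:
            return True  # no filter configured for this source
        return any(keyword in message.lower() for keyword in keywords)
```

With this policy, only firewall events mentioning a refused connection are collected, while sources without a filter are collected in full.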

Clarify Output Needs

Next, decide what type of output you want from the logs. Is it real-time anomalies or historic trends? Or do you want to use your logs for security event alerts only?

Ensure Integrations Exist for All Your Source Systems

There are many log management solutions out there. Some are best suited to on-premises hosting, and others are SaaS-based. Each has its benefits and drawbacks. While the hosting model is a consideration, the solution you choose must also have integrations (natively or through community plugins) for all the source systems you want to monitor.

Keep in Mind Compliance Requirements

For SaaS solutions, check for any regulatory restrictions. Some regulations prohibit logs from being stored in overseas data centers. Similarly, some industry standards require you to redact or encrypt sensitive information in logs.
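
Redaction is often applied before logs leave the source system. The sketch below masks email addresses and long digit sequences such as card numbers. Both regular expressions are deliberately simple assumptions that will miss some cases and over-match others, so a real deployment would need patterns tuned to its own data and to the standards that apply.

```python
import re

# Simplistic patterns, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")  # 13 to 16 digits


def redact(message: str) -> str:
    """Mask email addresses and card-like numbers in a log message."""
    message = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
    return CARD_RE.sub("[REDACTED_NUMBER]", message)
```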

GET TO KNOW THE AUTHOR

Arfan Sharif is a product marketing lead for the Observability portfolio at CrowdStrike. He has over 15 years of experience driving Log Management, ITOps, Observability, Security and CX solutions for companies such as Splunk, Genesys and Quest Software. Arfan graduated in Computer Science from Bucks and Chilterns University and has a career spanning Product Marketing and Sales Engineering.