What is Log Parsing?

Arfan Sharif - December 21, 2022

Nearly every component in your IT environment—servers, applications, network devices, and others—generates a log. These log files contain information about the system’s internal or external events, from simple status information to critical errors. IT teams use these logs to debug code, troubleshoot failures and performance issues, investigate security breaches, and even analyze customer behavior.

Most companies use log management solutions to ingest, store, and analyze logs. Logs come from many different sources and therefore take on many different types and formats.

Log parsing is the process of converting log data into a common format so that it is machine-readable. You might start with ingested logs in several different formats, but once they have been parsed, you can use your log management system to search and analyze their events as if they all came from a single source.

In this article, we’ll look more deeply at log parsing, how it works, and which log parsing features are the most useful. We’ll also introduce CrowdStrike’s Falcon LogScale, a modern log management system.

What is Log Parsing?

To extract meaningful information from logs, a log management system must first parse the files. Log parsing translates structured or unstructured log files so your log management system can read, index, and store their data. This way, you can easily filter, analyze, and manipulate the key-value information.

Some common log formats include:

  • JSON
  • CSV
  • Windows Event Log
  • Common Event Format (CEF)
  • NCSA Common Log Format
  • Extended Log Format (ELF)
  • W3C

One of the most commonly used structured data formats is JSON. It’s the first choice for many application developers because its Unicode encoding makes it universally accessible and readable by humans and machines alike. The snippet below shows a simple JSON document with key-value pairs as elements:

{
  "userAccess": {
    "timestamp": "2022-01-31 17:55.50",
    "client_ip": "10.2.31.21",
    "username": "bob",
    "status": "Error",
    "message": "File not found"
  }
}

Given the wide community support for JSON, most log management solutions offer JSON parsing options by default.
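
As a quick illustration of why structured formats are so easy to parse, the short Python sketch below (using only the standard json module; the variable names are our own) loads the document above and reads individual fields:

import json

# The same sample event shown above, as a raw string
raw_event = '''
{
  "userAccess": {
    "timestamp": "2022-01-31 17:55.50",
    "client_ip": "10.2.31.21",
    "username": "bob",
    "status": "Error",
    "message": "File not found"
  }
}
'''

# json.loads turns the raw text into a nested dictionary of key-value pairs
event = json.loads(raw_event)

# Individual fields can then be read, filtered, or indexed directly
print(event["userAccess"]["username"])  # bob
print(event["userAccess"]["status"])    # Error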

How Does a Log Parser Work?

Log parsers are usually built into the log management software's engine, which means each log manager has its own proprietary method for parsing logs.

Most log management solutions have built-in parsers for common data types like Windows Event Logs, JSON, CSV, or W3C. Parsers are configured to recognize these log types based on the source data structure and file extensions.

Once a log file is ingested, the parser applies its built-in rules to extract useful field names and their values. Sometimes, a parser can store the extracted data in a hierarchical structure. With this approach, the user can search on any field and drill down through the returned result set to fine-tune the query.
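
The exact storage model is vendor-specific, but the toy Python sketch below (the flatten helper and field names are hypothetical) shows the general idea of flattening a hierarchical event into searchable field paths:

# Hypothetical sketch: nested keys are flattened into dotted paths so a
# user can search and drill down on any field (e.g. "userAccess.status").
def flatten(event, prefix=""):
    fields = {}
    for key, value in event.items():
        path = prefix + key
        if isinstance(value, dict):
            fields.update(flatten(value, prefix=path + "."))
        else:
            fields[path] = value
    return fields

parsed = {"userAccess": {"username": "bob", "status": "Error"}}
index = flatten(parsed)
# index == {"userAccess.username": "bob", "userAccess.status": "Error"}

# Drilling down is then just filtering on a field path
matches = {k: v for k, v in index.items() if k == "userAccess.status"}
print(matches)  # {'userAccess.status': 'Error'}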

For non-standard log types, users can provide custom log parsing rules. Typically this is done using regular expressions or the logging solution’s proprietary language. Some log management solutions make it even easier by letting users build the parsing rule from a graphical interface. Users can highlight the field names they are interested in; meanwhile, behind the scenes, the log management solution builds the parsing rule.

Once logs are parsed, the logging system will ingest the data so users can query, analyze, and visualize it.

Let’s consider the non-standard log entries in the snippet below:

2022-05-15T12:51:40+00:00 [WARN] "This is a warning" id=123 user=jbloggs
2022-05-15T12:52:42+00:00 [ERROR] "This is an error" id=124 user=unknown
...

To parse these entries, we can use the following regular expression (split across lines here for readability, but applied as a single concatenated pattern) to extract the timestamp, log level, message, id, and user fields:

(?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})
\s\[(?<loglevel>\w+)\]
\s"(?<message>.+)"
\sid=(?<id>\d+)
\suser=(?<user>\w+)
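
To see the rule in action outside of any particular logging tool, here is a minimal Python sketch that applies the same expression to the sample entries (note that Python's re module requires the (?P<name>...) form of named groups):

import re

# The same pattern as above, using Python's (?P<name>...) named-group syntax
LOG_PATTERN = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})'
    r'\s\[(?P<loglevel>\w+)\]'
    r'\s"(?P<message>.+)"'
    r'\sid=(?P<id>\d+)'
    r'\suser=(?P<user>\w+)'
)

lines = [
    '2022-05-15T12:51:40+00:00 [WARN] "This is a warning" id=123 user=jbloggs',
    '2022-05-15T12:52:42+00:00 [ERROR] "This is an error" id=124 user=unknown',
]

for line in lines:
    match = LOG_PATTERN.match(line)
    if match:
        # groupdict() returns the extracted fields as key-value pairs
        print(match.groupdict())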

In CrowdStrike Falcon LogScale, you can use the LogScale query language to parse the same log entries and extract the timestamp:

/^(?<ts>\S+)/
| parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=ts)
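
For comparison only, the rough Python sketch below does the equivalent of that second step, turning an extracted timestamp string into a proper datetime value (LogScale handles this internally with parseTimestamp()):

from datetime import datetime

ts = "2022-05-15T12:51:40+00:00"

# datetime.fromisoformat understands this ISO 8601-style value,
# including the +00:00 UTC offset (Python 3.7+)
parsed_ts = datetime.fromisoformat(ts)
print(parsed_ts)              # 2022-05-15 12:51:40+00:00
print(parsed_ts.timestamp())  # seconds since the Unix epoch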

What Log Parsing Features to Look For

Log parsing should be one of the key areas you consider when assessing a log management solution. It’s important to get your logs ingested and parsed as quickly and easily as possible so you can spend more of your time analyzing them.

Automation

Make sure the tool you choose comes with automatic parsers for the most common log formats. For custom logs, you should be able to perform a full-text search on the raw data before writing custom parsing rules.

Customization

Logs come in so many formats that it would be impossible for any log management system to support them all by default. That’s why creating your own parsing configuration must be easy and intuitive. Some logging tools provide a GUI for creating custom parsing configurations. Others provide an editor for writing custom parsing rules.

Visualization

An incorrect parsing configuration can produce wrong field values and lead to inaccurate analysis. A good log management solution should let you perform a “dry run” of a parsing rule by previewing the extracted key-value pairs from a sample data set before saving it. It should also offer visual aids for creating and updating the parsing configuration, such as color-coding and highlighting non-matching fields or errors.
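
To make the idea concrete, the minimal sketch below (the rule and sample lines are hypothetical) shows what such a dry run amounts to: apply the rule to sample data and flag anything that does not match before the rule is saved.

import re

# Hypothetical parsing rule to be dry-run against sample data
RULE = re.compile(r'(?P<timestamp>\S+) \[(?P<loglevel>\w+)\] "(?P<message>.+)"')

sample = [
    '2022-05-15T12:51:40+00:00 [WARN] "This is a warning" id=123 user=jbloggs',
    'malformed entry without the expected fields',
]

for line in sample:
    match = RULE.match(line)
    if match:
        print("MATCH   ", match.groupdict())
    else:
        print("NO MATCH", line)  # a GUI would highlight this line instead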

Discover the world’s leading AI-native platform for next-gen SIEM and log management

Elevate your cybersecurity with the CrowdStrike Falcon® platform, the premier AI-native platform for SIEM and log management. Experience security logging at petabyte scale, choosing between cloud-native and self-hosted deployment options. Log your data with a powerful, index-free architecture that handles over 1 PB of data ingestion per day without bottlenecks, enabling threat hunting at scale. Achieve real-time search with sub-second latency for complex queries, outpacing adversaries. Benefit from 360-degree visibility that consolidates data to break down silos, enabling security, IT, and DevOps teams to hunt threats, monitor performance, and ensure compliance seamlessly across 3 billion events in less than 1 second.

GET TO KNOW THE AUTHOR

Arfan Sharif is a product marketing lead for the Observability portfolio at CrowdStrike. He has over 15 years of experience driving Log Management, ITOps, Observability, Security and CX solutions for companies such as Splunk, Genesys and Quest Software. Arfan graduated in Computer Science at Bucks and Chilterns University and has a career spanning Product Marketing and Sales Engineering.