Structured, Unstructured and Semi-structured Logging Explained
This blog was originally published July 20, 2021 on humio.com. Humio is a CrowdStrike Company.
Structured, semi-structured and unstructured logging fall on a spectrum, each with its own set of benefits and challenges. Unstructured and semi-structured logs are easy for humans to read but can be tough for machines to parse, while structured logs are easy to parse in your log management system but difficult to use without a log management tool.
What Is Structured Logging?
Structured logging formats log data so it can be easily searched, filtered, and processed to enable more advanced analytics. The standard format for structured logging is JSON, although other formats can be used instead. Best practice is to implement structured logging with a logging framework that can integrate with a log management solution that accepts custom fields.
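As a minimal sketch of that practice, the snippet below uses Python's standard logging module with a small custom formatter that renders each record as one JSON object. The JsonFormatter class, the "billing" logger name, and the convention of passing custom fields under extra={"fields": ...} are assumptions made for this illustration, not part of any particular log management product.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Custom fields arrive via extra={"fields": {...}},
        # a convention of this sketch rather than a logging standard.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("billing")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info(
    "disk quota exceeded",
    extra={"fields": {"app": "billing", "over_by_mb": 512}},
)
print(buffer.getvalue().strip())
```

Because the custom fields travel as real keys rather than as text inside the message, a downstream system can filter on app or over_by_mb without parsing the message string.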
Differences Between Structured, Unstructured and Semi-structured Logs
Unstructured logs are massive text files made up of strings, which are ordered sequences of characters meant to be read by humans. In logs, these strings contain variables, which are placeholders for values that are defined elsewhere. Sometimes the variable is a wildcard, which is a placeholder that represents an unknown value, much like a wild card in poker.
People can understand variables easily, but that’s not always true for machines. They can’t always tell the difference between a variable in one string and a similar sequence of characters elsewhere in the log file. When that happens, the results can be confusing, leading to slowed productivity, increased fallibility, and wasted man-hours and processing cycles.
Structured logs consist of objects instead of strings. An object can include variables, data structure, methods, and functions. For instance, an object that’s part of a log message might include information about an app or a platform. The organization can define the criteria they wish to include in the object in order to make the logs most useful in meeting their unique needs. This is the “structure” in a structured log.
Here is an example of a structured log:
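As a minimal sketch, the Python snippet below builds one such event and prints it as a single JSON line. Every field name and value is hypothetical, chosen only to illustrate the kind of app and platform information described above.

```python
import json

# Hypothetical event: all field names and values are illustrative only.
event = {
    "timestamp": "2021-07-20T14:03:22Z",
    "level": "WARN",
    "app": "billing",
    "platform": "linux",
    "event": "disk_quota_exceeded",
    "quota_mb": 1024,
    "used_mb": 1536,
}

# One JSON object per line is a common way to write structured logs.
line = json.dumps(event)
print(line)
```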
Because structured logs are meant to be read by machines, the machines that read them can perform searches on them faster, produce cleaner output, and deliver consistency across platforms. Humans can still read structured logs, but they are not the primary audience. They are the audience for the output once a machine has finished operating on the data.
Semi-structured logs support both machines and humans: they consist of both strings and objects. These logs usually need to be parsed into tables before they can be analyzed properly. Semi-structured logs have not yet been standardized, which makes it harder for programs and systems to identify and categorize them. For example, the quoting rules for values that contain white space are not universally defined. Humio has taken steps in the right direction and can adapt to semi-structured logs in your environment.
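To make the quoting problem concrete, here is a rough sketch of parsing a semi-structured line that mixes free text with key=value fields. It assumes shell-style quoting (handled by Python's shlex module) for values that contain white space, which is only one of several conventions in use. The function name and the sample line are hypothetical.

```python
import shlex


def parse_semi_structured(line: str) -> dict:
    """Split a log line into free text plus key=value fields.

    Assumes shell-style quoting for values containing white space.
    Other producers may quote differently, or not at all.
    """
    fields = {}
    text = []
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep and key.isidentifier():
            fields[key] = value  # values stay strings; no type inference
        else:
            text.append(token)
    fields["message"] = " ".join(text)
    return fields


line = 'Quota check failed app=billing user="Jane Doe" over_by_mb=512'
print(parse_semi_structured(line))
```

A producer that wrote user=Jane Doe without quotes would be split differently, which is exactly the ambiguity a shared standard would remove.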
Why Use Structured Logging?
Finding an event in an unstructured log can be difficult, with a simple query returning far more information than desired and not the information actually wanted. For example, a developer seeking a log event created when a specific application exceeded disk quota by a certain amount may find all disk quota events created by all apps. In an enterprise environment, that’s going to be a big file.
To find the right event, the developer would have to write a complicated regular expression to define the search. And the more specific the event, the more complicated the expression. This approach is computationally expensive at scale because the conditions defined in the match expression have to be compared to every row value in the log record. If wildcards are used, the computational expense is even higher. And if the log data changes, the match expression won’t work as intended.
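The contrast can be sketched in a few lines of Python. The log lines, field names, and the 500 MB threshold below are invented for illustration. The regular expression has to encode the exact wording of the message, so a reworded line slips past it, while the structured query compares named fields directly.

```python
import json
import re

unstructured = [
    "2021-07-20 14:03:22 WARN billing exceeded disk quota by 512 MB",
    "2021-07-20 14:03:25 WARN search exceeded disk quota by 20 MB",
    # Same kind of event, but the wording changed in a later release.
    "2021-07-20 14:04:01 WARN billing disk quota exceeded (over by 700 MB)",
]

# The pattern is tied to one phrasing and must be run against every line.
pattern = re.compile(r"WARN billing exceeded disk quota by (\d+) MB")
regex_hits = [
    line
    for line in unstructured
    if (m := pattern.search(line)) and int(m.group(1)) >= 500
]

structured = [
    '{"app": "billing", "event": "disk_quota_exceeded", "over_by_mb": 512}',
    '{"app": "search", "event": "disk_quota_exceeded", "over_by_mb": 20}',
    '{"app": "billing", "event": "disk_quota_exceeded", "over_by_mb": 700}',
]
events = [json.loads(line) for line in structured]

# Field comparisons do not depend on how a human-readable message is worded.
field_hits = [
    e
    for e in events
    if e["app"] == "billing"
    and e["event"] == "disk_quota_exceeded"
    and e["over_by_mb"] >= 500
]

print(len(regex_hits), len(field_hits))
```

In this sketch the regular expression finds one of the two relevant billing events, and the field-based filter finds both.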
In some organizations, the developers write log messages in the form of strings, while Ops teams write code that parses those strings into structured data. This takes more time and increases the computational expense. If a developer or an Ops team member makes an error, the logging process breaks and more time is lost finding the source of the error.
Structured logging eliminates these problems by structuring the data as it’s generated. The organization can choose the format that works best for them, such as fixed-column, key-value pairs, or JSON. Most businesses today choose JSON because it integrates well with automation systems, including alerting systems.
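For comparison, the sketch below renders one hypothetical event in each of the three formats just mentioned. The field names and column widths are arbitrary choices made for this example.

```python
import json

# Hypothetical event used for all three renderings.
event = {"level": "WARN", "app": "billing", "over_by_mb": 512}

# Fixed column: each field is padded to an agreed width, so a reader
# must know the layout in advance (widths 6, 12 and 8 are arbitrary).
fixed = f"{event['level']:<6}{event['app']:<12}{event['over_by_mb']:>8}"

# Key-value pairs: self-describing, but every value becomes text.
key_value = " ".join(f"{k}={v}" for k, v in event.items())

# JSON: self-describing, and numbers keep their type after parsing.
as_json = json.dumps(event)

print(repr(fixed))
print(key_value)
print(as_json)
```

Only the JSON rendering round-trips through a standard parser with over_by_mb still a number, which is one reason it suits automation and alerting rules.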
Text logs continue to have a place in the enterprise because structured logging has a few drawbacks. Structured logs define data as it is created, so the data can only be used for purposes served by that definition. And if the structured data is stored on-premises or in any data warehouse with a rigid data schema, changes to that schema will require the structured data to be updated, which is a vast and costly endeavor. When deciding on a logging strategy, organizations should consider who will be using the data, what type of data is collected, where and how the data will be stored, and whether the data needs to be prepared before it is stored or can be prepared when it is used.
Humio Supports Structured, Semi-structured and Unstructured Logs
The benefits of structured logging can only be realized with a flexible, scalable log management system that supports development, compliance, and security needs.
Humio handles unstructured, semi-structured and structured messages, works with any data format, and is compatible with the leading open-source data shippers. Custom parsers make it easy to support any text format, so integrating Humio is simple and quick.
Most users send structured data to Humio as JSON objects. They don’t have to be formatted in any special way; they just have to be valid. Time stamps can be sent as part of the log entry, and Humio will use your time stamp instead of replacing it with its own. When sending unstructured data, time stamps are generated at the time of ingestion as a long comma delimited string and do not impact the ingestion time stamp.
Humio customers report better observability, flexibility, reliability, and cost effectiveness. Humio’s purpose-built logging tool, featuring innovative data storage and in-memory search/query engine technologies, delivers blazing-fast log management and index-free data ingestion that can’t be attained with traditional log management tools.