Back to Tech Center

Top LogScale Query Functions for New Customers

December 28, 2022

Tech Center

New to Falcon LogScale? Start with these commonly-used query functions

This guide will introduce you to Falcon LogScale most commonly-used functions through a series of scenarios. At the end of each scenario, you’ll be asked to consider how these functions could apply to your data — and get the chance to write a few queries yourself. Let’s get started!

  1. timeChart()
  2. window()
  3. avg()
  4. min()
  5. max()
  6. bucket()
  7. eval()
  8. select()
  9. groupBy()
  10. table()
  11. sort()
  12. case
  13. count()
  14. tail()
  15. selectLast()
  16. sankey()

Scenario 1: Working with timecharts

Imagine you’re a mobile application developer for a financial services company and you’ve been asked to prepare a weekly status update for your leadership team. You want to report on the minimum, maximum and average amount of data transferred through your mobile app at different times over the past week. Here’s how:

First, filter your field columns with the word “byte” to see fields relevant to network traffic:

Here you discover two fields you want to report on: orig_bytes and resp_bytes. Removing the filter, you also discover one additional field you want to report on: duration.

You can start with avg(“orig_bytes”) to get the average of the original bytes over the selected time period (in this case, the last 7 days) as follows: 

However, instead of seeing an average over the week, you wish to see how the average orig_bytes changed throughout the week. Try a timechart:

timechart (function=avg(orig_bytes))

Hovering over my chart, I can see the bucket size is one hour. This makes sense, since I have 169 buckets, and 24 hours/day x 7 days/week is 168 hours in my one-week chart.

What if I wanted to change the number of buckets and hence the time span of each bucket? We can add the span command to our timechart query:

timechart (function=avg(orig_bytes), span=12)

In talking with your manager, you decide the best span to show would be 10 minutes. But when computing the avg, min and max values, you want to compute across three 10-minute windows. You can do this with the window command inside our timechart query, as shown below. You’ll see we’ve combined our avg, min and max functions into a single timechart.

Another way of achieving the same result would be to remove the window function and set the span to 30m. We used the method below to introduce the window function. In practical situations, the window function is used to look across buckets along with something else. For example, you could take the std dev from the last three buckets and see if the current bucket is above or below that, thus detecting outliers. 

timechart( function=window( function=[avg(orig_bytes),min(orig_bytes),max(orig_bytes)], buckets=3), span=10m)

Here we queried the orig_bytes field, but we can also create similar timecharts for the duration and resp_bytes fields. We could save each of the timecharts as widgets on a dashboard to use for our weekly meeting. This simple query introduced us to numerous useful functions: timechart, window, bucket, avg, min and max. How could you use these functions when querying and displaying your own production data?

Next, let’s assume you want to see the values in kilobytes (kb) instead of bytes. We can create a new field by performing some math on an existing field using eval:

eval (sizeInKb=resp_bytes / 1000)

Activity: Take a moment to write a query using timechart, window, bucket, avg, min and max with some of your data. Try different values for bucket and span. How does this change the look of the timechart? Play with the formatting of your chart by selecting the paintbrush icon on the right to expand the format panel. From there, change the y-axis scale from linear to logarithmic. How does this change your chart?

Scenario 2: Building tables

Imagine you’re an e-commerce developer and you just received a Slack notification about a spike in customer complaints that started 30 minutes ago. All book inventories on your company’s e-commerce site are showing “Unavailable,” and your website’s online checkout is not working.

Our first thought is to inspect HTTP status codes, as these can indicate what type(s) of errors were seen, their frequency and when they began.

First we browse our fields and observe the method field. By clicking it, we see values of GET, PUT, POST, HEAD, OPTIONS, DELETE and PATCH. This indicates the HTTP operation that was requested. We make a mental note that we’ll be interested in GET operations, as those are when customers are trying to load a website.

Which field will show the errors? Continuing to browse the fields list, we see the statuscode field.  Clicking it shows a range of values: 201, 404, 500, 400 and more. These are the HTTP status codes we’re interested in. But we know that the 400 and 500 series of HTTP status codes represent client and server-side errors. So we begin with a simple query:

method = GET | statuscode >=400

From here, we might want to pull out just the statuscode and timestamp field for each event, instead of reviewing each log line. We can do that with a select query:

method = GET | select([“statuscode”, “@timestamp”]) | statuscode >=400

From here, what if we group the log lines into categories by statuscode. This will allow us to see which statuscode was most frequent. We use the groupby function. 

method = GET | statuscode >= 400 | groupby(statuscode)

To display a table of URLs which return an HTTP status code of 400 or higher, we can use the table command below. The first three lines (method, statuscode and books) serve as filters to ensure we’re looking at only the most relevant logs meeting all three criteria.

Method = GET

| statuscode >= 400

| books

| table([“url”,”@timestamp”,”method”])

Next, we can sort by timestamp as follows. You can experiment with other ways to achieve the same result. For instance, the table function has a sortby argument that defaults to timestamp  as well.

statuscode>=400 | method=GET | books

| table(["url","@timestamp","method"])

| sort("@timestamp")

This scenario taught us how to use select, sort, groupby and table.

Activity: Log into LogScale and create a table with some of your data. Which fields did you capture? Which field did you sort by? Why?

Scenario 3: Visualizing suspicious activity

You’re a security engineer investigating some suspicious files that have been sent to a few machines in your network. Imagine you receive an alert that 213.155.151.149 is the IP of a known threat actor, and you want a field that quickly alerts you when a system logs a specific IP. You could use a case statement below to create a new field and assign it a value of true when the IP is involved and false when the suspicious IP is not involved.

case    {tx_hosts[0] = "213.155.151.149" | knownthreat := true;

       * | knownthreat := false}

To check our work, we use the select and sort functions we covered earlier:

case    {tx_hosts[0] = "213.155.151.149" | knownthreat := true;

       * | knownthreat := false}

| select(["tx_hosts[0]","knownthreat"])

| sort(knownthreat)

You’ll notice that all tx_hosts[0] with an IP of 213.155.151.149 show the knownthreat value of true, whereas any other IP addresses, or even null IP addresses, show a value of false – success!

Let’s say we want to count how many times a known threat actor has been transmitting into our network. We could use the count command as follows:

case    {tx_hosts[0] = "213.155.151.149" | knownthreat := true;

       * | knownthreat := false}

| groupby(field=knownthreat, function=count())

Imagine you want to focus on the 10 most recent events where this suspicious IP address was involved. You could use the tail() function with a value of 10:

case    {tx_hosts[0] = "213.155.151.149" | knownthreat := true;

       * | knownthreat := false}

| knownthreat = true

| tail(10)

Which produces the following:

An alternative to tail() is selectLast(). Here we only want the last, or most recent, value of a given field. Let’s use this to see the most recent transmitting and receiving hosts for both our suspicious IP and a non-suspicious IP address:

case    {tx_hosts[0] = "213.155.151.149" | knownthreat := true;

       * | knownthreat := false}

| groupby(knownthreat, function=selectlast(["tx_hosts[0]","rx_hosts[0]"]))

To finish this scenario, let’s visualize the relationship between the machines sending and receiving this suspicious traffic. To do this, we’ll use a sankey diagram, where all you need to do is supply the source and target, per the example below:

#path = "files" and mime_type = "*flash"

| sankey(source=rx_hosts[0],target=tx_hosts[0])

That’s it! In this scenario we covered case statements, sankey diagrams, counting the number of events and filtering for only the last “x” number of events with tail() or the most recent values for given fields using selectLast.

Now it’s time for you to go hunting for suspicious activity within your network traffic using the strategies we employed above. If you’re not a LogScale user, all logs used in these scenarios come from either our in-product tutorial or Falcon LogScale Community Edition. Check these out if you want to practice with our datasets.

We hope you enjoyed this guide and we welcome your feedback at humiosuccess@crowdstrike.com.

Related Content