Top LogScale Query Functions for New Customers

New to Falcon LogScale? Start with these commonly used query functions.
This guide introduces Falcon LogScale's most commonly used functions through a series of scenarios. At the end of each scenario, you'll be asked to consider how these functions could apply to your data, and you'll get the chance to write a few queries yourself. Let's get started!
- timeChart()
- window()
- avg()
- min()
- max()
- bucket()
- eval()
- select()
- groupBy()
- table()
- sort()
- case
- count()
- tail()
- selectLast()
- sankey()
Scenario 1: Working with timecharts
Imagine you’re a mobile application developer for a financial services company and you’ve been asked to prepare a weekly status update for your leadership team. You want to report on the minimum, maximum and average amount of data transferred through your mobile app at different times over the past week. Here’s how:
First, filter the list of fields by the word "byte" to see fields relevant to network traffic.
Here you discover two fields you want to report on: orig_bytes and resp_bytes. Removing the filter, you also discover one additional field you want to report on: duration.
You can start with avg(orig_bytes) to get the average of the original bytes over the selected time period (in this case, the last 7 days).
However, instead of seeing a single average for the week, you wish to see how the average orig_bytes changed throughout the week. Try a timechart:
timeChart(function=avg(orig_bytes))
Hovering over the chart, you can see the bucket size is one hour. This makes sense: the chart has 169 buckets, and 24 hours/day x 7 days/week is 168 hours in the one-week chart.
What if you wanted to change the number of buckets, and hence the time span of each bucket? You can add the span parameter to the timechart query:
timeChart(function=avg(orig_bytes), span=12h)
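Incidentally, timecharts are built on the more general bucket() function from the list above, which groups events into time buckets and returns a table rather than a chart. As a sketch (the one-hour span here is illustrative):

```
bucket(span=1h, function=avg(orig_bytes))
```

This produces one row per hour, which is handy when you want to post-process the bucketed values instead of charting them.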
In talking with your manager, you decide the best span to show would be 10 minutes. But when computing the avg, min and max values, you want to compute across three 10-minute windows. You can do this with the window() function inside the timechart query, combining the avg, min and max functions into a single timechart:
timeChart(function=window(function=[avg(orig_bytes), min(orig_bytes), max(orig_bytes)], buckets=3), span=10m)
Another way of achieving the same result would be to remove the window() function and set the span to 30m. We used window() here to introduce it; in practical situations, window() is used to look across buckets in combination with something else. For example, you could take the standard deviation of the last three buckets and check whether the current bucket falls above or below it, thus detecting outliers.
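That outlier idea can be sketched as follows. Note that the stdDev() aggregate and the movingAvg/movingStd names are our illustrative choices, not part of the original walkthrough:

```
timeChart(function=window(function=[avg(orig_bytes, as=movingAvg), stdDev(orig_bytes, as=movingStd)], buckets=3), span=10m)
```

A dashboard built on this would let you compare each bucket's value against movingAvg plus or minus a couple of movingStd to spot outliers.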
Here we queried the orig_bytes field, but we can also create similar timecharts for the duration and resp_bytes fields. We could save each of the timecharts as widgets on a dashboard to use for our weekly meeting. This simple query introduced us to numerous useful functions: timeChart(), window(), bucket(), avg(), min() and max(). How could you use these functions when querying and displaying your own production data?
Next, let's assume you want to see the values in kilobytes (KB) instead of bytes. We can create a new field by performing some math on an existing field using eval():
eval(sizeInKb=resp_bytes / 1000)
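Because eval() simply adds a field to each event, the new field can feed any of the functions covered earlier. For instance, a sketch charting the average response size in kilobytes (sizeInKb is our invented field name):

```
eval(sizeInKb=resp_bytes / 1000) | timeChart(function=avg(sizeInKb))
```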
Activity: Take a moment to write a query using timeChart(), window(), bucket(), avg(), min() and max() with some of your data. Try different values for buckets and span. How does this change the look of the timechart? Play with the formatting of your chart by selecting the paintbrush icon on the right to expand the format panel. From there, change the y-axis scale from linear to logarithmic. How does this change your chart?
Scenario 2: Building tables
Imagine you’re an e-commerce developer and you just received a Slack notification about a spike in customer complaints that started 30 minutes ago. All book inventories on your company’s e-commerce site are showing “Unavailable,” and your website’s online checkout is not working.
Our first thought is to inspect HTTP status codes, as these can indicate what types of errors were seen, their frequency and when they began.
First we browse our fields and observe the method field. Clicking it, we see values of GET, PUT, POST, HEAD, OPTIONS, DELETE and PATCH, indicating the HTTP operation that was requested. We make a mental note that we'll be interested in GET operations, as those occur when customers are trying to load a page.
Which field will show the errors? Continuing to browse the fields list, we see the statuscode field. Clicking it shows a range of values: 201, 404, 500, 400 and more. These are the HTTP status codes we're interested in, and we know the 400 and 500 series of HTTP status codes represent client-side and server-side errors, respectively. So we begin with a simple query:
method = GET | statuscode >= 400
From here, we might want to pull out just the statuscode and @timestamp fields for each event, instead of reviewing each log line. We can do that with select():
method = GET | statuscode >= 400 | select(["statuscode", "@timestamp"])
From here, what if we group the log lines into categories by statuscode? This will allow us to see which statuscode was most frequent. We use the groupBy() function:
method = GET | statuscode >= 400 | groupBy(statuscode)
To display a table of URLs which return an HTTP status code of 400 or higher, we can use table(). The first three stages (method, statuscode and books) serve as filters to ensure we're looking at only the most relevant logs meeting all three criteria:
method = GET | statuscode >= 400 | books | table(["url", "@timestamp", "method"])
Next, we can sort by @timestamp as follows. You can experiment with other ways to achieve the same result; for instance, the table() function has a sortby argument that defaults to @timestamp as well.
statuscode>=400 | method=GET | books | table(["url","@timestamp","method"]) | sort("@timestamp")
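As a sketch of the sortby variant mentioned above (same filters, no separate sort() stage):

```
statuscode >= 400 | method = GET | books | table(["url", "@timestamp", "method"], sortby="@timestamp")
```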
This scenario taught us how to use select(), sort(), groupBy() and table().
Activity: Log into LogScale and create a table with some of your data. Which fields did you capture? Which field did you sort by? Why?
Scenario 3: Visualizing suspicious activity
You're a security engineer investigating some suspicious files that have been sent to a few machines in your network. Imagine you receive an alert that 213.155.151.149 is the IP of a known threat actor, and you want a field that quickly tells you when a system logs that IP. You can use the case statement below to create a new field and assign it a value of true when the suspicious IP is involved and false when it is not.
case {tx_hosts[0] = "213.155.151.149" | knownthreat := true; * | knownthreat := false}
To check our work, we use the select and sort functions we covered earlier:
case {tx_hosts[0] = "213.155.151.149" | knownthreat := true; * | knownthreat := false} | select(["tx_hosts[0]","knownthreat"]) | sort(knownthreat)
You'll notice that all events where tx_hosts[0] is 213.155.151.149 show a knownthreat value of true, whereas any other IP addresses, or even null IP addresses, show a value of false. Success!
Let's say we want to count how many times a known threat actor has been transmitting into our network. We could use the count() function inside groupBy() as follows:
case {tx_hosts[0] = "213.155.151.149" | knownthreat := true; * | knownthreat := false} | groupBy(field=knownthreat, function=count())
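If all you need is the single number, a sketch that filters for the suspicious IP first and then counts the matching events:

```
tx_hosts[0] = "213.155.151.149" | count()
```

count() with no arguments simply returns the number of events that reach it.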
Imagine you want to focus on the 10 most recent events where this suspicious IP address was involved. You could use the tail() function with a value of 10:
case {tx_hosts[0] = "213.155.151.149" | knownthreat := true; * | knownthreat := false} | knownthreat = true | tail(10)
An alternative to tail() is selectLast(), which returns only the last, or most recent, value of a given field. Let's use it to see the most recent transmitting and receiving hosts for both our suspicious IP and a non-suspicious IP address:
case {tx_hosts[0] = "213.155.151.149" | knownthreat := true; * | knownthreat := false} | groupBy(knownthreat, function=selectLast(["tx_hosts[0]", "rx_hosts[0]"]))
To finish this scenario, let's visualize the relationship between the machines sending and receiving this suspicious traffic. To do this, we'll use a Sankey diagram, where all you need to do is supply the source and target, per the example below:
#path = "files" and mime_type = "*flash" | sankey(source=rx_hosts[0],target=tx_hosts[0])
That's it! In this scenario we covered case statements, Sankey diagrams, counting the number of events, and filtering for only the last "x" events with tail() or the most recent values of given fields with selectLast().
Now it’s time for you to go hunting for suspicious activity within your network traffic using the strategies we employed above. If you’re not a LogScale user, all logs used in these scenarios come from either our in-product tutorial or Falcon LogScale Community Edition. Check these out if you want to practice with our datasets.
We hope you enjoyed this guide and we welcome your feedback at humiosuccess@crowdstrike.com.
