Creating Kusto sub-queries

In the same way as other query environments, Kusto queries in Log Anaytics can become complex. We need similar features in Kusto as we have in SQL Queries and one of these features is sub-queries.

The Problem

On the example below I’m building a query over my blog’s Log Analytics Data to identify the amount of access to my blog.

Log Analytics register the IP’s of the users making access to the blog. Web Site Access 101: When analyzing the access to a website, we have requests and sessions. Every time a user access a page, this is a request. While navigating the site, the user can access many pages, this is a navigation session.

The problem is that we usually have some outliers that increase a lot the number of requests. These usually are indexing tools requesting our pages to index and these indexing tools hide the real number of user requests our site receives, mixing it with indexing requests.

In order to make calculations with the real number of requests we need to discover the outliers IP addresses and remove them from the calculation. It’s basically two queries in one: Find the outliers and make the calculation removing the outliers. In other words, a sub-query.

Building the sub-query

In Kusto, sub-queries have some similarities with CTEs: We use the statement LET to define a name for a sub-query. After that, we can user this query by name on our main query. As you may be imagining, we can create as many sub-queries as we would like in a single Kusto query.

The rule to find outliers is a choice in each case. In my example, I will consider an outlier any IP address with more than 100 requests in a single day. Let’s recover the list of these IP addresses:

let outliers=
    AppServiceHTTPLogs
        |summarize Total=count() by CIp,bin(TimeGenerated,1d)
        |where Total > 100
        |project CIp;

Most of the details of this sub-query are just some Kusto syntax rules:

1) The query is called outliers
2) We are totaling the calls by Ip in a 1 day interval. The bin statement establishes the time-frame
3) Any Ip with a total of more than 100 requests will be listed
4) The query needs to finish with a semi-colon because it’s a sub-query

Using the sub-query

Let’s use our sub-query in a second query. This second query is totaling the requests by day of the week:

AppServiceHTTPLogs
    |where CIp !in (outliers)
    |summarize Total=count() by dayofweek(TimeGenerated),Week=bin(TimeGenerated,7d)
    |project Day=format_timespan(Column1,’d’),Week,Total
    |order by Day asc
    |project Day,Week,Total

1) The first step is to exclude the outliers IPs from the calculation
2) The summary is using dayofweek Kusto function and the bin as usual, but providing a field name for the bin result
3) The dayofweek function returns a time span, we still need to format it using format_timespan function. Since we didn’t provide a name for the dayofweek result on the summarize, it’s called Column1
4) Finally we order and get the final fields

 

The Final Query

These two queries need to be used together, one is a sub-query for the other (or CTE, if you prefer). The final result is this:

let outliers=
    AppServiceHTTPLogs
        |summarize Total=count() by CIp,bin(TimeGenerated,1d)
        |where Total > 100
        |project CIp;
AppServiceHTTPLogs
    |where CIp !in (outliers)
    |summarize Total=count() by dayofweek(TimeGenerated),Week=bin(TimeGenerated,7d)
    |project Day=format_timespan(Column1,’d’),Week,Total
    |order by Day asc
    |project Day,Week,Total

References

Overview of Kusto Queries

Kusto LET Statement

Conclusion

The KQL language is very powerful and flexible to support our needs