How to query summarized versus raw data from Kusto queries
January 24, 2020
Cameron Fuller

Back in my SCOM days, we had a database and a data warehouse. The database held live data, while the data warehouse held longer-term data stored as aggregates so that it would not grow too large.

In Log Analytics we can keep the raw data itself and also quickly generate aggregate data when that is the goal. There are times when the summarized data is more useful, such as when you are feeding it into Power BI. I ran into a situation where I needed to send only the summarized data to Power BI, which I will discuss in example #2 in this blog, but we'll start with a common use case first: CPU utilization.
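Before getting into the examples, here is a minimal sketch of the difference using the standard Perf table; the filter and the hourly average shown here are illustrative only, not part of the queries used later in this post.

// Raw data: one row per collected sample (illustrative sketch)
Perf
| where CounterName == "% Processor Time"
| project TimeGenerated, Computer, CounterValue

// Summarized on the fly: one hourly average per computer (illustrative sketch)
Perf
| where CounterName == "% Processor Time"
| summarize avg(CounterValue) by bin(TimeGenerated, 1h), Computer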

Example #1: CPU Utilization

In Log Analytics, Microsoft now provides some great pre-built queries so that we don’t have to re-invent the wheel. If you open Log Analytics and start with a blank query, you are offered sample queries based on your query history in the workspace, plus other common ones covering Computer availability, Computer performance and Data usage (as shown below).

There is a pre-built “CPU usage trends over the last day” query, shown below:

// CPU usage trends over the last day
// Calculate CPU usage patterns across all computers, chart by percentiles
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize percentiles(CounterValue, 50, 90, 99) by bin(TimeGenerated, 1h)
| render timechart

This query makes it easy to see what the CPU trend looks like at the 50th, 90th and 99th percentiles. A sample output is shown below:

With a quick tweak to this query you can add the Computer field, which makes the chart busier (depending on how many computers you have) but also a little more useful. (The change below is the addition of Computer on the summarize line.)

// CPU usage trends over the last day
// Calculate CPU usage patterns across all computers, chart by percentiles
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize percentiles(CounterValue, 50, 90, 99) by bin(TimeGenerated, 1h), Computer
| render timechart

Here’s what that looks like for the 50th percentile data, which is pretty busy looking even with only one day’s worth of data.

However, if you look at the underlying data in the table view you will now see that there are a LOT of records as a result. In my environment, with about 20 systems, that’s 575 records for the last 24 hours. To get a usable view over a longer duration we need to specify the time range of the query and summarize the data more coarsely. At that rate, 13 months of hourly records would come to roughly a quarter of a million data points (about 575 per day across roughly 395 days), and in my opinion you do not want to send a quarter of a million records to Power BI.
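If you want to verify the record volume yourself before wiring a query into Power BI, one quick check is to append the count operator to the summarized query. This is a minimal sketch based on the hourly query above; adjust the time filter to whatever window you plan to report on.

// Sketch: how many hourly summary rows does the last day produce?
Perf
| where TimeGenerated >= ago(1d)
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize percentiles(CounterValue, 50, 90, 99) by bin(TimeGenerated, 1h), Computer
| count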

NOTE: Kusto queries return, by default, up to 500,000 rows or 64 MB, as described in query limits. You can override these defaults by using Advanced options in the Azure Data Explorer (Kusto) connection window [From https://docs.microsoft.com/en-us/azure/data-explorer/power-bi-best-practices]
To extend the query to cover a year or more we can define the time range in the query itself, and with another small tweak we can have the data summarized to once a day. The changes below include a filter on the dates (TimeGenerated >= StartDate and TimeGenerated <= EndDate).

let monthsAgo = -13;
let StartDate = startofmonth(now(), monthsAgo);
let EndDate = ago(1days);
Perf
| where TimeGenerated >= StartDate and TimeGenerated <= EndDate and ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize percentiles(CounterValue, 50, 90, 99) by bin(TimeGenerated, 1d), Computer

The simple change of altering the bin statement from 1h to 1d cleans this data up considerably. Below is an example of the same query run in an environment with a short retention period, but summarized to daily:

The same concept works well for low disk space conditions or any other metric gathered by Log Analytics, as sketched below.
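As a rough sketch of what that might look like for disk space, assuming the workspace collects the LogicalDisk “% Free Space” performance counter (excluding the _Total instance here is an illustrative choice), the same daily-summary pattern applies:

// Sketch: daily free disk space trend, assuming the LogicalDisk "% Free Space" counter is collected
let monthsAgo = -13;
let StartDate = startofmonth(now(), monthsAgo);
let EndDate = ago(1days);
Perf
| where TimeGenerated >= StartDate and TimeGenerated <= EndDate and ObjectName == "LogicalDisk" and CounterName == "% Free Space" and InstanceName != "_Total"
| summarize percentiles(CounterValue, 50, 90, 99) by bin(TimeGenerated, 1d), Computer
| render timechart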

Example #2: Custom data

In many of our customer environments we collect data for historical reporting, including patch management compliance, Windows versions and endpoint health. An example query we use for reporting to Power BI is shown below.

let monthsAgo = -13;
let StartDate = startofmonth(now(), monthsAgo);
let EndDate = ago(1days);
LaunchReporting_CL
| where TimeGenerated > StartDate and Report_s == "UpdateStatus" and ClientId_d > 0
| project TimeGenerated, Value_s, ClientId_d, Count_d
| sort by ClientId_d, TimeGenerated

By adding three lines to the query we can summarize the data so that instead of reporting daily data we report weekly data, without materially changing the original query. The first line creates the Aggregation field (which you can name anything you like, of course). The second line does the summarize as in the first example, but grouped by the new Aggregation field plus the fields we want in the final result (in this case the client’s ID number, the Value we are querying for and, of course, the count itself). The third line uses project-rename to change the names back so that they match what they were before these lines were added.

| extend Aggregation = bin(TimeGenerated, 7days)
| summarize toint(avg(Count_d)) by Aggregation, ClientId_d, Value_s
| project-rename TimeGenerated = Aggregation, Count_d = avg_Count_d
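Putting the pieces together, the full weekly-summary query would look roughly like this. The placement of the three added lines (before the final sort) is my assumption; the rest is simply the original query with the additions spliced in.

let monthsAgo = -13;
let StartDate = startofmonth(now(), monthsAgo);
let EndDate = ago(1days);
LaunchReporting_CL
| where TimeGenerated > StartDate and Report_s == "UpdateStatus" and ClientId_d > 0
| project TimeGenerated, Value_s, ClientId_d, Count_d
// The three added lines: weekly bucket, average per week, then rename back
| extend Aggregation = bin(TimeGenerated, 7days)
| summarize toint(avg(Count_d)) by Aggregation, ClientId_d, Value_s
| project-rename TimeGenerated = Aggregation, Count_d = avg_Count_d
| sort by ClientId_d, TimeGenerated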

The results of this updated query give weekly summaries (see 1/5, 12/29, 12/22 and 12/15 as examples).

When this new data is added to Power BI we can see the difference in how it is visualized. The first visual shows a daily aggregation.

The second shows a weekly aggregation, which over time will grow to showcase a full year’s worth of data in a single Power BI report.

Summary: The two examples above have shown how you can easily take an existing set of data in Log Analytics and, instead of representing every individual data point, summarize that data.

This approach is very useful when you are pulling this data into other systems, such as Power BI, for report visualization. Additionally, in theory you could use a re-writer to take data from one Log Analytics workspace and write it to another workspace dedicated to long-term storage. But that is a blog post for another day.