One of the most common requests in the world of monitoring is to be alerted before a disk runs out of space. In SCOM we can alert when a disk is getting low on space, and we can provide reports or dashboards that show which disks are running low. But what about when you aren’t using SCOM – when you are using Azure Monitor and potentially Log Analytics? This blog post takes a simple example, “How can I visualize free disk space?”, and showcases the various options which are available in Azure as well as via 3rd party solutions.
Historically, the focus on free disk space has been to avoid conditions where a disk runs out of space (it’s far better to know when a SQL database is about to run out of disk space than right after it has run out). This condition is commonly referred to as an under-allocated resource.
With the shift to the cloud, there is also a push towards identifying systems that have been given too many resources, such as a system that was allocated too much disk space. We care about over-allocation more in the cloud than we commonly did on-prem because we are billed monthly for the resources we have allocated in the cloud. I.e., I don’t want to pay for 1 TB of disk space every month if I am only using 1 GB of disk space on the drive. These resources are referred to as over-allocated.
To complify this further (yes, I know it’s not a word but I’m still claiming it), what people want in a dashboard can vary greatly. Some want to know only the most recent state of the drive, others want to know the history and trend of the drives. Some want to know the state of all drives in the environment, others are only interested in the top X. So now we have over-allocated and under-allocated, historical and near-real-time all as options for this same question. Plus, we have a variety of methods available to visualize this information.
Finally, we also have the option to visualize in dark mode or not in dark mode in Azure or in the 3rd party solution I am discussing in this blog post. The visualizations shown in this blog post are NOT in dark mode as I do not like dark mode.
Visualizing Free Disk Space in Azure
In Azure, there are four methods that I am aware of to visualize data from Log Analytics as a source of data and provide those visualizations on an Azure dashboard:
Log Analytics Kusto queries: Kusto queries of Log Analytics data can be written and then pinned to an Azure dashboard. An example of this is shown below: (showcasing a set of drives which may be over-provisioned)
log analytics visualization
Workbooks using Kusto queries: Workbooks are a new method to visualize Log Analytics data. An example of this is shown below: (showcasing a set of drives which may be over-provisioned)
workbook visualization
Metric-based visualizations: Specific performance data is gathered as metrics in Azure Monitor. An example of a metric-based visualization for free disk space over time is shown below:
workbook visualization
Another example of this across multiple machines, using pre-built metrics, is shown below; it is available directly in the existing Performance Analysis dashboard (not pinned to an Azure dashboard).
workbook dashboard
Within workbooks, you can do extremely powerful visualizations by using the “Column Settings” option shown below.
Configuring workbook dashboard
In this example, we can pick an existing column and render it as a bar which goes from green to red.
Configuring workbook dashboard
This adds a bar for the CounterValue field shown below with changes in color based on the value.
workbooks in Azure dashboard
Overall, workbooks are extremely powerful, especially once you get used to the Column Settings section of the workbook. Thank you to Tony N, who pointed this out to me and walked me through a use case!
OMS console dashboards (Legacy): When Log Analytics first launched, it had its own console, and that console allowed you to develop your own dashboards. You could pin the top-level view of these dashboards (shown below) to an Azure dashboard.
legacy Log Analytics dashboard
These top-level dashboards could be drilled into for details, such as the view shown below of free disk space over time.
legacy Log Analytics dashboard
While these dashboards are interesting, it’s not possible to pin the sub-dashboards to an Azure dashboard, and since this technology is now legacy I would not recommend it.
Summary of Azure-based options: While there are several different approaches available to visualize this type of data in Azure, workbooks are the most robust option currently available. To paraphrase a colleague of mine: “Workbooks are the best option for Log Analytics because they are free; the others are not, or they require on-premises infrastructure.” (Thank you, Stan!)
One of the biggest challenges with Azure dashboards as a solution is that the dashboard itself can only refresh every 30 minutes. This causes many organizations to look at other options.
Data visualization outside Azure in Microsoft solutions
Outside of Azure but still using Microsoft technologies we also have Power BI. Power BI can directly connect to a Log Analytics workspace to ingest data and visualize that data. The example below is a simple one that visualizes free disk space based on the results of the query made to Log Analytics.
Power BI dashboard
In this example, the field (CounterValue) is formatted with conditional formatting turned on.
Power BI Configuration
For this configuration, I defined the background color as changing from yellow to red based on the value of the field (the thresholds in the example below were set to show the colors, not actual values). For free disk space, a good threshold for Red would be 90%, Yellow at 80%, and anything under that as Green.
Power BI Configuration
There are challenges with scheduling updates to the data in Power BI. Based on my experience, you can schedule Power BI to update either daily or weekly. With the daily option, it can update up to eight times a day (shown below), which is useful for many use cases but not for data that may be updated on an hourly basis.
Power BI Scheduling
One of the biggest challenges with Power BI as a solution is that the data can only refresh up to 8 times a day with Power BI Pro. With Power BI Premium you can refresh up to 48 times a day (every 30 minutes on a 24-hour basis, or every 10 minutes during a focused set of business hours). This restriction has caused us to look at other options for data which needs to refresh more frequently.
Summary of Power BI options: Power BI gives some very powerful data visualizations with the major challenge being the ability to refresh data on a scheduled basis.
Data visualization using 3rd party solutions connecting to Azure data
With SquaredUp being the sponsor for the DFWSMUG on March 10th, I was able to spend some time kicking the tires on their solution for Azure. For those not familiar with SquaredUp, they have provided dashboard solutions on top of SCOM for a while now, and they now have a solution for Azure as well.
SquaredUp can provide visualizations from a variety of sources, including Log Analytics. In the example below, the state field was provided by the Kusto query, and the values of that field are then color-coded as part of the grid column configuration using a custom template.
SquaredUp configuration
The value I have for the custom template is shown below (thank you to SquaredUp support for this example!)
{{#if (value == 'Healthy')}}<img src='https://demo.squaredup.com/images/healthy.png'>{{elseif (value == 'Potentially Overallocated')}}<img src='https://demo.squaredup.com/images/warning.png'>{{else}}<img src='https://demo.squaredup.com/images/critical.png'>{{/if}}
The result is an easy to understand visualization for the state of drives based on the value provided for the counter (in this case free disk space).

SquaredUp contains a variety of pre-built dashboards such as the one below for VM details.

Or the one below to show the top drives with low disk space (pre-built as well).

If you want to kick the tires on this solution, they have a publicly available version of it available at https://demo.squaredup.com/.
Summary of SquaredUp: SquaredUp gives you the ability to create powerful custom dashboards and gives you pre-built dashboards for common requirements. If you need capabilities like these you may want to check them out.
Additional readings:
For another great view of the visualization options available, check out this article from the docs team.
Summary
Everything is changing and evolving but there are a few recommendations I would make based on the current state of visualization for data in Azure.
- Azure dashboards built on top of workbooks make a good solution for a lot of environments. However, this may not be sufficient if you need the data on the dashboard to update more frequently than every 30 minutes.
- Power BI works well to provide solutions such as monthly (or weekly) reports based upon historical data.
- If you have requirements that cannot be covered by either of the options above, there are excellent 3rd party solutions out there such as SquaredUp.
Thank you to Sean T for this idea on this blog post!
Microsoft’s Cognitive Services is a grab-bag of amazing capabilities that you can purchase by the transaction. The Cognitive Services APIs are grouped by vision, speech, language, knowledge and search. This article is about using the Text Analytics API (in the language group) to score the sentiment and detect the topics of a large number of text phrases. More specifically, the scenario we’ll explore is that we’ve been given a file containing thousands of responses to a survey conducted on Facebook, and our client would like to know the sentiment of the text comments, the topics mentioned in those comments, and the frequency of those topics.
The text analytics API is well documented by Microsoft. The request body has to have exactly this format:
{ "documents": [ { "id": "string", "text": "string" } ] }
For example:
{"documents":[{"id":"1190","text":"thank you!"},{"id":"1191","text":"i thought it was perfect, as is."},{"id":"1783","text":"more tellers on certain busy days ,for example mondays."},]}
In practice, I found that the API would accept request bodies up to about 64 KB. So, if you have more than 64 KB of comments, you have some looping in your future. Here are the imports; note that this is Python 2 code (it uses urllib2, which does not exist in Python 3). (There may be one or more imports here that aren’t needed. This was a long project.)
import urllib2
import urllib
import sys
import base64
import json
import os
import pandas as pd
Assuming our input text file is formatted as a CSV file, we import it into a Pandas dataset.
fileDir = 'C:/Users/Administrator/Documents/'
fileName = 'SurveySentiment.csv'
mydata = pd.read_csv(fileDir + fileName,header = 0)
mydata.head() # look at the first few rows
First, let’s add a new column to our dataset containing a cleaned version of the survey comment text.
import re

#%% clean data
def clean_text(mystring):
    mystring = unicode(str(mystring), errors='ignore') # professional developer, don't try this at home
    mystring = mystring.decode('utf8') # change encoding
    mystring = re.sub(r"\d", "", mystring) # remove numbers
    mystring = re.sub(r"_+", "", mystring) # remove consecutive underscores
    mystring = mystring.lower() # transform to lower case
    mystring = mystring.replace("  ", " ") # collapse double spaces
    return mystring.strip()

mydata["Comment_cleaned"] = mydata.Comment.apply(clean_text) # adds the new column
Now that we have the data in a form we can iterate through, we need to feed it into a structure that can be sent to the text analytics API. In this case I broke the input into ten equally sized segments to keep each request body under 64kb.
input_texts = pd.Series() # init a series to hold the strings we will submit to the API
num_of_batches = 10
l = len(mydata)
for j in range(0,num_of_batches): # this loop will add num_of_batches strings to input_texts
    input_texts.set_value(j,"") # initialize input_texts string j
    for i in range(j*l/num_of_batches,(j+1)*l/num_of_batches): # loop through a window of rows from the dataset
        comment = str(mydata['Comment_cleaned'][i]) # grab the comment from the current row
        comment = comment.replace("\"", "'") # replace double quotes with single quotes so they don't break the JSON
        # add the current comment to the end of the string we're building in input_texts string j
        input_texts.set_value(j, input_texts[j] + '{"id":"' + str(i) + '","text":"'+ comment + '"},')
    # after we've looped through this window of the input dataset to build this series, add the request head and tail
    input_texts.set_value(j, '{"documents":[' + input_texts[j] + ']}')
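As an aside, if you would rather not build the JSON strings by hand, a dict plus json.dumps takes care of the quoting and escaping for you. This is only a sketch (json_batches is my own name, not something from the original code), but it produces the same kind of request body:

# Alternative sketch: build each batch as a dict and let json.dumps handle escaping.
# Relies on mydata, num_of_batches and the Comment_cleaned column defined above.
batch_size = (len(mydata) + num_of_batches - 1) // num_of_batches # ceiling division
json_batches = []
for start in range(0, len(mydata), batch_size):
    documents = [{"id": str(i), "text": str(mydata['Comment_cleaned'][i])}
                 for i in range(start, min(start + batch_size, len(mydata)))]
    json_batches.append(json.dumps({"documents": documents}))

# Whichever way you build the batches, it is worth checking that each one stays under the ~64 KB limit.
for j in range(0, num_of_batches):
    assert len(input_texts[j]) < 64 * 1024, 'batch %d is too large for a single request' % j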
Sentiment Analysis
Okay, now we have a series of ten strings in the correct format to be sent as requests to the sentiment API. Let’s loop through that series and call the API with each of the strings.
# Cognitive Services endpoint for your region.
base_url = 'https://westus.api.cognitive.microsoft.com/'
account_key = '00e000e000f000b0aeb0000b0edec0000' # Your account key goes here.
headers = {'Content-Type':'application/json', 'Ocp-Apim-Subscription-Key':account_key}
Sentiment = pd.Series() # initialize a new series to hold our sentiment results
batch_sentiment_url = base_url + 'text/analytics/v2.0/sentiment'
for j in range(0,num_of_batches):
    # Detect sentiment for each batch.
    req = urllib2.Request(batch_sentiment_url, input_texts[j], headers)
    response = urllib2.urlopen(req)
    result = response.read()
    obj = json.loads(result)
    # loop through each result, extracting the sentiment score associated with each id
    for sentiment_analysis in obj['documents']:
        Sentiment.set_value(sentiment_analysis['id'], sentiment_analysis['score'])
# tack our new sentiment series onto our original dataframe
mydata.insert(len(mydata.columns),'Sentiment',Sentiment.values)
At this point in my project, the head of the dataframe showed the new Sentiment column alongside each comment.
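If you want to sanity-check the scores yourself, a quick peek works (a minimal sketch using the column names from above):

# Quick look at a few scored comments and the overall average sentiment.
print(mydata[['Comment', 'Sentiment']].head())
print('Mean sentiment score: %0.3f' % mydata['Sentiment'].mean())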
Now, just save our dataframe with sentiment to a file:
Sentiment_file = fileDir + 'SurveySentiment_scored.csv' # output file (the name is an example)
mydata.to_csv(Sentiment_file)
Topic Detection
The topic detection API looks through a set of documents (like our comments), detects the topics in those documents, and scores the topics by frequency.
We still have our input_texts series containing ten strings in the format:
{ "documents": [ { "id": "string", "text": "string" } ] }
The request format for the detect topics API is a superset of the above format, but the format we have will work just fine if we don’t need to specify stop words or exclude topics.
{ "stopWords": [ "string" ], "topicsToExclude": [ "string" ], "documents": [ { "id": "string", "text": "string" } ] }
So, let’s iterate again through our series, calling the topic detection API this time:
import time

# Invoke the Text Analytics API: topic detection.
headers = {'Content-Type':'application/json', 'Ocp-Apim-Subscription-Key':account_key}
TopicObj = pd.Series() # will hold the parsed JSON response for each batch
for i in range(0,num_of_batches):
    # Start topic detection and get the URL we need to poll for results.
    print('Starting topic detection.')
    uri = base_url + 'text/analytics/v2.0/topics'
    req = urllib2.Request(uri, input_texts[i], headers)
    response_headers = urllib2.urlopen(req).info()
    uri = response_headers['operation-location']
    # Poll the service every few seconds to see if the job has completed.
    while True:
        req = urllib2.Request(uri, None, headers)
        response = urllib2.urlopen(req)
        result = response.read()
        TopicObj.set_value(i,json.loads(result))
        if (TopicObj[i]['status'].lower() == "succeeded"):
            break
        print('Request processing. ' + str(time.localtime()))
        time.sleep(10)
    print('Topic detection complete.')
This API takes some time. For my 4471 row dataset it took over half an hour. It returns a JSON string. Here’s how I used Pandas to process it:
TopicDF = pd.read_json(json.dumps(TopicObj[0]))
for i in range(1,num_of_batches):
    TopicDF = TopicDF.append(pd.read_json(json.dumps(TopicObj[i]))) # append returns a new frame, so reassign
Then, to extract the ids, scores and key phrases out of that dataframe:
Topics_id = pd.Series()
for i in range(0,len(TopicDF.iloc[2][1])):
    Topics_id.set_value(i,TopicDF.iloc[2][1][i]['id'])
Topics_score = pd.Series()
for i in range(0,len(TopicDF.iloc[2][1])):
    Topics_score.set_value(i,TopicDF.iloc[2][1][i]['score'])
Topics_keyPhrase = pd.Series()
for i in range(0,len(TopicDF.iloc[2][1])):
    Topics_keyPhrase.set_value(i,TopicDF.iloc[2][1][i]['keyPhrase'])
Topics = Topics_id.to_frame()
Topics.insert(len(Topics.columns),'score',Topics_score.values)
Topics.insert(len(Topics.columns),'keyValues',Topics_keyPhrase.values)
Topics.rename(columns= {0:'id'}, inplace=True)
Topics_file = fileDir + 'Topics.csv' # output file (the name is an example)
Topics.to_csv(Topics_file) # write out the topics to a file
Now, to tie the topics back to individual comments:
TopicAssignments_documentId = pd.Series()
for i in range(0,len(TopicDF.iloc[1][1])):
    TopicAssignments_documentId.set_value(i,TopicDF.iloc[1][1][i]['documentId'])
TopicAssignments_topicId = pd.Series()
for i in range(0,len(TopicDF.iloc[1][1])):
    TopicAssignments_topicId.set_value(i,TopicDF.iloc[1][1][i]['topicId'])
TopicsAssignments_distance = pd.Series()
for i in range(0,len(TopicDF.iloc[1][1])):
    TopicsAssignments_distance.set_value(i,TopicDF.iloc[1][1][i]['distance'])
TopicAssignments = TopicAssignments_documentId.to_frame()
TopicAssignments.insert(len(TopicAssignments.columns),'topicId',TopicAssignments_topicId.values)
TopicAssignments.insert(len(TopicAssignments.columns),'distance',TopicsAssignments_distance.values)
TopicAssignments.rename(columns= {0:'documentId'}, inplace=True)
TopicAssignments_file = fileDir + 'TopicAssignments.csv' # output file (the name is an example)
TopicAssignments.to_csv(TopicAssignments_file)
Usage
Now that we have the sentiment of each comment, the topics across all comments, the frequency of each topic, and the association of each topic back to its comments, it’s readily feasible to pull these CSVs into Power BI and build reports on them.
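If you would rather hand Power BI a single comment-level file instead of several separate CSVs, the joins are straightforward. This is a rough sketch using the frames built above; the output file name is just an example:

# Rough sketch: tie each comment to its assigned topics (and keep its sentiment) in one frame.
comment_topics = TopicAssignments.merge(Topics, left_on='topicId', right_on='id')
comment_topics['documentId'] = comment_topics['documentId'].astype(int) # the ids we sent were row numbers
merged = mydata.merge(comment_topics, left_index=True, right_on='documentId', how='left')
merged.to_csv(fileDir + 'SurveyWithTopics.csv')

From there, a report can slice the comments by keyValues and average the Sentiment column per topic.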
Good luck!