Quisitive recently hosted the Modern Data Culture Summit. This virtual event spanned two days and helped participants gain an understanding of what a Modern Data Culture is, and how to effectively adopt it within their organizations. With businesses having more data than ever, learning how to use data efficiently has become a crucial step to keeping organizations at the top of their game.
Microsoft’s Erwin Visser, Sr., Director Partner Strategy and Capacity, kicked off day one of the Summit as the Keynote Speaker. The presentations that followed outlined a framework for building a data culture, including sessions on measuring data maturity, citizen data developers, and data protection.
Day two of the summit featured ten unique sessions across two attendee tracks, one for business executives and one for those in technical roles. Topics included using Artificial Intelligence/Machine Learning methods to unlock business value from unstructured data, designing meaningful business metrics, using Machine Learning Ops to consistently deliver value, how to keep data projects from failing, and much more.
The recordings of the Keynote event and the following 16 sessions are now live on the Quisitive Modern Data Culture Learning Channel. Each session is a quick 45-minute opportunity for you to gain insights into what data can do and how it can transform your business. As an added bonus, if you complete just six sessions, you earn a Data Culture Champion badge-perfect for your LinkedIn page or resume.
Sessions include:
Top 5 Data Protection Tips: How to Secure Your Data: Secure your data and mitigate your security risk, plus introduction of Zero Trust as it applies to data.
Powering Smart Devices Using the Internet of Things: Collect and store data from IoT devices and use that data to gain insights and build models.
Design Meaningful Metrics to Measure Business Performance: Use data to choose the right KPIs to effectively measure your business outcomes.
Data Project Fails and How to Avoid Them: What to expect when working on data projects, and how to set reasonable expectations to avoid failure. The Rise of the Citizen Data Developer: Empower your team to be Citizen Data Developers. Learn how to empower a data non-specialist to use and leverage data, analytics, and AI.
Present Your Data to Support Your Business Case: Discover new ways to visualize, present and talk about data to persuade decision makers.
Watch Erwin Visser’s Turning Dreams Into Digital Business opening session HERE.
For complete access to all these sessions and more, register for the Modern Data Culture Learning Channel HERE.
Thanks!
This post should serve to inform you as a manager or executive what to expect from analysis presented in Microsoft Power BI. This primarily non-technical post will cover terminology, design, visuals, data connections and updates, and user empowerment.
Terminology
Desktop, Service, Mobile App:
The Power BI product has three primary components: Desktop, Service and Mobile App. Executives are more likely to use the Power BI service or Mobile App than Power BI Desktop. Here is a quick summary of the differences between these alternatives:
Power BI Desktop is a traditional Windows application like Excel, Word, PowerPoint and Outlook. It is the primary authoring environment used by analysts and developers to create Power BI reports and the visuals they contain. It can also be used to view and interact with Power BI reports. In Desktop, a business intelligence specialist can build a sophisticated data model by connecting to one or more data sources, relating tables, creating calculations, splitting or combining fields, etc… What is Power BI Desktop missing? The ability to build or view a “dashboard” and the ability to share your analysis. Dashboards exist only in Power BI Service, and cannot be authored or viewed in Power BI Desktop. To share what has been authored in Power BI Desktop, you could copy it to a fileshare or SharePoint site like we’ve been doing since Lotus 1-2-3 was a thing (and try to keep up with all the versions that grow out of that copy), but a far better way is to publish the report to the Power BI Service.
Power BI Desktop
The Power BI Service resides at https://powerbi.microsoft.com. It is a software-as-a-service product connected to your Office 365 account. The primary purpose of the Power BI Service is sharing insights. A Power BI Service user can:
View and edit dashboards (with appropriate permissions).
View, edit, author and interact with reports (with appropriate permissions).
View reports and dashboards published as an “App” (with appropriate permissions). More on this later.
Author reports from scratch by connecting directly to data sources, but to a more limited degree than with Power BI Desktop.
Report in Power BI Service (https://powerbi.microsoft.com)
Dashboard in Power BI Service (https://app.powerbi.com)
The third Power BI client is the mobile app available from the Apple, Google Android and Microsoft Windows app stores. It is a touch-optimized application that enables viewing Power BI dashboards. In addition, the mobile app allows you to set data alerts and view visuals based on cached data while offline.
Power BI Mobile App (Windows, iOS, Android)
Visual:
A chart, graph, table, or other graphic depiction of data and data relationships. More about visuals below.
Report:
A report in Power BI is analogous to a workbook in Excel. Reports can have one or more tabs.
Dashboard:
A dashboard is a canvas on which visuals are “pinned” from reports. A dashboard is well named; its purpose is to enable at-a-glance views of business metrics. Clicking through any of the dashboard visuals will open the report from which the visual was pinned, so that the user can explore the context of the visual. As stated above, dashboards exist in powerbi.com and the Power BI mobile apps, not in Power BI desktop.
One killer feature that can be enabled in Power BI dashboards is “Q&A”. This feature enables a user to type (or speak, with Microsoft Cortana integration) natural language questions and receive charts, graphs and maps in response. Here I typed “count of violations by type” and Power BI presented me with a bar chart. I then changed “type” to “Zip Code” and it drew me a map.
Power BI Q&A
App:
Unfortunately, there are two completely different uses of the term “app” in the Power BI world. First, there is the above-mentioned mobile app that installs from an app store on an iPhone or other device. Second, there is the concept of a group of reports and dashboards published together in the Power BI Service. These “Apps” make it easy for your IT organization to publish and maintain secure access to groups of reports.
Visuals
Power BI currently (as of October, 2017) ships with 29 visuals. One of its many killer features is the fact that this visual library is extensible. Any developer can build and submit visuals for Microsoft to evaluate and publish to their online store, and many have. A quick count showed 113 custom visual offerings in the store, ranging from key performance indicators to gauges, maps, temporal charts, filters, etc. Most of these custom visuals are license free.
Providing Feedback on Visuals
With such a vast selection of visuals from which to choose, it can be tempting for a Power BI report developer or analyst to use as many as possible. In many cases, a custom visual can provide insight or “tell a story” better than any of the out-of-the-box visuals. However, the best report design advice is always “simpler is better.” Keep that in mind if you have the opportunity to provide report design feedback. Report design should prioritize the conveyance of important business information with a minimum of distraction. More on design later.
Report developers should use specific visual types to convey specific data insights. There are many perspectives on this topic, including a great illustration by Andrew Abela.
Clustered bar or column charts are best for comparisons.The ribbon chart, new as of September, 2017, adds visual queues that show ranking trends over time.
Ribbon Chart (http://bit.ly/2y92dDc)
Line charts and sparklines are best for reflecting change over time.
To depict the contribution of parts to the whole, stacked bars or columns are usually better than pie charts. Pie charts can be used with very small numbers of categories, but designers should consider waffle charts for simple comparisons between larger numbers of categories. Heat maps can be a useful way to convey contribution of parts across multiple dimensions. Waterfall charts can also be a great way to express an additive metric that has many components, e.g. “fiscal year sales and gap to-date over several different licensing types”.
To express states of a flow, a funnel chart can be best. If the data flows from multiple sources to multiple destinations, a Sankey chart can provide a unique visual insight. An example of a good use of Sankey is “average daily quantity of customer support request by channel and by call center”.
Typically, histograms depict distributions the best. Box-and-whisker visuals are great for comparisons between distributions.
Power BI has several different ways to show spatial and geographic data. ArcGIS is probably the most powerful.
Scatter charts are best to show correlations, like call center expenditures versus calls handled. Notice that this one is augmented with a trend line and an explanatory label.
Scatter Chart
Single metrics, like a running monthly count of safety incidents, are best shown in a “card” visual. If that metric is a key performance indicator with a specific goal and trend, the KPI visual is best.
Report developers should never underestimate the power of a simple table. Showing key detail in-situ with the storytelling visuals can serve to answer questions efficiently. This isn’t mandatory, as users can always drill to data from a visual (click the ellipsis at the visual upper right, select “see data”)
Slicers enable interactive filtering.
When presenting controversial insights, or when presenting complex business metrics to experts, more raw data is better than more pretty visuals. Designers should rely in simple charts augmented with tables.
Another thought with regard to visuals and reports – keep in mind whether the information you are being presented is declarative or exploratory. Declarative visuals indicate statements of fact, e.g. “third quarter sales of potatoes on the western U.S. were down 7.6% year over year”, and they don’t change often. Report writers have time with these to apply design and polish. In contrast, exploratory visuals are the result of innovative thinking, usually by a team, about the business, e.g. “we compared tuber sales to dark green veg sales in the west, and found a negative correlation; we think the ketogenic diet trend is decreasing demand for starchy veg”. These visuals are usually less color coordinated, polished, and shiny than declarative visuals, because they are fresh. As an executive, focus your feedback on the content of exploratory visuals, not their beauty, to encourage quick and open sharing of innovative thought.
Interactions of Visuals
Power BI visuals can act as interactive report filters. For example, imagine clicking on the “Midwest region” column of a “Net Fiscal Quarter Sales by Region” column chart. Power BI will immediately filter the entire report tab to the Midwest region. This capability is on by default, and it is developer configurable to highlight rather than filter the report.
Report designers can use a visual called a “slicer” to give you an easy way to interactively filter a report. These have been around in Excel for quite a while. Often however, a small visual like a “call volume by call center” bar chart can be used as both an informational visual and a report filter. This can be a better use of screen real-estate than a slicer.
Drilldown with Visuals
The drilldown button: in the upper right corner of a data visual will toggle drilldown mode for that visual. With it off, clicking on a line or column will simply select and focus that segment of the visual, like this:
Toggle the drilldown button on, then click on a report data point, and the visual will drill down to the next level of the x-axis hierarchy, if one is available. In the visual below I have toggled drilldown on, then clicked on the “2015” data point of the VanArsdel line. The report is now focused on 2015.
In contrast, the “go to the next level in the hierarchy” button: in the upper left corner of a visual will do exactly what is says. It will focus the whole visual down one level of the x-axis hierarchy. For example, after clicking that button, the report below shows sales summed by month over all of the years in the database. It is clear from the visual below that June is the top sales month for VanArsdel over the past 17 years.
The “expand all down one level in the hierarchy” button: in the upper left of the visual will increase “grain” in the chart by including the next level of detail from the next lower hierarchy level, as in this example:
Focus Mode
Have someone looking over your shoulder while discussing a specific visual, or need to focus discussion one one specific visual during a meeting? Click on the “focus mode” icon () in the top right corner, and it will zoom to fill the canvas. This is a temporary display mode – nothing will happen to the layout of the rest of the report if you use this.
Data Connections and Updates
The most powerful feature of Power BI may be the “get data” button. It lists 76 data sources to which the product can connect out-of-the-box, and the data connector API is fully documented, so third parties can write their own connectors. Further, Power BI has its own on-board query processor, meaning it can consume data from multiple sources and combine the results. In the example below, a standard sales database has been augmented with population data to create a “sales per capita” calculation. This was done by screen-scraping a couple of Wikipedia pages (Population by US City and State Codes by State) from Power BI desktop, then relating the resulting tables to the existing geography table on city and state.
It is not always feasible or practical to combine multiple data sources, so please be understanding if your data geek says it can’t be done.
Streaming data can be shown real-time in a Power BI report. More traditional non-streaming data can be updated up to eight times daily to the Power BI Service, or forty-eight times daily on the new “Premium” Power BI product. A developer can easily place an “Updated on: [date] [time]“ card on a report tab to indicate the freshness of the data in the report.
User Empowerment
In Power BI Desktop, and to a certain extent in the Power BI Service, users are empowered to build their own reports and visualizations. Given some training and practice, users can answer their own questions and perform self-service research without involving their IT group.
This leads to a classic two-edged sword in business intelligence: multiple versions of the truth, also known as “Excel Hell.” As shown below, three different analysts’ sales models might define “Territory Sales Growth” in three different ways. Tony uses a denominator that includes last year’s prorated sales, while Pat uses last year’s total fiscal year-to-date, and Sri factors in adjustments that were made after the FY closed in both the numerator and the denominator. Which is correct? From which calculation should business decisions be made?
Multiple Versions of the Truth (FYTD = fiscal year to date)
Catapult Systems advocates a tiered system of report tagging in order to signal the degree of trust that should be attributed to a calculation in a report. This can serve to encourage data exploration and sharing while signaling an appropriate level of caution. We will be happy to tell you more about it in person.
Design
If you have the chance to specify the design of Power BI reports or provide feedback, here are some common rules of thumb.
Whitespace is valuable. Developers may feel the need to fill every square millimeter of space with information, and if you’ve asked for 35 visuals on a topic, that may be what they deliver. The usual design goal is clearly organized report with a limited number of impactful visuals surrounded by plenty of whitespace.
Consistency of layout is an asset in a multi-tab report. One common guideline is that who, what, how, when and why oriented visuals should each have their own consistent places on a report.
Report colors can be entirely customized, including the base color palet.
Branding and backgrounds can add to the professional look of a report.
Layouts can be customized for use on mobile devices as well as Power BI Desktop and Service.
Summary
The purpose of this article is to summarize Power BI for managers and executives being asked to use the product. It was intended to help you get smart quickly on the essential features of the product, and to enable you to give some feedback on report design. This is the most rapidly evolving major product I have ever seen out of Microsoft. There are many capabilities and technical features that were not covered here such as security, bookmarks, parameters, data models, DAX calculations, etc., that your report developers should understand and utilize. Now is a great time to be in the business analytics field.
Feedback always appreciated.
I had a couple of excellent questions asked with regards to the first blog post in this series which discussed using Power BI to gather information from an upgraded Log Analytics workspace.
“How do I authenticate to schedule a refresh of data in Power BI for my particular datasets from queries? I have the option and have tried all of them, all failed. Not sure what I’m doing wrong. Any pointers?” – My attempt to address this is in the “Scheduling the refresh of the Log Analytics data in Power BI” section.
“I see the limit or records is set at 2000. When I enter the amount of records I require (25 000) I get an “Error 502 (Bad Gateway) error in PowerBI as soon as I click “Done” in the advanced query editor. Is there a timeout I need to adjust here to allow a little longer for the data to load? Are there limitations on how many records I can query?” – My attempts to address this question are included in the “Updating the query with an increased # of records” and the “Updating the query to return only the one day of data” sections of this blog post.
Scheduling the refresh of the Log Analytics data in Power BI:
Once you have your query in place, you need to provide credentials as we discussed in the previous blog post. To schedule this data to update, we need to publish it next.
In my example I will publish it to my workspace in Power BI.
From here we switch to Power BI web (http://powerbi.microsoft.com) and open “my workspace” in the datasets section and find the name of the dataset which was published.
Use the “Schedule Refresh” option to set when you would like the data to update
From here I needed to edit my credentials and set them to Oauth2 to authenticate properly.
And then logged in my account credentials for the OMS workspace.
Now we can determine when we want this to update (either daily, or weekly):
You can also add more times to have it refresh on the daily basis.
Right-clicking on the dataset shows when the most recent refresh was and when the next one is scheduled to occur. You can also click on the ellipses and choose to “Refresh Now”.
Updating the query with an increased # of records:
The default query returns 2000 records. This value can be changed to a higher value depending on how large the records which you return. For my query I was able to update it to a value of 18,000 before receiving this error message.
Updating the query to return only the one day of data:
To minimize the amount of data, I changed from the original query approach to use a time restricted query which was updated on a schedule as discussed above. [Be aware that when cut and pasting this query the quotes may substitute incorrectly]
Original query: Perf | where CounterName == “% Processor Time” | summarize AggregatedValue = avg(CounterValue) by Computer
Time restricted query: Perf | where CounterName == “% Processor Time” and TimeGenerated >= ago(1d) | summarize AggregatedValue = avg(CounterValue) by Computer
To validate that the scheduling was working, I looked at the results on the “TimeGenerated” field. When the data was originally gathered, the TimeGenerated went up to 9/5/2017 2:00:00 PM
After a successful refresh of the data the TimeGenerated field shows more recent data.
(*Note: may have to close and re-open Power BI web to see if the data has refreshed*)
Below is the query which is provided by OMS for Power BI:
/*
The exported Power Query Formula Language (M Language) can be used with Power Query in Excel
and Power BI Desktop.
For Power BI Desktop follow the instructions below:
1) Download Power BI Desktop from https://powerbi.microsoft.com/desktop/
2) In Power BI Desktop select: ‘Get Data’ -> ‘Blank Query’->’Advanced Query Editor’
3) Paste the M Language script into the Advanced Query Editor and select ‘Done’
Summary: To schedule your Log Analytics data to update on a scheduled basis: publish, and then schedule the dataset to update on the schedule which you would like it to update. To avoid the “502 bad gateway” errors, limit the duration of time for your query so that it will return less data but run it more frequently. Thank you to Erik Skov who was my “Phone a friend” for this issue!
[Update 11/29/2017: With all workspaces being converted to the new query language, the original “non-converted workspace” sections of this blog post can be disregarded. I am leaving this blog post in place for historical purposes to show how this functionality changed with the new query language.]
Microsoft recently released the new query language for Log Analytics which I personally believe to be “the most significant update to the Log Analytics framework since the release of the HTTP data collector API.” The new query language greatly extends the capabilities of Log Analytics but it also opened the door to another large change which is a bit more subtle. In a non-converted workspace, you can enable a preview feature called “PowerBI Integration” which allows you to push data to PowerBI based on a search query you define in OMS (I’ve used this functionality in several blog posts including the following):
The ability to push data to Power BI was extremely helpful but it had two major limitations:
Since you were pushing data you couldn’t query directly from Power BI to gather your data as you do with other sources and there were limits as to how frequently these exports could be scheduled.
The data which was pushed was only available in the web version of Power BI – not in the desktop version of Power BI (which has more capabilities).
Digging into the Power BI functionality:
The screenshots below show a non-converted workspace and how the PowerBI Integration features.
Once you have converted your workspace, the option to enable the “PowerBI integration” no longer exists.
When you run a query in Log Analytics you see the same option available for PowerBI. The screenshot below is from a non-converted workspace.
The screenshot below adds the “Analytics” option and shows a converted workspace.
The difference is what happens which you hit the PowerBI button. In the non-converted workspace this raised a dialog box where you defined the query.
In the converted workspace, this creates a text file which you can save off. This text file contains both the query and the process to use the query in Power BI desktop.
An example of this export is shown below:
/*
The exported Power Query Formula Language (M Language) can be used with Power Query in Exceland Power BI Desktop.
For Power BI Desktop follow the instructions below:
1) Download Power BI Desktop from https://powerbi.microsoft.com/desktop/
2) In Power BI Desktop select: ‘Get Data’ -> ‘Blank Query’->’Advanced Query Editor’
3) Paste the M Language script into the Advanced Query Editor and select ‘Done’
To add this data, we open Power BI desktop and choose Get Data. From the Get Data list we choose the “Blank Query” shown below.
From this view we open the Advanced Editor.
Paste the query into the Advanced Editor view and click Done.
You may need to Edit your credentials to access the data.
(In my case I need to authenticate to my Organization account)
NOTE: It is recommended to use service principal for authentication otherwise might be issues when you change your personal account password.
Once we connect we can see the data in Power BI.
Then we close and apply this query.
From here you can build your visualizations you normally would. In my example this is “Query2” shown on the right side of the screen.
The screenshot below shows a simple visualization of the data which I requested via the query.
Summary: Converted workspaces in Log Analytics replace the preview feature approach which pushes data into Power BI with the ability to query data from Power BI. This approach also makes it possible to use the desktop version of Power BI which removes the limitations we had seen when working with the data we had pushed out to Power BI. In a follow-up blog post I will review other changes that I have found to the user experience after conversion of a workspace has occurred.
I was looking for a more accurate way to locate a top-level item in any hierarchy without having to do a lot of extraneous looping. The solution I found was is efficient and easy!
For this example, we will determine the top-level manager for any employee, regardless of the level of their managerial hierarchy. For this test, I have created a table called employees:
We can use the following method to determine an employee’s top-level manager and their relationship level:
USE [Test]
GO
DECLARE @EmployeeId int = 13 — Christian Lewis
;WITH EmployeeHierarchy
AS
(
SELECT
e1.EmployeeId
, e1.EmployeeName
, e1.ManagerId
, [Level] = 0
FROM
Employees e1
WHERE
(EmployeeId = @EmployeeId)
UNION ALL
SELECT
e2.EmployeeId
, e2.EmployeeName
, e2.ManagerId
, [Level] + 1
FROM
Employees e2
INNER JOIN EmployeeHierarchy ON e2.EmployeeId = EmployeeHierarchy.ManagerId
)
SELECT
*
FROM
EmployeeHierarchy e3
WHERE
e3.[Level] = (SELECT MAX([Level]) FROM EmployeeHierarchy)
Which will result in the following output:
Neat!
As the data in Microsoft SQL Server tables changes their indexes change. Over time these indexes become fragmented. This fragmentation will adversely affect performance. This post provides information to help you understand the detailed mechanics behind fragmentation. It will also help you understand the methods and approaches for performing defragmentation so you can improve your SQL Server’s performance.
The following is a summary of the key topics covered:
The difference between disk and SQL Server internal and external fragmentation
How fragmentation affects performance.
The mechanics behind performance robbing data voids
The pros and cons of various approaches to managing fragmentation
How to judge the improvements gained by defragmenting your server.
Q. What is SQL fragmentation? Is it different that disk fragmentation?
A. SQL fragmentation is not physical like disk fragmenation. Not all fragmentation is equal!
Physical disk fragmentation is likely what comes to mind when fragmentation is first discussed. Physical fragmentation is a side effect of how hard drives and Windows work. It is common knowledge that regular disk defragmentation is required to achieve optimal performance from your PC. Windows even includes a basic defragmentation utility.
Physical fragmentation slows down your PC because reading data is interrupted by head seek delay. Windows fits files into free space, often breaking the file into segments stored apart from one another. A hard drive’s head relocates to read each individual segment. As it moves to each segment the head ‘seeks’ – often at a cost of 3-4 times the time it takes to read the segment itself. Physical fragmentation primarily affects desktop or laptop PCs containing one hard drive. The single drive must sequentially gather data – so on a fragmented disk it seeks, reads, seeks, reads – these 4 operations are performed one after another. Defragmented, the operation ends up as seek, read, read. We reduce the total cost of 24ms to 15ms in our simple example.
Figure 1 – Sequential operations with on a typical PC
Shown above, in Figure 1, we see physical fragmentation forcing the drive to read the file as two segments. The seek costs us 18ms, while the read costs 6ms. Considering the average file is made up of hundreds of segments, the seek latency multiplies and becomes quickly evident as your system slows down over time.
Physical defragmentation products such as Windows defrag, Power Defrag™, Page Defrag™ (another Microsoft tool), or the granddaddy of them all, Diskeeper 2007™ work very well when repairing segmented files. Diskeeper’s technology is licensed to Microsoft as the defragmentation tool internal to Windows. In fact, Diskeeper’s latest innovations bring physical defragmentation capabilities to a completely new level. All of these products reorder the data on your disk, consolidating files into fewer segments to minimize “head seeks” – providing faster boot times, quicker file reads, and a more responsive system overall.
However, physical disk fragmentation is not the same as SQL Server defragmentation! SQL Server is different. SQL Servers use advanced storage systems with multiple drives working in tandem, changing the way files are read. Physical fragmentation is something solved with hardware – not with defragmentation scripts or tools.
Figure 2 – Parallel operation found in arrays
The fault-tolerance in database storage overcomes the vast majority of physical disk fragmentations’ impact. Best practices universally prescribe multi-drive storage subsystems for production SQL Servers. Most SQL Servers use multi-drive storage such as RAID arrays, SANs, and NAS devices; there are always multiple drives acting in tandem. Hard disk controllers supporting drive arrays are aware of the alternate seek/read dynamic and tailor communications with the array for maximum I/O.
As a result, files are distributed across many drives inherently becoming segmented. Working in tandem, however, allows one drive to seek while the others read. With the common configuration of 5 drives, a seek delay of 9ms per drive allows 2 drives reading for 3ms with no seek delay impact at all. Data storage drives are generally much faster than workstation drives, so seek times of 4ms and read times of 1.5ms are not unusual.
There are many DBAs who run a traditional physical defragmentation program in tandem with their intelligent drive controller which results in limited improvement. Physically defragmenting a file in an array implicitly leaves the file segmented across the virtual unison of tandem drives. It’s by design. The goal is to gain the most performance while incurring the least overhead – so don’t run physical defrags if they slow the storage by 50% while running, and ultimately improve read speeds 1-2%.
The most important concept to understand is that the controller, physical defragmentation programs, and multi-drive arrays are unaware of what SQL Server is doing with the file data internally. By focusing on SQL Server’s representation of data – how SQL Server has laid out the database itself, how full each page is, and how effectively we’re utilizing available SQL Server resources, we can optimize to the ‘next level’ of SQL Server performance, solidly trumping any benefit to physical defragmentation by orders of magnitude. In a nutshell, SQL Server’s performance can be most improved by focusing on its internals. In fact, once you start focusing on defragmentation at the SQL Server level – whether with manual defragmentation or with the automated defragmentation provided with SQL defrag manager, you may decide that physical defragmentation is no longer needed!
Q. How is SQL Server’s fragmentation affecting my Server?
A. Fragmentation of SQL Server indexes mainly creates wasted space that affects your server performance much more than one might expect.
Fragmentation of your SQL Server’s internal allocations and page structures result in ‘gaps’ or ‘void’ space that is dead weight carried along with valid data. Your backups, storage, I/O channels, buffer memory, cached data, logs, tempdb, CPUs and query plans are impacted by these unnecessary voids. SQL’s fragmentation continually eats away at these resources with nearly every update, delete, insert, and table/index change. If ignored, fragmentation can be the proverbial ‘death by a thousand cuts’ to a server’s performance and scalability.
Figure 3 – How areas are affected by fragmentation (Click to enlarge)
Shown above is a detailed diagram of how SQL Server fragmentation can affect your SQL Server performance and overview of the affected areas. As you identify how the fragmentation affects your server, you’ll see that fragmentation effects are cumulative and nearly impossible to predict. SQL defrag manager, however, uses sophisticated algorithms to predict and detect SQL Server fragmentation “hot spots” and to defragment indexes on a continuous basis.
Q. What creates the voids and other adverse effects and how do I get a handle on them?
A. Typical, day to day activity causes SQL Servers to fragment over time. Changes to your data – inserts, updates, deletes, and even changing varchar values contribute to fragmentation. The full list of actions that cause fragmentation is long and the rate of fragmentation varies across different indexes and tables. Sometimes there is a pattern resulting from seasonal or annual peaks (e.g., when financials run). But more often than not, it is difficult to find, predict and proactively manage fragmentation manually.
Let’s dive into the details of where these voids sit, how they are created, and how they propagate throughout your server:
Figure 4 – SQL Server’s page structure
SQL Server stores all data, objects, and internal structures in 8192 byte data pages shown in Figure 4. These pages are known only to SQL Server and may be stored in one or more physical files on disk. Data gets a maximum of 8096 bytes per page – the rest of the page contains the page header and row locations. When creating a table or index, SQL Server pages fill according to the fill factor you specify (or the closest approximation.)
Over time, insert, deletes, and modifications (such as widening the value in varchar fields) fill the page and ultimately overflow the page creating a ‘page split’. Splitting divides the full page evenly, putting half of its data on a newly allocated page, and may negate any fill factors you designate. For example, if you designate a fill factor of 80%, over time, due to splitting , your pages may reach a fill factor of 50% or less.
The more that heavy, spiked, or continuous changes occur on a table, the faster and further it and its indexes drift. Since the indexes are based on variants of data in the table, they have their own unique drift profile. The net result of drifts is waste – lots of it – waste of your disk, I/O channels, server’s caches and buffers, and CPU utilization. The waste may also skew your query plans.
The void/waste space is known as “internal fragmentation.” Internal fragmentation lowers page density and as a result our server resources trickle slowly away now being increasingly consumed by empty space. SQL does try to fill the voids in split pages – however there is rarely the Tetris™-like fit necessary to reach optimal population post split. The common practice of using an identity column as your clustered index, forces inserts into new pages at the bottom of the table, preventing recovery of the voided space.
Figure 5 – Page splitting due to fragmentation
The space used by actual data is reflected in a metric called “page density.” The denser a page, the more data vs. void it contains. A page density of 100% would mean the data page is completely full. Even if the pages had no void, Figure 3 illustrates how the split has introduced other inefficiency in contiguously accessing the pages after the split. Interestingly this parallels physical fragmentation – although it is a completely isolated variant in SQL Server’s management of data vs. the way files are segmented on disk. This type of fragmentation is called ‘external fragmentation’.
The vast majority of the time, the void grows, instead of filling up neatly. When void space becomes too much (your page density becomes too low), SQL Server will discard the index due to excessive overhead. At this point, fragmentation becomes very evident as very few systems will tolerate discarding indexes in favor of table scans.
For example, shown in Figure 5, the four pages require four logical reads. Defragmentation would condense the data by reorganizing it into two pages and two reads. A 42% reduction in void space is gained by defragmenting the data. This concretely reflects in improved performance and increased scalability. In the example, two times the data could fit in the defragmented space vs. the fragmented space. By reclaiming the voids, we return capacity to our server.
Figure 6 – Comparing fragmented and defragmented size
While it may seem trivial on such a small scale, when your average page density is low, you are wasting disk space, incurring more physical I/O, increased logical reads, wasting precious server memory while computing and comparing data unnecessarily. Further, if you are fortunate enough to have an intelligent I/O controller, you are also wasting the benefit of its optimization strategies. It becomes obvious that this process of splitting, voids, progressive order, and rates of decay requires non-stop attention to insure the server is running with as much free resource as it can.
Q. Ok, so clearly the maintenance benefits are real, and we want to obtain them, now what?
A. There are two approaches most commonly used for fragmentation today and they have disadvantages
If, for a moment, we ignore SQL defrag manager, there are two existing methods for managing SQL Server fragmentation. Neither is ideal, or gives you the information you need to stay informed and on top of the fragmentation challenge. Both leave you completely blind — you won’t know if they helped, hurt, stepped on, or blocked your busiest table.
First method: plain old damage control. The server performance degrades slowly and is ignored. All of a sudden, a spot in the database reaches critical mass, performance craters, and is eventually addressed. This is how the majority of DBAs are introduced to the fragmentation problem. They just fix it and wait for the next hotspot or for SQL Server performance to run down again and again. Unfortunately, you will never know when your server is going to act up or how severe the impact will be. Furthermore, there may be cascade effects caused by inadvertent query plan disruption due to fragmentation.
Second: Run a blind maintenance script. The best all-purpose script I’ve found is Omar Kloeten’s Index Defragmentation Script. These scripts are often quite complex with unpredictable results. They usually work, but may often cause after effects such as blocking or locking and can generate considerable overhead. You have no idea how long it will take the script to find every index – and it must query each one, every time. The script does not track performance benefits, does not track how defragmentation varies each time, and offers no notification of potential problems.
All-purpose SQL Server defragmentation scripts:
Request information that can cause long blocks or deadlocks.
Rarely have internal logic to know when to defragment– instead they just steamroll your servers every day whether they need it or not (perhaps many times a day.)
If you make a minor change to a defragmentation script, you will be required to re-deploy to the new script to all of the servers in your enterprise.
Should be tailored to each database – but to do this would require near constant “hand-tuning.” A very time-consuming and practically impossible process.
Aren’t able to report when the script was run, what performance enhancements were gained, or how many resources they’ve reclaimed on your server since you started running them.
A new third option: Idera SQL defrag manager. SQL defrag manager offers a totally new way to identify, optimize, manage and automate SQL Server defragmentation. It is designed specifically to overcome the compromises DBAs have to make regarding the important task of fragmentation maintenance.
Consider this: If you are able to eliminate void space, every page of void reclaimed is money back in your corporation’s pocket. Those reclaimed resources are regained server capacity that had been lost unnecessarily. SQL defrag manager will reclaim these resources and track the total improvement on every object in your enterprise daily or over a year. You can even produce an annual report showing how much money has been saved through the use of defragmentation technology – and we guarantee that it will be impressive!
SQL defrag manager not only tracks the improvement achieved on each object, it maintains dozens of statistics on each table and index. This information guides SQL defrag manager to determine how often it should check for fragmentation, and if you wish, the method it will use to correct the fragmentation. SQL defrag manager eliminates defragmentation overhead and risk on your servers – there is no agent required on any managed server. There is no job scheduled or script deployed. SQL defrag manager simply runs as a service, quietly in the background with no affect to your production servers.
Unlike scripts, SQL defrag manager’s fragmentation detection routines are non-blocking. Defragmentation is also non-blocking, given the DBA has not chosen to rebuild the fragmented object. Rebuilt objects are often not needed. SQL defrag was invented by two DBAs frustrated with the scripts and the handholding their 99.999% available enterprises required. SQL defrag manager will shed light on the fragmentation levels across your entire SQL Server environment — allowing you to quickly detect and manage fragmentation with ease. It will also give you assurance that defragmentation is being handled in exactly the way it should be for that particular database – no more guessing!
In Conclusion
Fragmentation affects performance through the introduction of wasted page space which erodes the resources of a server. Through proactive use of scripts via maintenance jobs or tools like SQL defrag manager or SQL Sentry’s Fragmentaion Mananger can considerably improve performance where fragmentation exists.
I recommend small shops with downtime windows leverage a script which defragments indexes based on levels of fragmentation. If you are enterprise-class running 24×7 the features available in the tools allow you to gate defragmentation based on resources available, which justifies their cost.
Keeping track of density before and after defragmentation helps illustrate buyback you gain. When translated to dollars one finds a compelling arguement for automated solutions. Below is the report I created for SQL defrag:
Figure 7 – Resources reclaimed by defragmentation
When I begin discussing capabilities of a business intelligence platform with my clients, there are two words I hear from those clients more than any others. The two words I hear most often are “reports” and “data”. It occurred to me sometime back, that if you work for an organization, in a specific industry, and you have a specific job role – then while it sounds overly simplified, business intelligence (or BI as we like to call it) is really about the data (the data you care about) presented in the way you want to consume it (a report).
Anyone who has worked on a business intelligence solution, either from a business analyst perspective or a technology implementation perspective would be quick to tell you BI is much more complex than that. Why? Well it starts with understanding the main capabilities of a Business Intelligence Platform (regardless of vendor or technology), and then drilling into “sub capabilities”. I start at the top with three overarching capabilities of a BI Platform. They are:
Top Level Business Intelligence Platform Capabilities
1) Data Management – All processes and technologies related to gaining insight from existing organizational data.
2) Reporting and Analytics – All formats (or BI Assets) for users to view and/or interact with data to gain business insight.
3) Performance Management – All formats that allow users to monitor key metrics (or goals) that drive business success.
Sounds pretty simple – only three categories. But let’s start with Data Management and the key “sub-capabilities”:
Business Intelligence “Sub Capabilities”
1) Data Management
a) Data Integration – All processes and technologies related to integrating data from disparate systems in the enterprise into a “single version of the truth” (See Data Warehousing).
b) Data Warehousing/Data Marts – Central data repository (or repositories) that provides consistent and accurate data across several business domains (e.g. Accounting, Finance, Operations, Human Resources, Sales, Marketing etc.)
c) Data Quality – All processes and technologies related to the quality control of data coming in from source systems into Data Warehouse/Data Marts. May include exception reporting, fixing data quality issues, auditing, and logging.
d) Master Data Management – All processes, governance, and standards required to ensure consistent Master Data. Master data can be defined as an authoritative source for the products, accounts, and parties for which business transactions are completed.
e) Big Data – This is a new one that has moved onto the scene. This may not be a capability per se, but the logical place to discuss the storage and management of “Big Data” is under the BI Capability of Data Management.
“Everything is a report”. This is often how I begin requirements discussions with a client or project team. But there are subtle nuances that draw a line between Reports, Analytic Views, and Dashboards (as well as help you pick the right tool for the right sub-capability of the BI Platform). Reporting and Analytics (and associated sub-capabilities) are described side by side below.
2) Reporting and Analytics
a) Operational Reporting – An operational report presumes to answer a question that is already known by “the business”. A classic example is a Profit and Loss statement. It answers the question, “What is my revenue, my expense, and my bottom line profit or loss?”. Operational reports tend to be relatively static in nature – meaning a user may select a couple parameters (A department and a date range for example), but the resulting format is generally consistent, and only the underlying data is changing.
b) Self-Service Reporting – Self-Service Reporting describes the empowerment of a user to build their own report, in the format they need and like. Often self-service reporting is described as ad-hoc access to pre-defined data sources (or data models), the ability for a user to “drag and drop” data elements onto a design surface, and format the report to their specific purpose (presentation, data exploration, reference etc.).
2) Reporting and Analytics
a) Analytics – Different from Operational Reporting, analytics describes an activity where the question isn’t fully identified. Often I describe this as “research on why something is happening”, drilling in and out of data to further clarify the question that the user should be asking (e.g. Profits are down –> Why? –> When I drill into the data it looks like one product category isn’t performing –> Why is that? –> When I drill further, it appears a specific product sub-category is under performing in a specific geographical region –> POTENTIAL RESOLUTION: Let’s create an operational report that provides sales detail by region, by product category, and by product sub-category so we can further research on an on-going basis)
b) Self-Service Analytics – Similar to Self Service Reporting, Self Service Analytics empowers the user to build their own analysis. Self Service Analytics goes a step further by empowering users to not only create their own analytical view, but also to model the data to answer specific business questions.
c) Data Exploration – This sub-capability is relatively new to my list, but has been added as a distinction to draw based on some of the new products out there that allow rapid visualization of large data sets (Microsoft SQL Server Reporting Services PowerView is a great example of this). In the activity of data exploration, the question may not even be conceived. The user is simply exploring known data for any key outliers or notable trends. Let’s say we were exploring historical data on patient outcomes at a fertility clinic over a period of 3 years. Think of an animated bubble chart mapping favorable fertility outcomes at this fertility clinic. If there was a period of time where a specific doctor was having great success with a specific in vitro fertilization technique, data exploration would be able to spot it, and allow the clinic to inquire about the best practice that doctor was employing during that time period. Also see the screenshot of data exploration on pollution below.
d) Predictive Analytics – I like to introduce the concept of predictive analytics to clients and project teams by asking the question, “If you had a crystal ball, and could ask any question about the future of your business, what would you ask?”. Predictive analytics (often termed data mining) applies statistical models to known data (historical data, populations of data etc.) to provide insight into “what may happen?”. Determining the odds of who will win the super bowl at the beginning of a season is a form of predictive analytics.
Data Exploration: Below is an example of visually exploring trends and outliers related to the source of air pollution in the United States over a 10 year period (source of data: data.gov)
3) Performance Management
a) Monitoring – The activity of monitoring helps you answer the question of, “What is happening right now in my business?”. A help desk, for example is constantly monitoring the number of outstanding tickets in the queue. The help desk may have a “Target Time To Resolution” metric with a stated goal of resolving priority 1 tickets in under 3 hours. If this goal supports the help desk’s mission of providing accurate response time, the team will want to monitor this metric (maybe in an aggregated scorecard with other metrics), and instantly know if the team is missing or exceeding this mark.
b) Analytics – The same definition above applies here, but it is called out again because if you are monitoring a key metric, and it is not performing as expected, the natural analytical question is, “Why is this metric not performing as expected?”. See Analytics described above under “Reporting and Analytics”.
c) Planning, Budgeting and Forecasting (PBF) – PBF closes the loop of performance management. Planning is the activity of communicating what you would like to happen in the future (e.g. you set a personal financial plan (or goal) for how much money you may need for your kids’ college tuition). Budgeting is the activity of agreeing to (or interlocking) on what needs to happen to support your overall plan. (e.g. so much money needs to be put in a bank account each month to ensure you meet the savings goals of your financial plan for your kids’ college tuition). Forecasting is an ongoing activity of “course correcting” based on actual outcomes that are happening (e.g. you get a bonus in a given month that you add to your bank account that changes your forecast on how much money you need to set aside each month (budget for) to achieve your plan for college tuition). Financial PBF is an easy one to get your head around, but PBF can apply to anything – inventory, scrap, employee or customer churn etc.
Once you gain a collective understanding among the BI project team on the capabilities of an overall BI Platform, you’ll be honed in on which requirements can be supported by which platform capability. Once the requirement and capability is matched, this makes the tool selection and technical aspects of the solution to be delivered more clear. It is also important to note that some vendor tools focus squarely on one single “sub capability”. Caution should be taken to ensure the vendor technologies and overall solution can support existing requirements as well as future requirements that will bleed into other “sub capabilities”.
Many of us these days are utilizing SQL Server Analysis Services (SSAS) as our data source for many of our SQL Server Reporting Services (SSRS) reports. We have a lot of great tools for this like the MDX query builder in SSRS, however, the following combination of requirements presented a bit of a problem for me initially. I was struggling with this a few days ago, and thought I would share how I ultimately fixed the issue. I am sure there are other solutions out there, but this one seemed to be the right fit.
Requirements:
Data Source for SSRS report is an Analysis Services Cube
First Report (or Parent Report) had report parameters built on SSAS Dimension(s)
One of the report parameters needed to be configured as a multi-value (multi-select) list.
Second Report (Child Report) needed to launch in a new browser window
Second Report (Child Report) had a report parameter built on a SSAS Dimension, that was also a multi-value (multi-select) list.
Solution
Requirements 1 and 2 are relatively straightforward. I simply plugged into my SSAS data source, created a data set with the Query builder, and chose the option for the dimension filter to be a parameter. The Parameter highlighted in yellow below needed to be configured as a multi-select in both the parent and child report.
The Parameter highlighted in yellow below needed to be configured as a multi-select in both the parent and child report.
In both the parent and child report, the parameter highlighted in yellow needed be configure to accept multiple values from a multi select list:
Launching a “Drill Through” or “Sub-Report” in a new browser window
Requirement #3 initially had me scratching my head. When configuring an “Action” on a textbox, you have the option to “Go to Report”, and it is relatively straightforward to configure as shown below.
The issue is that this report will render in the same browser window as the parent report that has the textbox action configured. I needed to open the report in a new window, and luckily there is a well-documented solution on how to do this by configuring the action to “Go to URL” and then using a bit of script code to launch a new browser window. You can take a look at this solution here.
So, with requirement #3 out of the way, all that was left was to be able to pass multiple values to an SSRS URL call to render the child report in a new browser window.
Format SSRS URL to accept multiple values
There are a lot of solutions out there that talk about passing multi-values to “drill through” reports when the data source of the target report is SQL Server. We can write in clauses in our stored procedures. I know we could do something similar with customer MDX, but as many of us do, I was searching for a simpler solution.
The first thing to understand is the SSRS URL format for passing multiple values to a parameter built off an SSAS Dimension. You would think the ideal way to pass multiple values would be an array, but keep in mind we are doing this through a URL call. The format for this can be found here.
SHOW URL FORMAT For multiple parameters.
To do this, I used an expression on the action of the textbox (“Go To URL”) to create the URL string. Below are some common functions used in expressions to work with multi-value parameters:
The last issue I ran into was how to enumerate the multiple values to create the correct URL string format (NOTE: the “&ParamName=” needs to be repeated for every value that was selected in the multi-value parameter). The answer again was a small bit of JavaScript code that would accept the parameter name, loop through the values, and create the string format needed as show below (ReportàReportPropertiesàCode tab):
Calling the CreateMultiValueURLString function I created looped through multiple values and created the URL string with a parameter name for every instance of the parameter value.
Summary
Requirements Met:
Data Source for SSRS report is an Analysis Services Cube
Standard Fare for Reporting Services
First Report (or Parent Report) had report parameters built on SSAS Dimension(s)
One of the report parameters needed to be configured as a multi-value (multi-select) list.
Standard Fare for Reporting Services
Second Report (Child Report) needed to launch in a new browser window
Use JavaScript in Textbox Action Expression to call drill through report in a new browser window.
Second Report (Child Report) had a report parameter built on a SSAS Dimension, that was also a multi-value (multi-select) list.
Use JavaScript function to formate the URL string correctly for multi-value parameters.