My course on performance tuning is live, and you can use code LAUNCHBLOG for 50% off until Sunday, February 11th. Module 1 is free on YouTube and Teachable, no signup required.
The goal of this course is to orient you to the various pieces of Power BI, identify the sources of problems, and give some general tips for solving them. If you are stuck and need answers now, this should help.
Note! This is an early launch. Modules 1 and 2 are available now, and the remaining ones will be coming out weekly.
Module 1: A Guide to Performance Tuning. This module focuses on defining a performance tuning strategy, and all of the places where Power BI can be slow.
Module 2: Improving Refresh – Optimizing Power Query. Optimize Power Query by understanding its data-pulling logic, reducing the data being loaded, and leveraging query folding for faster refreshes.
Module 3: Improving Refresh – Measuring Refresh Performance. Master measuring refresh performance using diagnostics and the refresh visualizer to identify which parts are slow.
Module 4: Improving Rendering – Modeling. Better modeling means faster rendering. Understand the internals of models, including columnar storage and star schemas, and use tools like DAX Studio for optimization.
Module 5: Improving Rendering – DAX Code. Optimize DAX code to run faster, focusing on minimizing formula engine workload and pre-calculating data effectively.
Module 6: Improving Rendering – Visuals. Streamline visuals for better performance by minimizing objects, avoiding complex visuals, and using just-in-time context with report tooltips and drill-through pages.
Each module after the first covers how to solve performance problems in its specific area. Each module also provides demos of the various tools you can use (of which there are many, see below).
I was recently talking to a potential customer about getting started in Power BI. A question they asked me was essentially “What are the difficult parts of Power BI, and how do these engagements usually go?” This was a really interesting question, and for me a lot of customer engagements fall into two broad categories.
The lost and confused. A lot of potential customers are just starting to look into Power BI. They can see what it can do, they understand what it is for, but they have questions. Generally these questions are all about governance and implementation. How do we deploy Power BI? How do we license Power BI? How do we secure it?
This is a reasonable challenge because Power BI has a lot of nooks and crannies. There are a bunch of ways to deploy it and a bunch of ways to license it, and it seems like they add a new one every 6 months. These customers generally just seem to be concerned about the unknown unknowns out there. Usually after that first hour, that seems to be all the help they need.
The out of control. These customers have made an investment in Power BI. They have started creating reports and seeing results. The problem here is scaling their reports. M and DAX have goofy learning curves. Both start out easy, but get incredibly difficult when you try to do more complex things. Maybe reading CSV files was easy, but now you need to work with a web API. Maybe your report was fast when it was 60 MB, but now your measures are more complex, you have 250 MB of data, and it is slow. These customers need help applying good data modeling and optimizing those models.
At the time of this writing, Power BI Aggregations are still in preview and actively being worked on. Once they leave preview, I expect this issue will either be fixed, or the limitations will be specified in the documentation, just like with DirectQuery in general.
Currently, whenever I try to use a what-if parameter or a disconnected parameter table, Power BI Aggregations don’t work as intended; instead, the query reverts to DirectQuery. This means that if I need to use a parameter of some sort, I can’t get the benefit of using aggregations.
UPDATE: This issue seems to depend on where the parameters are being used. Reza Rad identified that the issue does not occur in an IF statement.
UPDATE 2: According to Microsoft, this is intended behavior because the parameters aren’t in the pre-aggregations or the mappings. I’ve created a uservoice ticket for this.
Setup
To reproduce this issue, I’ve made an extremely simple data model based on AdventureWorks2014 data. There are 4 tables involved, with no direct relationships:
SalesHeader, which is my fact table, stored in DirectQuery mode.
SalesHeaderAgg, which is my aggregation table, stored in import mode.
TerritoryParameter, which is a What If Parameter, generated with DAX (see the sketch after this list).
Territory, which is a disconnected table, stored in dual mode.
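For reference, here is roughly the DAX that the What If parameter feature generates behind the scenes; the 1-to-10 range below is just for illustration, and the exact generated measure may differ slightly depending on your version of Power BI Desktop.

TerritoryParameter = GENERATESERIES ( 1, 10, 1 )

TerritoryParameter Value =
SELECTEDVALUE ( 'TerritoryParameter'[TerritoryParameter] )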
I’ve mapped all the columns from my aggregation table to my detail table. In theory, all DAX queries that don’t require a count on CustomerID or TerritoryID should hit the aggregation table.
To start with, I have a table summing TotalDue by Customer.
I’ve connected SQL Server Profiler to the SSAS instance that Power BI Desktop runs in the background. This allows us to see what is being run behind the scenes and whether it is hitting the aggregation table.
In this case, Power BI Desktop is doing a TOPN:
EVALUATE
TOPN (
    502,
    SUMMARIZECOLUMNS (
        ROLLUPADDISSUBTOTAL ( 'SalesHeader'[CustomerID], "IsGrandTotalRowTotal" ),
        "SumTotalDue", CALCULATE ( SUM ( 'SalesHeader'[TotalDue] ) )
    ),
    [IsGrandTotalRowTotal], 0,
    'SalesHeader'[CustomerID], 1
)
ORDER BY
    [IsGrandTotalRowTotal] DESC,
    'SalesHeader'[CustomerID]
And looking at the events, we can see a successful query rewrite, with no DirectQuery events. Everything looks good.
The problem
Instead of using an implicit measure, let’s use an explicit measure with a filter based on a parameter field:
Param Total =
CALCULATE (
    SUM ( SalesHeader[TotalDue] ),
    FILTER (
        SalesHeader,
        SalesHeader[TerritoryID] = TerritoryParameter[TerritoryParameter Value]
    )
)
And at first, everything looks fine. No DirectQuery calls.
But, if I select one of the parameter values using a slicer, now it switches to using DirectQuery.
So what’s the difference? Well, in the second DAX query, the engine is applying the filter via TREATAS.
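To illustrate, here is a simplified sketch of the shape that second query takes once a parameter value is selected. The real trace still wraps everything in a TOPN with an ORDER BY, like the query above, and the literal value (4 here) is just whatever the slicer selected.

EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( 'SalesHeader'[CustomerID], "IsGrandTotalRowTotal" ),
    TREATAS ( { 4 }, 'TerritoryParameter'[TerritoryParameter] ),
    "Param_Total", [Param Total]
)

That TREATAS filter over the parameter table is what the aggregation mappings don’t cover, which lines up with Microsoft’s explanation above.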
What if I use an actual table in dual storage mode and just take the MAX instead?
Param Total =
CALCULATE (
    SUM ( SalesHeader[TotalDue] ),
    FILTER (
        SalesHeader,
        SalesHeader[TerritoryID] = MAX ( Territory[TerritoryID] )
    )
)
Well, I get the same exact DAX pattern and the same result.
Conclusion
Ultimately, this is one of the tradeoffs of using preview functionality. I’m working with the customer to get a ticket escalated with Microsoft. In the end, it may just be an intended limitation of the technology. I hope not, though, because aggregations provide huge performance improvements with minimal effort.
That being said, if anyone has any ideas, I’m all ears! Below is my proof of concept.
A question I hear a lot is how to keep up with technology. When you are starting out with a technology, it’s just plain hard to get the lay of the land.
So, I thought I’d put together a learning path for Power BI, a technology that changes literally every month. This is a bit of a challenge because there are so many moving parts when it comes to Power BI. Accordingly, let’s break those moving parts down into different categories.
So, when I think about Power BI, I like to think about the flow of data. First we have the data prep piece with Power Query, where we clean up dirty data. Next, we model the data with DAX. I’ve written before about the difference between Power Query and DAX. They are like peanut butter and jelly and complement each other well.
Now, if you are a SQL expert, you may not need to worry about Power Query or DAX much. Maybe you do a lot of the work in SQL. But either way, once your data is modeled, you need to visualize it in some way. You need to learn how to create your reports with Power BI Desktop. Once your report is created, you then need to publish it.
Finally, there is what I would call the IT Ops side of Power BI. You have to install an on-premises data gateway to access local data. You need to license your users. You need to lock down security. All of these things might be outside of what a normal BI developer has to deal with, but they are still important pieces. However, unlike the data flow model we talked about, the ops pieces happen at all stages of development and deployment.
With that overview in place, let’s get on to the individual sections and the learning paths as a whole.
Getting started with Power BI
When it comes to getting started with Power BI, I have two recommendations: first, get your hands dirty, and second, buy a book. Power BI is in many ways an amalgamation of disparate technologies. It took me a long time to understand it, and it didn’t really click until I took the edX course and did actual labs.
The reason I say to buy a book is that this is a technology that is hard to learn piecemeal. When you are starting out, you are much better off having a curated tour of things.
Search YouTube for Dashboard in an Hour. This is a standardized presentation that will show you the basics in under an hour.
Follow the guided learning. This will walk you through bite sized tasks with Power BI.
Take the edX course. It has actual labs where you have to work with data inside of Power BI.
Check out the Introducing Microsoft Power BI book from Microsoft Press. It’s a bit dated at this point, but it’s free and is a great start.
Check out the Power BI: Rookie to Rockstar book from Reza Rad (b|t). The last update was July 2017, but it’s also very comprehensive and good.
Paid resources
Stacia Misner Varga (b|t) has a solid course on Pluralsight. It’s worth a watch.
Consider reading Applied Power BI by Teo Lachev (b|t). It’s a real deep dive, which is great, but it can be a lot to take in if you are just getting started. A neat feature is that it’s organized by job role.
Learning Power Query and M
When it comes to self-service data preparation, Power Query is THE tool. The way I describe it is as a macro language for manual data manipulations. If you can pay someone minimum wage to do it in Excel, you can automate it in Power Query. Again, check out this post for the differences between Power Query and DAX.
Free resources
Start with the guided learning. This quickly covers the basics.
Matt Masson has a phenomenal deep dive video on the Power Query formula language, a.k.a. M, from a year ago. It really helps elucidate the guiding principles of Power Query and M.
Blogs to check out:
Imke Feldmann (b|t) regularly has complex functions and interesting transformations on her blog.
Ken Puls (b|t) focuses on Excel and along with that, Power Query.
Gil Raviv (b|t) often has neat examples of things you can do with Power BI and Power Query.
Chris Webb (b|t) regularly dives into the innards of Power Query and what you can do with it.
Paid resources
Ben Howard (b|t) has a Pluralsight course on Power Query. It’s a bit introductory, but great if you are just getting started.
Gil Raviv recently (October 2018) released a book on Power Query. What I really like about this book is it has more of a progression style instead of a cookbook kind of feel.
Ken Puls and Miguel Escobar (b|t) also have a book on Power Query that has a cookbook feel. I found it helpful in learning Power Query, but it’s heavily aimed at Excel users.
Finally, Chris Webb also has a book on Power Query. He goes into a lot of detail with it. However, the 2014 publish date means it’s starting to get a bit old.
Learning DAX
I always say that DAX is good at two things: aggregating and filtering. If you aren’t doing those two things, then DAX is the wrong tool for you. DAX provides a way for you to encapsulate quirky business logic into your data model, so that end users don’t have to worry about edge cases and such.
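As a quick illustration of aggregating and filtering together, here is a minimal measure sketch using a hypothetical Sales table; the table and column names are made up, but the pattern is the point: one aggregation, one filter, and the business rule lives in the model instead of in every report.

Large Order Revenue =
CALCULATE (
    SUM ( Sales[Amount] ),          // aggregate
    Sales[OrderQuantity] >= 100     // filter encoding the business rule
)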
The piece of Power BI that is most prominent is the visuals. While it’s incredibly easy to get started, I find this area to be the most difficult. If you have a lot of experience in reporting, this shouldn’t be too difficult to learn.
A really interesting book is The Big Book of Dashboards. While it doesn’t mention Power BI, it covers all the ways to highlight data and what really makes a dashboard.
Administering Power BI
Power BI is much more than a reporting tool. It is a reporting infrastructure. This means at some point you may have to learn how to administer it as well.
Free resources
Check out my post on all the different ways you can publish Power BI Reports.
One of the big challenges with Power BI is just keeping up. They release new features each and every month. Here are a few resources to stay on top of things:
This week we’ve got another episode of SQLChefs with Bert Wagner, where we talk about the difference between datasets, reports, and dashboards in Power BI.
What are datasets?
A Power BI dataset is a series of Power Query queries that have been shaped into a DAX model. Each dataset can combine different files, database tables, and online services all into one tabular model. In our cookie analogy, these are all different “ingredients”.
Unlike SSRS, a dataset in Power BI does not represent a single table or query of data. A dataset should be considered more like a “flavor” of data used to accomplish a specific type of reporting: financial, operational, HR, etc. So in our analogy, the dataset is the “raw dough”.
So in Power Query, you are going to have a set of queries, each of which combines a data source with a (usually linear) set of transformations.
Then, in DAX, you are going to take each of those outputs and combine them into a model. This consists of defining relationships between the outputted tables and adding business logic via calculated columns and measures.
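For a concrete sketch of that business logic layer (the table and column names here are hypothetical), a calculated column and a measure might look like this:

// Calculated column on a Services table loaded by Power Query
Billable Category =
IF ( Services[HourlyRate] > 0, "Billable", "NonBillable" )

// Measure built on top of that column
Billable Hours =
CALCULATE (
    SUM ( Services[Hours] ),
    Services[Billable Category] = "Billable"
)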
For more on the difference between Power Query and DAX, see our previous episode of SQLChefs.
What are reports?
A Power BI report is a series of visualizations, filters, and static elements on a canvas. Power BI reports are saved as a single PBIX file and connect to a single dataset. Remember, a Power BI dataset can have many data sources.
Each report can have multiple pages, just like an Excel workbook has multiple sheets. In our analogy, this is us placing our “cookies” on multiple “cookie sheets”, making one big batch, all of the same “flavor”.
One report per dataset
A quick aside about something that used to confuse me. In most cases, a report and a dataset are going to have a one-to-one relationship: a dataset can have one report and a report can have one dataset.
Recently this has changed, however. A while back, they added the ability to use an existing dataset as a data source for a report, and at Ignite they announced the ability to share datasets outside of the app workspace they were made in.
That being said, while you are still learning Power BI, it’s easier to remember that in many cases, your dataset and your report are going to have a one-to-one relationship and be tightly linked.
What are dashboards?
In Power BI, dashboards are a way of pulling together visualizations from various reports. When you think dashboard, you are probably thinking something like Microsoft’s definition: “A Power BI dashboard is a single page, often called a canvas, that uses visualizations to tell a story. Because it is limited to one page, a well-designed dashboard contains only the most-important elements of that story.”
However, if you look at the report example above, it probably fits that definition, and yet it is not a Power BI dashboard. In Power BI, a dashboard is a tool for pinning visuals from different reports and other sources of data.
In my opinion, a Power BI dashboard is as much a tool for organization and navigation as it is for actual reporting. I think that’s the real value add with Power BI dashboards.
Last week, I had the pleasure of recording some video with Bert Wagner about Power BI. In the video, I got to use one of my favorite analogies for M versus DAX: Are you chopping broccoli or planning a menu?
One of the challenges with learning Power BI is that you have to learn not one but two new data manipulation languages. And it’s not always clear what they are good for, especially if you come from the SQL world.
Is M a general purpose knife, or one of those weird egg slicers?
Head Chefs versus Sous Chefs
I have never worked in the restaurant business, but I’m going to make some gross generalizations anyway.
Sous chefs, as far as I can tell, do a lot of the prep work. They are cutting vegetables, cleaning food, making sauces, etc. While this is all important work, much of it doesn’t inform the final outcome. Whether you are making beef teriyaki or broccoli salad, you still need to chop the broccoli.
The head chef however, gets paid for her brains just as much as her hands. The head chef is figuring out the menu and how to combine all of the ingredients. She is involved very heavily with what the final result is going to be. A head chef has to think of the broader goals and strategy of the restaurant, not just how to get the immediate task done.
M is the Sous Chef; DAX is the Head Chef
Again, this is all a gross generalization, but in the restaurant called Casa De Meidinger this is actually the case! I do a lot of the grunt work when we cook a meal. My wife says, “zest this lemon” and I mindlessly do it. I could probably be replaced with a robot some day, and that would be fine by me.
Annie, however, actually enjoys planning a meal, deciding what to cook, and thinking about how to make the final product. To me, cooking is just a necessary evil for eating. I don’t necessarily get any joy from the process itself.
Working with M
I like to think of M as that sous chef. It does all the grunt work that we’d like to automate. Let’s say that my boss asks for a utilization report for all of the technicians. What steps am I going to do in M?
Extract the data from the line of business system
Remove extraneous columns
Rename columns
Enrich the services table with a Billable / NonBillable column
Generate a date table
This is all important work, but I would have to do the same work for a variety of reports. Many of the steps tell me nothing about the final product. I would generate a date table for most of my reports, for example.
Working with DAX
Now, if I’m working in DAX, what am I going to do?
Ask what the heck “utilization” really means
This was a real-life example that happened to me. What is utilization as a key metric? Well, it turns out it depends on what you are trying to report on. A simple definition is usage divided by availability. If a technician billed 20 hours and clocked in 40, his utilization would be 50%. Or so you would think.
How do we handle internal projects? Let’s say we have a technician who billed 2 hours to a customer, but spent 38 hours on an internal database migration. What was his utilization? Well, if we are looking for billable utilization, it’s 5%. If we are looking for total utilization, it is 100%. These are the questions that you are going to encapsulate in your DAX formulas.
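Here is a minimal sketch of how those two definitions could be encapsulated, assuming a hypothetical TimeEntries table with Hours and IsBillable columns and an Available Hours measure:

Billable Utilization =
DIVIDE (
    CALCULATE ( SUM ( TimeEntries[Hours] ), TimeEntries[IsBillable] = TRUE () ),
    [Available Hours]
)

Total Utilization =
DIVIDE ( SUM ( TimeEntries[Hours] ), [Available Hours] )

With 2 billable hours, 38 internal hours, and 40 available hours, the first measure returns 5% and the second returns 100%, matching the scenario above.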
The whole idea of a BI semantic layer is to hide that complexity away from the end users. When someone orders a Cobb salad, they don’t want to have to articulate the ingredient list. They just want a darn salad.
Are you paid for your hands or your brain?
In the SQL Data Partners podcast, episode 114, there was a question: what’s the difference between a contractor and a consultant? One of the answers was this: a contractor is a set of hands, and a consultant is a set of brains.
I think this answer relates to M versus DAX. M is an automated set of hands, able to do work you’d normally do by hand in Excel. DAX lets you take your domain knowledge and encode it into a data model. It’s an externalized representation of your brain.
And if you think about it, which do you want to be paid for? Do you want to get paid to unpivot data by hand every week? Or do you want to get paid for thinking, for understanding the business, and for working at a higher level?
M allows you to automate the first step, so you can do more of the latter with DAX.
I just gave a presentation for the Excel BI virtual group on database theory, and I’m really happy with how it went. I think it’s an underserved topic, quite honestly. So many people in the Excel world learn everything ad hoc and never have a chance to learn some of the fundamentals.
A number of questions came up relating to the engine and how the performance works. If you are interested in more detail on that, I suggest checking out my talk on DAX.
I’m really excited to give this presentation. I think it turned out well and covers a lot of deep content for an introductory presentation. Hopefully at some point I’ll turn it into a series of posts.
Overall I think the first meeting went fairly well! We had a delay because the president was in town, which I certainly wasn’t expecting. Still, we were able to get everything covered in a timely fashion.
I’m excited for when we put out the survey because I have no idea what kind of content people are looking for. I expect I’ll be doing a number of the presentations going forward.