• Power BI performance tuning – what’s in the course?

    My course on performance tuning is live, and you can use code LAUNCHBLOG for 50% off until Sunday, February 11th. Module 1 is free on YouTube and Teachable, no signup required.

    Performance tuning playlist – Module 1

    The goal of this course is to orient you to the various pieces of Power BI, help you identify the source of a problem, and give you some general tips for solving it. If you are stuck and need answers now, this should help.

    Note! This is an early launch. Modules 1 and 2 are available now, and the remaining ones will be coming out weekly.

    • Module 1: A Guide to Performance Tuning. This module focuses on defining a performance tuning strategy and identifying all of the places where Power BI can be slow.
    • Module 2: Improving Refresh – Optimizing Power Query. Optimize Power Query by understanding its data-pulling logic, reducing the data being loaded, and leveraging query folding for faster refreshes.
    • Module 3: Improving Refresh – Measuring Refresh Performance. Master measuring refresh performance using diagnostics and the refresh visualizer to identify which parts are slow.
    • Module 4: Improving Rendering – Modeling. Better modeling means faster rendering. Understand the internals of models, columnar storage, star schema, and tools like DAX Studio for optimization.
    • Module 5: Improving Rendering – DAX Code. Optimize DAX code to run faster, focusing on minimizing formula engine workload and effective data pre-calculation.
    • Module 6: Improving Rendering – Visuals. Streamline visuals for better performance by minimizing objects, avoiding complex visuals, and using just-in-time context with report tooltips and drill-through pages.
    • Module 7: Improving DirectQuery. Optimize DirectQuery with strategies to limit querying, improve SQL performance, and employ advanced features like user-defined aggregations, composite models, and hybrid tables.

    Each module after the first covers how to solve performance problems in its specific area. Each module also provides demos of the various tools you can use (of which there are many, see below).

  • Fabric Ridealong week 4 – Who invented this?

    Last week I struggled to load and process the data. I was frustrated and a good bit disoriented. This week has been mostly backing up (again) and getting a better idea of what’s going on.

    Understanding Databricks is core to understanding Fabric

    One of the things that helps in understanding Fabric is that it's heavily influenced by Databricks. It's built on Delta Lake, which was created and open-sourced by Databricks in 2019. You are encouraged to use a medallion architecture, which, as far as I can tell, also comes from Databricks.

    You will be a lot less frustrated if you realize that much of what's going on with Fabric is a blend of open-source formats and protocols, layered with the idiosyncrasies of Databricks and then those of Microsoft. David Gomes has a good post about data lake file formats, and it's interesting to imagine the parallel universe where Fabric is built on Iceberg (which is also based on Parquet files) instead of Delta Lake. (Note: I found this post via this week's issue of Brent Ozar's newsletter.)
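    If it helps to make the open-formats point concrete: a Delta table is not some proprietary blob, it's Parquet data files plus a _delta_log folder of JSON commit records. Here's a minimal PySpark sketch that writes one and asks Delta to describe it; the table name is made up, and it assumes a Fabric or Databricks notebook where spark is already defined.

    ```python
    # Minimal sketch: a Delta table is Parquet files plus a JSON transaction log.
    # Assumes a Fabric/Databricks notebook where `spark` already exists;
    # the table name below is invented for illustration.
    from pyspark.sql import Row

    df = spark.createDataFrame([Row(id=1, layer="bronze"), Row(id=2, layer="silver")])

    # Write it as a managed Delta table.
    df.write.format("delta").mode("overwrite").saveAsTable("delta_demo")

    # DESCRIBE DETAIL reports the format, location, and file count; browse that
    # location and you'll find *.parquet files and a _delta_log/ folder.
    spark.sql("DESCRIBE DETAIL delta_demo").show(truncate=False)
    ```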

    It was honestly a bit refreshing to see Marco Russo, DAX expert, a bit befuddled on Twitter and LinkedIn about how wishy-washy medallion architecture is. This was reaffirmed by Simon Whiteley's recent video.

    This also means that the best place to learn about these concepts is Databricks itself. I've been skimming through Delta Lake: Up & Running and finding it helpful. It looks like you can also download it for free if you don't mind a sales call.

    What should I use for ETL?

    After playing around some more, I think the best approach for me right now is to work with notebooks for all of my data transformation. So far I see a couple of benefits. First, it's easier to put the code into source control, at least in theory. In practice, a notebook file is actually one big ol' JSON file, so the commits may look a bit ugly.

    Second, it’s easier from a from a “I’m completely lost” perspective, because it’s easier to step through individual steps, see the results, etc. This is especially true when Delta Lake: Up & Running has exercises in PySpark. I’d prefer to work with dataflows because that’s what I’m comfortable with, but clearly that hasn’t worked for me so far.

    Clip from the book

    Tomaž Kaštrun has a blog series on getting into Fabric which shows how easy it is to create a PySpark notebook. I am a bit frustrated that I didn't realize notebooks were a valid ETL tool; I always thought of them as being for data science experiments. Microsoft has some terse documentation that covers some of the options for getting data into your Lakehouse. I hope they continue to expand it like they have done with the Power BI guidance.
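    As a rough sketch of what getting data into your Lakehouse from a notebook can look like (assuming a CSV has already been uploaded to the Files area; the path and table name are invented):

    ```python
    # Minimal notebook-as-ETL sketch: land a CSV from the Lakehouse Files area
    # as a Delta table. The path and table name are invented for illustration.
    orders = (spark.read
              .option("header", True)
              .option("inferSchema", True)
              .csv("Files/landing/orders.csv"))

    # saveAsTable registers a managed Delta table in the attached Lakehouse,
    # so it shows up under Tables and can be queried downstream.
    orders.write.format("delta").mode("overwrite").saveAsTable("orders")
    ```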