• Fabric Licensing from Scratch

    The Basics

    If you’ve dealt with Power BI licensing before, Fabric licensing makes sense as an extension of that model plus some new parts around CUs, bursting and smoothing. But what if you are brand new to Fabric, Power BI, and possibly even Office 365?

    If you want to get started with Fabric, you need at a bare minimum the following:

    1. Fabric computing capacity. The cheapest option, F2, costs $263 per month for pausable capacity (called Pay-as-you-go) and $156 per month for reserved capacity. Like Azure, prices vary per region.
    2. An Entra tenant. Formerly called Azure Active Directory, Entra is required for managing users and authentication. You will also need an Office 365 tenant on top of that.
    3. Fabric Free license. Even though you are paying for compute capacity, all users need some sort of license applied to them as well.

    Once you have an F2, you can assign that capacity to Fabric workspaces. Workspaces are basically fancy content folders with some security on top, and they are the most common way access to content is provided. With an F2 you’ll have access to all non-Power BI Fabric objects.

    The F2 SKU provides 0.25 virtual cores for Power BI workloads, 4 virtual cores for Spark workloads, and 1 core for data warehouse workloads. These all correspond to 2 CUs, or Capacity Units. CUs are a made-up unit like DTUs for databases or Fahrenheit in America. They are, however, how you track and manage everything in your capacity and keep costs under control.

    Storage is paid for separately. OneLake storage costs $0.023 per GB per month. You also get free mirroring storage equal to your SKU number in terabytes, so an F2 gets 2 TB of free mirroring storage.
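
    To make those numbers concrete, here’s a rough cost sketch in Python using the list prices above. It’s illustrative only; compute prices vary by region, and the constants and function name are just ones I made up for the example.

    ```python
    # Rough monthly cost sketch for a small Fabric setup, using the list
    # prices mentioned above (actual prices vary by region and over time).
    F2_RESERVED_PER_MONTH = 156.00   # F2 reserved capacity, USD/month
    F2_PAYG_PER_MONTH = 263.00       # F2 pay-as-you-go, USD/month
    ONELAKE_PER_GB_MONTH = 0.023     # OneLake storage, USD per GB per month

    def monthly_cost(storage_gb: float, reserved: bool = True) -> float:
        """Estimate compute + OneLake storage for a single F2 capacity."""
        compute = F2_RESERVED_PER_MONTH if reserved else F2_PAYG_PER_MONTH
        return compute + storage_gb * ONELAKE_PER_GB_MONTH

    # Example: 500 GB of OneLake storage on a reserved F2
    print(f"${monthly_cost(500):.2f} per month")   # ~ $167.50
    ```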

    There is no cost for networking, but that will change at some point in the future.

    Power BI content

    If your users want to create Power BI reports in these workspaces, they will need to be assigned a Power BI Pro license at a minimum, which costs $14 per user per month. This applies to both report creators and report consumers. Pro provides a majority of Power BI features.

    The features this does not provide are covered by Power BI Premium per User (PPU) licenses, which cost $24 per user per month. These licenses allow for things like more frequent refreshes and larger data models. PPU is a hybrid license because you both license the user and assign the content to a workspace set to PPU capacity.

    One of the downsides of the PPU model is that PPU users act as universal receivers of content but not universal donors. Essentially, the only way for anyone to read reports hosted in a PPU workspace is to have a PPU license. So you can’t use this as a cheat code to license your report creators with PPU and everyone else with Pro. Nice try.

    There is demand for a Fabric equivalent, an FPU license, but there is no word on when or if this will happen. Folks estimate it could cost anywhere from $30 to $70 per user per month if we get one.

    Finally, if you ramp up to an F64 SKU, Power BI content is then included: report consumers only need a Fabric Free license. At $5,002 per month for a reserved F64, that makes it worth switching over at about 358 Pro users or 209 PPU users. Additionally, you unlock all Premium features, including Copilot.
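
    If you want to sanity-check that break-even math, it’s just the F64 price divided by the per-user cost, rounded up. Here’s a quick sketch using the prices quoted above (the function name is my own):

    ```python
    import math

    F64_RESERVED_PER_MONTH = 5002   # USD/month, reserved F64
    PRO_PER_USER = 14               # Power BI Pro, USD/user/month
    PPU_PER_USER = 24               # Premium Per User, USD/user/month

    def break_even_users(capacity_cost: float, per_user_cost: float) -> int:
        """Smallest user count where per-user licensing exceeds the capacity cost."""
        return math.ceil(capacity_cost / per_user_cost)

    print(break_even_users(F64_RESERVED_PER_MONTH, PRO_PER_USER))  # 358
    print(break_even_users(F64_RESERVED_PER_MONTH, PPU_PER_USER))  # 209
    ```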

    Even if you pay for F64 or higher (or Power BI Report Server on-prem), any report creators still need to be licensed with Power BI Pro to use that publish button. I cannot understand why Microsoft would charge $5k per month and then charge for publishing on top.

    There are also licensing complications for embedding Power BI in a custom application which is outside of the scope of this post. Basically the A and EM SKUs are more restrictive than the F SKUs.

    Capacity management

    Despite a Fabric SKU providing a fixed number of Capacity Units, Fabric is also intended to be somewhat flexible. Fabric customers like the pricing predictability of a fixed capacity compared to consumption-billed Azure workloads, but because of the sheer number of tools and workloads supported, actual usage can vary wildly compared to when Premium capacity only ran Power BI reports.

    In order to support that, Fabric allows for bursting and smoothing. This is similar to auto-scaling, but not quite. Bursting will temporarily provide you with more capacity during spiky workloads, by up to a factor of 12 in most cases. However, this bursting isn’t free. You are borrowing against future compute capacity, which means it’s possible to throttle yourself by overextending.

    Bursting is balanced out by smoothing. Whenever you have exceeded your default capacity, the excess consumption is spread out over a smoothing window: a 5-minute window for anything a user might see and 24 hours for background tasks. If you are using pay-as-you-go capacity, you’ll see a spike in CUs when you shut down the capacity, as all of this burst debt is paid off at once instead of waiting for smoothing to catch up.
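
    To illustrate the idea (and only the idea; this is a toy model, not Microsoft’s actual billing algorithm), smoothing amounts to taking the CU consumption above your baseline and spreading it evenly across the smoothing window:

    ```python
    # Toy illustration of smoothing (not the real Fabric billing algorithm):
    # CU usage above the capacity's baseline is spread evenly over a
    # smoothing window instead of being charged the instant it happens.
    BACKGROUND_WINDOW_MIN = 24 * 60  # background work smooths over 24 hours

    def smoothed_repayment_per_minute(burst_cu_seconds: float) -> float:
        """CU-seconds per minute of burst 'debt' repaid across the window."""
        return burst_cu_seconds / BACKGROUND_WINDOW_MIN

    # A job that consumed 120,000 CU-seconds beyond the baseline would be
    # paid back at roughly this rate over the following 24 hours:
    print(f"{smoothed_repayment_per_minute(120_000):.1f} CU-seconds per minute")
    ```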

    From what I’ve been told by peers, it’s possible to effectively take down a capacity with a rogue Spark notebook by bursting for too long; essentially, the smoothing has to use the full window to catch up. At Ignite 2024, Microsoft announced they are working on surge protection to prevent this.

    Capacity consumption can be monitored with the Fabric Capacity Metrics App.

    I believe you can also upgrade a reserved capacity temporarily and pay the pay-as-you-go rate for the difference, but I can’t find docs to that effect.

  • Benchmarking Power BI import speed for local data sources

    TL;DR – The fastest local format for importing data into Power BI is Parquet and then… MS Access?

    The chart above shows the number of seconds it took to load X million rows of data from a given data source, according to a profiler trace and Phil Seamark’s Refresh visualizer. Parquet is a clear winner by far, with MS Access surprisingly coming in second. Sadly the 2 GB file limit stops Access from becoming the big data format of the future.

    Part of the reason I wanted to do these tests is that people on Reddit often complain that their refresh is slow and their CPU is maxed out. This is almost always a sign that they are importing oodles and oodles of CSV files. I’ve recommended trying Parquet instead of CSV, but it’s nice to have concrete proof that it’s a better file source.
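
    If you want to try the swap yourself, converting a folder of CSVs to Parquet is only a few lines of pandas. This assumes pandas plus pyarrow are installed and each CSV fits in memory; it’s a generic sketch, not the exact script I used for these tests.

    ```python
    from pathlib import Path
    import pandas as pd  # requires pyarrow (or fastparquet) for .to_parquet

    # Convert every CSV in a folder to Parquet so Power BI can import a
    # columnar, compressed format instead of re-parsing text on each refresh.
    src = Path("data/csv")
    dst = Path("data/parquet")
    dst.mkdir(parents=True, exist_ok=True)

    for csv_file in src.glob("*.csv"):
        df = pd.read_csv(csv_file)
        df.to_parquet(dst / f"{csv_file.stem}.parquet", index=False)
    ```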

    For clarification, SQL_CCI means I used a clustered columnstore index on the transaction table, and “JSON – no types” means all of the data was stored as text strings, even the numbers.
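
    For anyone who wants to reproduce the SQL_CCI variant, it’s just the same transaction table with a clustered columnstore index on it. A sketch along these lines works, though the connection string and table name below are placeholders, not my actual setup:

    ```python
    import pyodbc  # or run the T-SQL directly in SSMS

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
        "DATABASE=Contoso;Trusted_Connection=yes;TrustServerCertificate=yes"
    )
    # Build a clustered columnstore index on the (placeholder) transaction
    # table; this assumes the table is currently a heap with no clustered index.
    conn.cursor().execute("CREATE CLUSTERED COLUMNSTORE INDEX cci_Sales ON dbo.Sales;")
    conn.commit()
    ```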

    Finally, if you like this kind of content, let me know! This took about two days of configuration, prep, and testing. It also involved learning things like the fact that the Contoso generated dataset has Nan as a given name, which my Python code interpreted as NaN and caused Power BI to throw an error. I’m considering doing something similar for Fabric data sources when Fabric DBs show up in my tenant.
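
    If you hit the same trap, the knobs to look at in pandas are keep_default_na and na_values, which control which strings get read in as missing values. This is a minimal sketch of the defensive pattern, assuming the conversion goes through pandas’ read_csv, rather than the exact fix in my script:

    ```python
    import pandas as pd

    # pandas converts a default set of strings ("NaN", "NA", "null", ...)
    # into missing values when reading text files. Turning that off and
    # defining "missing" yourself keeps literal names from being swallowed.
    customers = pd.read_csv(
        "customer.csv",        # placeholder file name
        keep_default_na=False, # don't treat NA-like strings as missing
        na_values=[""],        # only empty cells count as missing
    )
    ```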

    Methodology

    All of these tests were run on my GIGABYTE G6 KF 16″ 165Hz gaming work laptop (don’t tell my accountants). It has an Intel i7-13620H 2.40 GHz processor, 32 GB of RAM, and a Gigabyte AG450E1024-SI secondary SSD. The only time a resource seemed to be maxed out was my RAM for the 100-million-row SQL test (but not for columnstore). For SQL Server, I was running SQL Server 2022.

    The data I used was the Contoso generated dataset from the folks at SQLBI.com. This is a great resource if you want to do any sort of performance testing around Star Schema data. I had to manually convert it to JSON, XML, Excel and MS Access. For Excel, I had to use 3 files for the transaction table.

    Initially, I was planning on testing in 10x increments from 10k rows to 100m. However, MS Access imported in under a second for both 10k and 100k, making that a useless benchmark. Trying to convert more than 1m rows of data for XML, JSON, and Excel seemed like more work than it was worth. However, if someone really wants to see those numbers, I can figure it out.

    For recording the times, I did an initial run to warm any caches involved. Then I ran and recorded it three times and reported the median time in seconds. For 100m rows, it took so long that I just reported the initial run, since I didn’t want to spend half an hour importing data four times over.
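
    For what it’s worth, the “one warm-up run, then median of three” pattern is easy to script if you want to reproduce the setup. This sketch just times an arbitrary callable; it isn’t wired up to the profiler trace or the Refresh visualizer.

    ```python
    import statistics
    import time

    def time_refresh(run_once, recorded_runs: int = 3) -> float:
        """Warm any caches with one throwaway run, then report the median
        wall-clock time (in seconds) of the recorded runs."""
        run_once()                                   # warm-up, not recorded
        times = []
        for _ in range(recorded_runs):
            start = time.perf_counter()
            run_once()
            times.append(time.perf_counter() - start)
        return statistics.median(times)

    # Usage: pass any callable that triggers the import you want to measure,
    # e.g. a (hypothetical) function that reads the source file:
    # median_seconds = time_refresh(lambda: load_parquet("sales.parquet"))
    ```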

    Want to try it yourself? Here’s a bunch of the files and some samples at the 10k level:

    Perf Data – local import blog.zip

    Want to learn more?

    If you want to learn more about performance tuning Power BI, consider checking out my training course. You can use code ACCESS24 to get it for $20 until Dec 6th.