SQLGene Training

The more I tried to research practical ways to make use of ChatGPT and Power BI, the more pissed I became. Like bitcoin and NFTs before it, this is a world inextricably filled with liars, frauds, and scam artists. Honestly many of those people just frantically erased blockchain from their business cards and scribbled on “AI”.

There are many valid and practical uses of AI, I use it daily. But there are just as many people who want to take advantage of you. It is essential to educate yourself on how LLMs work and what their limitations are.

Other than Kurt Buhler and Chris Webb, I have yet to find anyone else publicly and critically discussing the limitations, consequences, and ethics of applying this new technology to my favorite reporting tool. Aside from some video courses on LinkedIn Learning, nearly every resource I find seems to either have a financial incentive to downplay the issues and limitations of AI or seems to be recklessly trying to ride the AI hype wave for clout.

Everyone involved here is hugely biased, including myself. So, let’s talk about it.

Legal disclaimer

Everything below is my own personal opinion based on disclosed facts. I do not have, nor am I implying having, any secret knowledge about any parties involved. This is not intended as defamation of any individuals or corporations. This is not intended as an attack or a dogpile on any individuals or corporations and to that effect, in all of my examples I have avoided directly naming or linking to the examples.

Please be kind to others. This is about a broader issue, not about any one individual. Please do not call out, harass, or try to cancel any individuals referenced in this blog post. My goal here is not to “cancel” anyone but to encourage better behavior through discussion. Thank you.

LLMs are fruit of the poisoned tree

Copyright law is a societal construct, but I am a fan of it because it allows me to make a living. I’m not a fan of it extending 70 years after the author’s death. I’m not a fan of companies suing against archival organizations. But If copyright law did not exist I would not have a job as a course creator. I would not be able to make the living I do.

While I get annoyed when people pirate my content, on some level I get it. I was a poor college student once. I’ve heard the arguments of “well they wouldn’t have bought it anyway”. I’ll be annoyed about the $2 I missed out on, but I’ll be okay. Now, if you spin up a BitTorrent tracker and encourage others to pirate, I’m going to be furious because you are now directly attacking my livelihood. Now it is personal.

Whatever your opinions are on the validity of copyright law and whether LLMs count as Fair Use or Transformative Use, one thing is clear. LLMs can only exist thanks to massive and blatant copyright infringement. LLMs are fruit of the poisoned tree. And no matter how sweet that fruit, we need to acknowledge this.

Anything that is publicly available online is treated as fair game, regardless of whether or not the author of the material has given or even implied permission, including 7,000 Indie books that were priced at $0. Many lawsuits allege that non-public, copywritten material is being used, given AI’s ability to reproduce snippets of text verbatim. In an interview with the Wall Street Journal, Open AI’s CTO dodged the question on whether SORA was trained on YouTube videos.

Moving forward, I will be pay-walling more and more of my content as the only way to opt-out of this. As a consequence, this means less free training material for you, dear reader. There are negative, personal consequences for you.

Again, whatever your stance on this is (and there is room for disagreement on the legalities, ethics, and societal benefits), it’s shocking and disgusting that this is somehow all okay, but in the early 2,000s the RIAA and MPAA sued thousands of individuals for file-sharing and copyright infringement, including a 12 year old girl. As a society, there is a real incoherence around copyright infringement that seems to be motivated primarily by profit and power.

The horse has left the barn

No matter how mad or frustrated I may get, the horse has permanantly left the barn. No amount of me stomping my feet will change that. No amount of national regulation will change that. You can run a GPT-4 level LLM on a personal machine today. Chinese organizations are catching up in the LLM race. And I doubt any Chinese organization intends on listening to US or EU regulations on the matter.

Additionally, LLMs are massively popular. One survey in May 2024 (n=4010) of participants in the education system found that 50% of students and educators were using ChatGPT weekly.

Another survey from the Wharton Business School of 800 business leaders found that weekly usage of AI had from up from 37% in 2023 to 73% in 2024.

Yet another study found that 24% of US workers aged 18-64 use AI on a weekly basis.

If you think that AI is a problem for society, then I regret to inform you that we irrevocably screwed. The individual benefits and corporate benefits are just too strong and enticing to roll back the clock on this one. Although I do hope for some sort of regulation in this space.

So now what?

While we can vote for and hope for regulation around this, no amount of regulation can completely stop it, in the same way that copyright law has utterly failed to stop pirating and copyright infringement.

Instead, I think the best we can do it to try to hold ourselves and others to a higher ethical standard, no matter how convenient it may be to do otherwise. Below are my opinions on the ethical obligations we have around AI. Many will disagree, and that’s OK! I don’t expect to persuade many of you, in the same way that I’ll never persuade many of my friends to not pirate video games that are still easily available for sale.

Obligations for individuals

As an individual, I encourage you to educate yourself on how LLMs work and their limitations. LLMs are a dangerous tool and you have an obligation to use them wisely.

Here are some of my favorite free resources:

Intro to Large Language Models – Andrej Karpathy
Large Language Models explained Briefly – Grant Sanderson
Transformers (how LLMs work) explained visually – Grant Sanderson
Let’s Build GPT: from scratch, in code, spelled out – Andrej Karpathy
What is ChatGPT Doing and Why Does it Work? – Stephen Wolfram
One Useful Thing – Ethan Mollick

Additionally, Co-Intelligence Living and Working with AI by Ethan Mollick is a splendid, splendid book on the practical use and ethics of LLMs and can be gotten cheaply at Audible.

If you are using ChatGPT for work, you have an obligation to understand when and how it can train on your chat data (which is does by default). You have an ethical obligation to follow your company’s security and AI policies to avoid accidentally exfiltrating confidential information.

I also strongly encourage you to ask ChatGPT questions in your core area of expertise. This is the best way to understand the jagged frontier of AI capabilities.

Obligations for content creators

If you are a content creator, you have an ethical obligation to not use ChatGPT as a ghostwriter. I think using it for a first pass can be okay and using it for brainstorming or editing is perfectly reasonable. Hold yourself to the same standards to as if you were using a human.

For example, if you are writing a conference abstract and you use ChatGPT, that’s fine. I have a friend who I help edit and refine his abstracts. Although, be aware that if you don’t edit the output, the organizers can tell because it’s going to be mediocre.

But if you paid someone to write an entire technical article and then slapped your name on it, that would be unethical and dishonest. If I found out you were doing that, I would stop reading your blog posts and in private I would encourage others to do the same.

You have an ethical obligation to take responsibility for the content you create and publish. To not do so is functionally littering at best, and actively harmful and malicious at worst. To publish an article about using Power BI for DAX without testing it first is harmful and insulting. Below is an article on LinkedIn with faulty DAX code that subverted the point of the article. Anyone who tried to use the code would have potentially wasted hours troubleshooting.

Don’t put bad code online. Don’t put untested code online. Just don’t.

One company in the Power BI space has decided to AI generate articles en masse, with (as far as I can tell), no human review for quality. The one on churn rate analysis is #2 on the search results for Bing.

When you open the page, it’s a bunch of AI generated slop including the ugliest imitation of the Azure Portal I have ever seen. This kind of content is a waste of time and actively harmful.

I will give them credit for at least including a clear disclaimer, so I don’t waste my time. Many people don’t do even that little. Unfortunately, this only shows up when you scroll to the bottom. This means this article wasted 5-10 minutes of my time when I was trying to answer a question on Reddit.

Even more insultingly, they ask for feedback if something is incorrect. So, you are telling me you have decided to mass litter content on the internet, wasting people’s time with inaccurate posts and you want me to do free labor to clean up your mess and benefit your company’s bottom line? No. Just no.

Now you may argue “Well, Google and Bing do it with their AI generated snippets. Hundreds of companies are doing it.”. This is the most insulting and condescending excuse I have ever heard. If you are telling me that your ethical bar is set by what trillion dollar corporations are doing, well then perhaps you shouldn’t have customers.

Next, If you endorse an AI product in any capacity, you have an ethical obligation to announce any financial relationship or compensation you receive from that product. I suspect it’s rare for people in our space to properly disclose these financial relationships, and I can understand why. I’ve been on the fence on how much to disclose in my business dealings. However, I think it’s important and I make an effort to do it for any company that I’ve done paid work with, as that introduces a bias into my endorsement.

These tools can produce bad or even harmful code. These tools are extremely good at appearing to be more capable than they actually are. It is easily to violate the data security boundary with these tools and allow them to train their models on confidential data.

For goodness sake, perhaps hold yourself to a higher ethical standard than an influencer on TikTok.

Obligations for companies

Software companies that combine Power BI and AI have an obligation to have crystal clear documentation on how they handle both user privacy and data security. I’m talking architecture diagrams and precise detail about what if any user data touches your servers. A small paragraph is woefully inadequate and encourages bad security practices. Additionally, this privacy and security information should be easily discoverable.

I was able to find three companies selling AI visuals for Power BI. Below is the entirely of the security statements I could find, outside of legalese buried in their terms of service or privacy documents.

While the security details are hinted at in the excerpts below, I’m not a fan of “just trust us, bro”. Any product that is exfiltrating your data beyond the security perimeter needs to be abundantly clear on the exact software architecture and processes used. This includes when and how much data is sent over the wire. Personally, I find the lack of this information to be disappointing.

Product #1

“[Product name] provides a secure connection between LLMs and your data, granting you the freedom to select your desired configuration.”

“Why trust us?

Your data remains your own. We’re committed to upholding the highest standards of data security and privacy, ensuring you maintain full control over your data at all times. With [product name], you can trust that your data is safe and secure.”

“Secure

At [Product name], we value your data privacy. We neither store, log, sell, nor monitor your data.

You Are In Control

We leverage OpenAI’s API in alignment with their recommended security measures. As stated on March 1, 2023, “OpenAI will not use data submitted by customers via our API to train or improve our models.”

Data Logging

[Product name] holds your privacy in the highest regard. We neither log nor store any information. Post each AI Lens session, all memory resides locally within Power BI.”

Product #2

Editors Note: this sentence on appsource was the only mention of security I could find. I found nothing on the product page.

“This functionality is especially valuable when you aim to offer your business users a secure and cost-effective way of interacting with LLMs such as ChatGPT, eliminating the requirement for additional frontend hosting.”

Product #3

“ Security

The data is processed locally in the Power BI report. By default, messages are not stored. We use the OpenAI model API which follows a policy of not training their model with the data it processes.”

“Is it secure? Are all my data sent to OpenAI or Anthropic?

The security and privacy of your data are our top priorities. By default, none of your messages are stored. Your data is processed locally within your Power BI report, ensuring a high level of confidentiality. Interacting with the OpenAI or Anthropic model is designed to be aware only of the schema of your data and the outcomes of queries, enabling it to craft responses to your questions without compromising your information. It’s important to note that the OpenAI and Anthropic API strictly follows a policy of not training its model with any processed data. In essence, both on our end and with the OpenAI or Anthropic API, your data is safeguarded, providing you with a secure and trustworthy experience.”

Clarity about the model being used

Software companies have an obligation to clearly disclose which AI model they are using. There is a huge, huge difference in quality between GPT 3.5, GPT 4o mini, and GPT 4o. Enough so that to not be clear on this is defrauding your customers. Thankfully, some software companies are good about doing this, but not all.

Mention of limitations

Ideally, any company selling you on using AI will at least have some sort of reasonable disclaimer about the limitations of AI and for Power BI, which things AI is not the best at. However, I understand that sales is sales and that I’m not going to win this argument. Still, this frustrates me.

Final thoughts

Thank you all for bearing with me. This was something I really needed to get off my chest.

I don’t plan on stopping using LLMs anytime soon. I use ChatGPT daily in my work and I recently signed up for GitHub Copilot and plan to experiment with that. If I can ever afford access to an F64 SKU, I plan to experiment with Copilot for Fabric and Power BI as well.

If you are concerned about data security, I recommend looking into tools like LM studio and Ollama to safely and securely experiment with local LLMs.

I think if used wisely and cautiously, these can be an amazing tool. We all have an obligation to educate ourselves on the best use of them and their failings. Content creators have an obligation to disclose financial incentives, when they use ChatGPT heavily to create content, and general LLM limitations. Software companies have an obligation to be crystal clear about security and privacy, as well as which models they use.

On some level, I’ve started to hate writing these blog posts.

The original intent was to show the ups and downs of being a consultant, inspired by Brent Ozar’s series on the same thing. There’s a huge survivorship bias in our field, only the winners talk about self-employment, and the LinkedIn algorithm encourages only Shiny Happy People. But when you enter the third consecutive year of the 3 most difficult years of your career, you start to wonder if it might be a you problem. So here we go.

Pivoting my business

Two years ago, Pluralsight gave all authors a 25% pay cut and I knew I needed to get out. I reached out to everyone I knew who sold courses themselves for advice. I’m deeply grateful to Matthew Roche, Melissa Coates, Brent Ozar, Kendra Little, and Erik Darling for the conversations that calmed my freak out at the time.

One year ago, I learned that I can’t successfully make content my full-time job while also successfully consulting. Consulting work tends to be a lot of hurry-up-and-wait. Lots of fires, emergencies, and urgencies. No customer is going to be happy if you tell them the project needs to wait a month because you have a course you need to get out. Previously with Pluralsight I was able to make it work because they scoped the work, so it was more like a project. Not so when hungry algos demand weekly content.

So, I cut the consulting work to a bare minimum. Thankfully, I receive money enough from Pluralsight royalties that even with the cut we never have to worry about paying the mortgage. However, it’s nowhere close to covering topline expenses. At the beginning pandemic, $6k/mo gross revenue was what we needed to live comfortably (Western PA is dirt cheap). After the pandemic, I hired a part time employee, inflation happened, and I pay for a lot more subscriptions, like Teachable and StreamlineHQ, so that number is closer to $9k/mo now.

I can confirm that I have not and never will make $9k/mo or more from just Pluralsight. My royalties overall have been stagnant or even gone down a bit since the huge spike upwards in early 2020. So it’s not enough to live off of alone.

Finally, after a lot of dithering in the 2023, I decided to set a public and hard deadline for my course. We were launching in February 2024 hell or high water. I launched with 2 out of 7 modules and it was a huge success, making low four figures. I’m grateful to everyone who let me on to their podcast or livestream, which provided a noticeable boost in sales.

Unfortunately, I had a number of projects right after launch, taking a lot of my focus. I also found out that this content was much much more difficult than the Pluralsight content I was used to. There was no one from curriculum to hand me a set of course objectives to build to. No one to define the scope and duration of the course.

What’s worse, the reason there is a moat and demand for Power BI performance tuning content is almost no one talks about it. You have dozens of scattered Chris Webb blog posts, a book and a course from SQL BI, a course by Nikola Illic, and a book by Thomas LeBlanc and Bhavik Merchant. And that’s about it?

I thought I was going to be putting out a module per week, when in reality I was doing Google searches for “Power BI Performance tuning”, opening 100 tabs, and realizing I had signed myself up for making 500 level internals content. F*ck.

A summer of sadness

All at the same time I was dealing with burnout. My health hadn’t really improved any over the past 3 years and I was finding it hard to work at all. I was anxious. I couldn’t focus. And the content work required deep thought and space and I couldn’t find any. I felt a sense of fragility where I might have a good week the one week and then have a bad nights sleep and derail the next week.

I hadn’t made any progress on my course and a handful of people reached out. I apologized profusely, offered refunds, and promised to give them free access to the next course. If you were impacted by my delays, do please reach out.

In general, I decided that I needed to keep cutting things. I tried to get any volunteer or work obligations off my plate. The one exception is I took on bringing back SQL Saturday Pittsburgh. With the help of co-organizers like James Donahoe and Steph Bruno, it was a lot of work but a big success. I’m very proud of that accomplishment.

Finally turning a corner

I think I finally started turning a corner around PASS Summit. It was refreshing to see my friends and see where the product is going. Before Summit, I had about 3.5 modules done. In the period of a few weeks I rushed to get the rest done. This was also because I really wanted to get the course finished for a Black Friday sale.

The sale went well, making mid three figures. Not enough to live on, but proof that there is demand and it’s worth continuing instead of burning it all down and getting a salaried job. Still, I recently had to float expenses on a credit card for the first time in years, so money is tighter than it used to be. Oh the joys of being paid NET 30 or more.

Immediately after Black Friday, I went to Philadephia to delivery a week long workshop on Fabric and Power BI. The longest training I had ever given before was 2 days. The workshop went well, but every evening I was rushing back to my hotel room to make more content. You would think that 70 slides plus exercises would last a whole day, but no, not even close.

Now I’m back home and effectively on vacation for the rest of the year and it’s lovely. I’m actually excited to be working on whatever whim hits me, setting up a homelab and doing Fabric benchmarks. It’s the first time I’ve done work for fun in years.

I’m excited for 2025 but cautious to not over-extend myself.

The fraught ethics around AI, ChatGPT, and Power BI.