Month: June 2017

  • Wrangling GotoWebinar Stats with Power Query: Part one

    |

    So, this week I gave my first presentation to

    Ugh.

    Power Query to the rescue

    Normally this would be a giant pain to work with. When it comes to data quality, this is quite the

    Excel is going to make some assumptions about what is part of the table. This is convenient for our needs, but we’ll have to find a work around when we want to scale to multiple excel files.

    image
    image

    We can’t tell it we have headers, because it’s going to think that first row is a header. We’ll deal with that later. Once we click OK, we are taken to the Power Query / Power Pivot window.

    I made a mistake

    Hmm, so it looks like I made a mistake. I hope my honesty won’t lose me any

    Trying again

    Let’s take a different approach. I’m going to open a blank excel workbook and pull the data into there. Okay, so let’s go to manage under the Power Pivot tab.

    image

    Next, we are going to click “Get External Data From Other Sources”

    image

    Then I’m going to scroll to the bottom and select Excel File.

    Once selected, I only have the whole first sheet as an option. If I had table objects or named ranges, that would be different.

    image

    Hmmm, I still can’t find a way to edit the Power Query. Fiddlesticks!

    Normally, in Power BI it would be right here:

    image

    Trying to do this in Excel is quite the

    Okay, let’s try opening that Excel file. Ah, much better. Now I want to click Edit at the bottom right.

    image

    Cleaning the Data

    So, First thing we need to do is get rid of all of the non-header rows at the top.

    image

    To do that, I just select Remove Rows –> Remove Top Rows.

    image

    Then I specify I want to get rid of the top 7 rows.

    image

    Next, I want to turn the actual header row into a header.

    image

    Okay, so now it looks like a real table.

    image

    Comma Delimited BS

    Okay, so now we need to parse out the times someone was watching. The problem is that some people were in and out. Their entries are comma delimited. Ugh.

    image

    Okay, let’s split them up. I’m going to select Split Column –> By Delimiter

    Unfortunately, splitting by column a) splits into more columns and b) you have to specify how many.

    image

    Thankfully, we can select those new columns and unpivot them.

    image

    Perfect. Now we have a row for every time a person as watching.

    image

    String parsing

    Okay, so now we just need to parse out the dates. First, we are going to split on the dash, and then the parenthesis.

    image

    This is starting to look good.

    Now we just need to get rid of the timezone and convert it to a datetime. First we need to select Replace Values.

    image
    image

    Lastly, we select the data type we want.

    image

    What’s next?

    Now that are data is cleaned up, we’ll join to sessions table and do some simple data modeling. But that’s for the next blog post.