Category Archives: Diversity & Inclusivity

Why I appreciate it when folks share their pronouns

I think if you aren’t already on board with it, the whole pronouns thing can seem weird. I remember when people started adding pronouns to their Twitter profiles and asking everyone to do the same, and I just didn’t get it. Never in the history of ever has anyone confused me for a woman. I am 6’2″, broad-shouldered, and have an over-abundance of facial hair. It made no sense to me why I should add my pronouns to my Twitter bio and, later on, to my PowerPoint slides. There simply wasn’t a need.

And then, a couple of years later, I found out the person I was married to was a transgender man. We both did, really. And suddenly, a subject that I would rather have just muted on Twitter and ignored was now a central part of my life. My hope with the rest of this blog post is that I can explain why I appreciate it when folks share their pronouns, and potentially encourage you to do the same.

Small courtesies are how we show people they are important

In my early twenties, I was very bad at people skills. I was oblivious, didn’t like small talk, and didn’t understand a lot of social norms. One of the books that really helped me is called “How to Have Confidence and Power in Dealing with People” by Les Giblin. It’s a weird title that sounds like a 1950s sales pitch, but so much of the book is about being considerate to other people. One part that sticks with me today is about the importance of small courtesies.

In the book, Les says, “All of us not only need to feel important — We need to feel that other people recognize and acknowledge our importance.” The way that we do this is through small courtesies, small acts of extra effort. When we show up 5 minutes early to a meeting, we show people they are of value. When we make the effort to use someone’s preferred name, we show that they are important. In my mind, if you share your pronouns and don’t need to, that is a small courtesy, and I appreciate it.

Why is it a courtesy?

I remember a friend of mine asking, “Why would I add my pronouns to my presentations? That’s a personal part of my identity.” And that’s true; it felt weird for me the first time I did it, and I still feel awkward when I say it out loud. As I said, no one has ever mistaken me for a woman; it’s never been in question. So why do it?

Well, in some ways that’s the point. It is a shared discomfort, a shared vulnerability. There is always a risk that by sharing that information you open yourself to mockery or cruelty. I regularly see people on Twitter suggesting that pronouns in your bio mean you are partisan and unreasonable. I certainly hope that doesn’t describe me!

For some people, like my husband, sharing pronouns isn’t as optional as it is for me. For him, being referred to as “she” or by his old name is a source of unease or discomfort, just as someone named Matthew might not like it if people call him Matt. But it becomes a no-win situation for people like my husband. Does he ignore it and suffer recurring discomfort, or does he share his information and risk verbal abuse or worse?

I worry about his safety regularly. I still quietly flinch when I tell strangers that I have a husband. Thankfully no one has ever been a jerk to either one of us about it, but I still worry. Just as I described in my blog post about Codes of Conduct, when I see that people have worked to make our situation feel normal, I feel safer and more at ease.

It allows me to show you a small courtesy

Whenever I put together my newsletter, I copy people’s names directly from LinkedIn or Twitter. It’s very important to me that I get people’s names right. I feel the same way about people’s pronouns. Given the situation in my own marriage, I try not to just assume anymore. And I absolutely hate guessing if I can easily avoid it.

I understand that there are situations where it doesn’t make sense for folks, such as cultures where pronouns are non-gendered or folks who don’t feel safe being out as trans. But when it does make sense, please help me show that you are important and worthy of value by sharing your pronouns, so I can get your details correct.

Why a Code of Conduct is Valuable

I believe having a Code of Conduct is valuable and that every conference should have one. I wrote before about my thoughts on diversity in conferences, and I think a good code of conduct is a way to potentially have a more diverse speaker and attendee pool. It is a way to make more marginalized folks feel safer and, ideally, make a better experience for everyone.

Shouldn’t people know better?

I’ve seen the argument that a Code of Conduct ignores the fact that most people are decent, and that the majority of folks who do cause harm don’t intend it. The concern is that by writing down a list of banned behavior, we at best state the obvious and at worst imply that the average person is indecent.

In short, shouldn’t it go without saying: “Don’t be a jerk”?

I’m willing to concede that most people are decent and mean no harm. I find that following Hanlon’s Razor makes me a less miserable and more open-minded person. In an ideal world, there would never be a need for a Code of Conduct. Everyone would be on the same page about what is appropriate behavior, and how to report inappropriate behavior. People would perfectly tune their jokes and comments to the sensitivities of their audience.

When in your life has that been the case? I was not blessed with common sense when I was born. I have spent so many years putting my foot in my mouth. In particular, I remember the time a coworker told me she was in sales and I said, “Oh, so you work in recruiting,” because up to that point I had figured sales was a male-dominated, pushy profession and recruiting was more soft-skills oriented. Ooof.

Intention and impact are separate

When I made that recruiting comment, not an ounce of harm or malice was meant. But what these arguments seem to miss sometimes is that we have to look at intention AND impact. My words still could have been discouraging or frustrating for the receiver. What really solidified this for me was an example from the book Crucial Conversations:

Imagine someone is drunk and decides to drive home. During the drive they blow through a stoplight and hit someone. Did they intend to cause harm? No. Did they cause harm? Yes. No one ever drives drunk thinking “I hope I hurt someone tonight”. If we only looked at intentions we’d never be able to stop that sort of behavior.

The truth of the matter is we are all drunk drivers when it comes to social interactions, to some degree. Some of it is obliviousness, misspeaking, or a lack of natural grace. And for those folks, there should be room for gentle correction and growth. But others choose not to grow or change their behaviors.

What a Code of Conduct does is help separate the two groups more quickly. If you read the code, agreed to it, and still violated it, it is much more likely that you simply don’t care. It still may have been an honest mistake, and there is room for that person to apologize and improve. But it is far easier to identify repeat offenders when the rules are clear.

Isn’t this virtue signaling?

Another argument I’ve seen is the concern about virtue signaling, i.e., that folks create these things not out of genuine consideration but for the applause of their peers, and that the folks pushing for them don’t care about the results.

To me this is such a strange criticism, because it’s a criticism of intention. This is often from the same folks who feel that we should be giving the benefit of the doubt to people like me who accidentally say unkind or offensive things. Why is the benefit of the doubt not given to organizers as well? I don’t like accusing folks of virtue signaling because that implies I can accurately assess other folks’ goals and motivations. I simply can’t and it’s likely you can’t either.

Again, we should try to separate intention from impact. A poorly written code of conduct without proper enforcement mechanisms is harmful. Totally agreed, but I personally think that, on average, codes of conduct are beneficial.

Potential benefits of a code

So what are some potential benefits of having a code of conduct?

First, it makes the debate happen before an incident instead of after. When my husband and I got married, we started a budget for our finances. This was important, because if we were going to argue about how much money was appropriate to spend on restaurants, I’d rather have the argument before the money was spent. The same is true of a community hashing out what behavior is appropriate. You don’t want to have to figure out those lines right after an incident occurs.

Second, it empowers bystanders. In The Checklist Manifesto, they talk about how checklists can help deal with a power differential. Specifically, it gave nurses more space to call out surgeons when they skipped critical safety steps before a surgery. Many of us are conflict averse, and having something clear cut to point to can help us avoid getting caught in the bystander effect.

Third, it signals care and safety. When my husband came out as trans, I started having pervasive concerns about our safety. I’m 6’2″ and 280 lbs, my personal concerns about physical violence are almost nil. But I still often flinch when I tell people I have a husband, anticipating the possibility of someone being a jerk to one of us. Thankfully this has yet to be an issue for us.

When I see that an event organizer has put in the time to call out bad behavior ahead of time, it helps me let my hair down, so to speak. It shows that they went out of their way to think about these things and that I’d likely be successful reporting any issues. I don’t have to constantly keep my guard up.

Summary

I think it’s reasonable to have concerns about folks trying to enumerate bad behaviors and potentially doing so unskillfully or thoughtlessly. But the truth of the matter is you already have a default code of conduct at events; it’s just unwritten and often contradictory. It’s whatever folks can get away with. I know for me personally, I feel safer at an event when they have been clear about what behavior is unacceptable.

My thoughts on diversity in tech conferences

Today there has been some discussion on Twitter about diversity in tech conferences. I’m not going to link to the discussion directly, because this isn’t about the specific conference that spurred the conversation. I’m not here to name and shame anyone.

I volunteered for 3 years as program manager for PASS Summit, so I will be speaking from experience. I have written before that diversity is important, and I think the bar for it is rising for tech conferences. So where should that bar be?

What is your target goal?

First, I think every single tech conference needs some kind of target goal for diversity. It doesn’t have to be a hard numerical goal and it doesn’t have to be a quota, but diversity should be somewhere in the thought process, every step along the way. If you are not intentional about this, you will trend towards the default, which is a very homogeneous speaker pool.

So what kinds of goals are there? In my mind I see 4 easily defined ones:

  1. No goal. No consideration given to diversity. This is unacceptable in 2021.
  2. Attendee demographics. Aiming for parity with the diversity of your audience
  3. Speaker demographics. Aiming for parity with your speaker pool
  4. Regional/Global demographics. Aiming for parity with the general population.

I personally believe the bar today should be set at #2 and #3, whichever is more difficult. In my experience, there are concrete steps you can take to improve the diversity of your speaker pool, such as encouraging folks to submit, or setting aside a certain number of invitation slots. If your selected speakers aren’t at least as representative as your submitted speakers, then you aren’t trying hard enough.

When I was working on PASS Summit, diversity was a secondary concern, but it was a consistent one, and we took steps when there were issues. We didn’t have any hard targets, but if we felt there was an imbalance, we would go out of our way to juggle the schedule or intentionally invite speakers. There was plenty of room for improvement, however, especially with decisions outside of the selection process.

Where does diversity matter most?

It is not enough to say that your total pool of selected speakers has diversity. Diversity is most important where there is an implied endorsement or there is money involved. You don’t get to shine a spotlight and then ignore your responsibility.

If a conference is placing emphasis on a subset of speakers, the bar for diversity is even higher there for two reasons. First, that is where a conference has the most discretion. It is much easier to get it right with a dozen speakers than a hundred. And if they are promoting a specific set of speakers, they have much more control over who they are promoting and why.

The other reason is that any time you elevate a subset of speakers, you are making a statement, even if it’s an unintentional one. Let’s say that hypothetically your conference had 25% female speakers, but not a single female precon speaker. That would send a strong implied message, because precons are lucrative, highly coveted, and a strong endorsement of the speaker. These kinds of implied messages can be immensely discouraging.

For us at PASS, this applied mainly to two areas. First were the televised sessions, because these sessions had a broader reach and there was an implied endorsement of quality. It also applied to precon trainings. If folks saw the same old faces for precons, we would get roasted, and rightfully so. It was also an issue in some of the marketing of the early invited speakers, and sometimes that created tension between the marketing process and the selection process.

One other area where it matters is panels. In the year 2021, there is no reason to ever have an all-male panel, ever, ever, ever. On a panel of 4-5 speakers, it does not take that much work to find at least one female speaker. If you can’t, you should cancel the panel and look into the deeper issues with your selection process. Even if it is completely accidental, a “manel” comes across as lazy at best.

What can conference organizers do?

Generally speaking, conference organizers have a wide variety of tools to improve diversity if they are willing to get creative and especially if they have a budget to work with. PASS was notoriously stingy, which forced us to depend on the former.

Ideally, it should start during the call for speakers. If the speaker pool seems lopsided, the conference organizers should be taking steps to encourage new and diverse speakers. Any conference integrated into the community is likely going to have connections that can amplify those messages. Even better, if a conference is willing to pay speakers, that opens up a much broader pool than just the most privileged folks. In some cases, conferences can also set aside a certain number of slots for invited speakers and use those to improve diversity.

During the selection process, a lot of it comes down to mindfulness. A conference should be spot checking things all along the way, even better if they can get folks from outside the team to help. Again, having a certain number of reserved slots and being willing to invite and pay people can go a long way here.

Finally, there are indirect steps the conference can take. First, what are you doing to encourage new or low-profile speakers? Things like feedback on abstracts or speaker mentoring can increase the breadth of speakers available to you. As a new speaker in 2017, I felt discouraged about submitting to big conferences and I’m sure under-represented folks feel the same way.

Additionally you can take steps like having a clear code of conduct, having sessions on diversity, and providing support for underrepresented groups. Signs that a conference is safe and fair to underrepresented groups will encourage more of those folks to submit.

Summary

Improving diversity in a conference is hard work, and the average call for speakers is often an unbalanced starting point. However, conference organizers have a number of tools available to them, especially for large paid conferences.

In 2021, there is never an excuse for an all-male panel, and any elevated subset of speakers, such as paid precon speakers, should come under heightened scrutiny. Every tech conference should at least be aiming to match the demographics of their submitted speakers and their attendees. To me these all feel like a reasonable starting point.

What convinced me that diversity is important?

For this month’s T-SQL Tuesday, we are to write about a time we changed our minds on something. For me, something I think about often is that 12 years ago I didn’t value diversity. Today I do value diversity, and if we want to persuade others, we have to figure out what changed.

Where was I 12 years ago?

Twelve years ago, I was going to college and I experienced a bit of a culture shock. The county I’ve lived my entire life in, Beaver County, is 92.5% white and 6% black. The local college I went to, Penn State Beaver, was decidedly more diverse from a racial standpoint.

I remember quite clearly thinking at that point that there had only been 3 times in my life when I had seen that contrast: college, travel stations, and visiting Washington D.C. The rest of my life was in this monoculture.

And when I say that I didn’t value diversity at the time, it wasn’t that I was against it or had negative feelings. It was just that I didn’t see a lack of diversity as a problem. I had a vague awareness that my IT classes were heavily white and heavily male, but that didn’t register as an issue.

Looking back, I think I figured that if different people had different natural talents or natural interests, then what was the harm in that? Part of the reason I was in IT was the things I was bad at. I was bad at people and interacting with them.

Bound up in the previous paragraph were a whole host of assumptions and naïve ideas. The idea that life was a meritocracy. The idea that people freely chose their profession, without discrimination.

What is your ethical structure?

On subjects like this, we often talk past each other. Sometimes in bad faith, but often because we are starting from different ethical precepts. In college I took an Ethics in Computer Science course and learned of two big terms: Utilitarian Ethics and Deontological Ethics.

Essentially, are things right or wrong because of the consequences or because of something inherent to the action? I think one example was “Is it wrong to clean your toilet with the American flag?”. Many people would say yes, but they would have trouble pointing to how anyone was visibly harmed by the act.

I mention this because I think a number of people that are supporting diversity do so from a strongly deontological perspective. So if less than 50% of CEOs are women, then that is in its very nature unjust or unfair. Regardless of the specific consequences.

I think this is a valid and reasonable viewpoint. But I also don’t think that this alone would have changed my mind. Generally I was more swayed by empathy for individuals and seeing concrete negative consequences.

What changed my mind?

I think there were a number of things that all changed my mind in little bits and pieces.

Monoculture leads to failure

Something that planted the seeds early on for me were stories of how a monoculture of life experiences led to failure. How HP webcams couldn’t follow black people. How Apple stores had glass staircases. How websites often don’t support names with umlauts or other diacritics because of assumptions about how names work.

I also learned about groupthink and the often exaggerated Asch conformity experiments. So often these failures occurred either because everyone thought the same or had the same background. Or people were afraid to speak up and stand out.

Life is not a meritocracy

For all of my schooling, I did fairly well. And there was a fairly stable correlation between how good I was at a given task and the grades I got. It seemed to me like school was a meritocracy and I expected work to be the same. I was woefully wrong.

I got a job at Bayer Material Science after college and after 8 months I was fired. I deserved to be fired and wasn’t particularly good at my job. Still, as someone used to getting A’s this was pretty shocking to me.

I learned a number of things during that process. I learned that how well we excel in certain areas can depend on a number of secondary factors, like our ability to work with others, to communicate, etc.

I learned that my skills at being a student didn’t necessarily translate into being a good worker, and I could infer the reverse must be true. There were undoubtedly people who flunked out of school but were crushing it in the workplace.

Violence, discrimination and bias are common

As far as I’m aware, I’ve never been racially or sexually discriminated against. If I have then the person did a terrible job of it, because I didn’t notice. Every police ticket I’ve gotten has been valid. I rarely have to worry about my safety when walking alone at night.

Again, in college I had some vague awareness of this. I understood the idea that my female friends might appreciate someone walking them to their car when it was dark. But I didn’t appreciate just how constant it was for some people. Heck, if I’m being honest, the #MeToo movement shocked me with how many people close to me had been sexually assaulted.

It was helpful too to realize that bias can be invisible, implicit or even well-intentioned. I remember reading about implicit bias in a book and being frustrated about the implicit bias I had in regards to African Americans. I kept failing the test.

I remember meeting an internal salesperson at my last job and saying “Oh so you work in staffing?”, because I had this idea that all salespeople were pushy and that women had better people skills, so clearly this person must be a staffing salesperson, not hardware or IT Services. That was 5 years ago and I still feel like an ass to this day.

Representation matters

So I’m going to talk about a type of diversity that may sound strange, but is very meaningful to me. I’m a Type 1 diabetic. And when I was diagnosed, in many ways I felt like my life was over. There were so many things I wasn’t going to be able to do. For the entirety of my last job, I told very few people because I feared I would get fired.

But one of the best things I did was subscribe to Diabetes Forecast magazine. Because in every issue they had a story of someone with diabetes doing something badass. Like racing, or mountain climbing or even winning a beauty pageant. These people weren’t letting diabetes stop them from living their lives.

For this same reason, Scott Hanselman is a hero of mine. He is very openly a diabetic and will present to hundreds of people without letting his diabetes stop him. If he has to check his blood sugar on stage or drink some orange juice, then so be it.

I say all of this to say that representation matters. My life was impacted because I saw myself in successful people. I saw myself in others.

It doesn’t take a colossal leap of empathy to imagine how I might feel if roles were reversed, if all the speakers at a tech conference were female or non-white. If that happened enough, I would quickly internalize the idea that I would never succeed, that I would never make it, that I would never get up on that stage.

If you optimize for individuals, you optimize for assholes

When I see a backlash against people trying to re-balance the scales, a lot of that backlash makes perfect sense under a certain set of assumptions or axioms.

If you believe that 1) the world is or should be a meritocracy and 2) we have an objective way to measure “skill” and 3) we should focus solely on the merits of the individual, then it’s reasonable to conclude that picking anyone but the “best” for a job is an injustice.

And while I think I believed all of those things in college, given my experiences since then, I no longer believe those things.

I’ve learned that often people fail for reasons completely outside of their control. I’ve learned that technical prowess is a minimal part of my job, and there are a plethora of other factors that make me good at my job. I’ve learned that when you narrowly look only at the individual, there are negative consequences both inside the team and outside of the team.

When you optimize for individuals, you optimize for assholes. We’ve all heard the story of person X who delivers results but is inherently toxic and drives others away. Throwing other people under the bus can be a great way to get results in the short term. We have to look beyond the individual.

I think any person in a position of power, such as a manager or conference organizer, needs to look not just at the talent or popularity of the individuals, but at the mix as a whole and the benefits diversity brings. We can disagree on to what degree, but it needs to be done.

Summary

I won’t pretend I have all the answers here. I still haven’t figured out when being pro-diversity swerves into being pro-tokenism, for example. I haven’t figured out that line where in trying to promote a group you accidentally reduce them to a name, an identifier, a statistic.

But I can say that these are the things that have helped me value and appreciate diversity. And I hope we can all promote it, even if in small ways.

Practicing Statistics: Female DBAs and Salary

Brent Ozar recently ran a salary survey for people working with databases, and posted an article: Female DBAs Make Less Money. Why?

Many of the responses were along the lines of “Well, duh.” I, personally, felt much the same way.

But, I think with something like this, there is a risk for confirmation bias. If you already believed that women were underpaid, there is a chance that you’ll see this as more proof and move on, without ever questioning the quality of the data.

What I want to do is try to take a shot at answering the question: How strong is this evidence? Does this move the conversation forward, or is it junk data?

Consider this blog post an introduction into some statistics and working with data in general. I want to walk you through some of the analysis you can do once you start learning a little bit of statistics. This post is going to talk a lot more about how we get to an answer instead of what the answer is.

Data Integrity

So, first we want to ask: Is the data any good? Does it fit the model we would expect? Barring any other information I would expect it to look similar to a normal distribution. A lot of people around the center, with roughly even tails on either side, clustered reasonably closely.

So if we just take a histogram of the raw data, what do we get?


So, we’ve got a bit of a problem here. The bulk of the data does look like a normal distribution, or something close to it. But we’ve got some suspicious outliers. First we have some people allegedly making over a million dollars in salary. That’s why the histogram is so wide. Hold on, let me zoom in.


Even ignoring the millionaires, we have a number of people reporting over half a million dollars per year in salary.


We’ve also got issues on the other end of the spectrum. Apparently there is someone in Canada who is working as an unpaid intern.


Is this really an issue?

Goofy outliers are an issue, but the larger the dataset the smaller the issue. If Bill Gates walks into a bar, the average wealth in the bar goes up by a billion. If he walks into a football stadium, everyone gets a million dollar raise.

One way of looking at the issue is to compare the median to the mean. The median is the salary smack dab in the middle, whereas the mean is what we normally think of when we think of average.

The median doesn’t care where Bill Gates is, but the mean is sensitive to outliers. If we compare the two, that should give us an idea if we have too much skew in either direction.
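To see the Bill Gates effect in miniature, here is a small Python sketch. The salaries are made-up numbers for a hypothetical ten-person bar, not survey data, and Gates’s “salary” is an arbitrary $1 billion placeholder. The point is only that one extreme entry drags the mean a long way while the median barely moves.

```python
from statistics import mean, median

# Hypothetical salaries for ten people in a bar (illustrative, not survey data)
bar = [45_000, 52_000, 58_000, 61_000, 67_000,
       72_000, 80_000, 88_000, 95_000, 110_000]

# Bill Gates walks in; $1 billion is an arbitrary placeholder figure
bar_with_gates = bar + [1_000_000_000]

for label, salaries in [("before Gates", bar), ("after Gates", bar_with_gates)]:
    print(f"{label}: mean=${mean(salaries):,.0f}  median=${median(salaries):,.0f}")
```

With these toy numbers the median only shifts from $69,500 to $72,000, while the mean jumps from $72,800 to over $90 million.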


So, if we take all of the raw data, we get a $4,000 difference. That feels significant, but it could just be the way the data is naturally skewed. Maybe all the entry-level jobs pay around the same, but the size of pay raises gets bigger and bigger at the top end.

Averages after removing outliers

Okay, well, let’s take those outliers out. We are going to use $15,000-$165,000 as a valid range for salaries. Later on I’ll explain where I got that range.

There are 143 entries outside of that range, or about 5% of the total entries. I feel comfortable excluding that amount. So what’s the difference now?


So the median hasn’t moved, but the mean is now about the same as the median. This tells me that salaries are fairly evenly distributed around the middle for the most part, and the earlier gap came from some really big entries towards the high end. Still, a $4,000 shift isn’t too big, right?

Wait, this actually is an issue…

Remember when I said two seconds ago that $4,000 was significant but not crazy large? Well, unfortunately, the skew in the data set really screws up some analysis we want to do, specifically anything involving our friend the standard deviation. We need a reasonable standard deviation to do a standard error analysis, which we’ll cover later.

Standard Deviation is a measure of the spread of a distribution. Are the numbers clumped near the mean, or are they spread far out? The larger the standard deviation, the more variation of entries.

If a distribution is roughly normal, we can predict how many results will fall within a certain range: 68% within +/- 1 standard deviation, 95% within +/- 2 standard deviations, and 99.7% within +/- 3 standard deviations.
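Those percentages come straight from the normal distribution. As a minimal sketch using only Python’s standard library: the fraction of a normal distribution within +/- k standard deviations of the mean is erf(k / √2). Note that the “95%” is a rounding; +/- 2 standard deviations actually covers about 95.4%.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {fraction_within(k):.1%}")
```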

Well, because of the way standard deviation is calculated, it is especially sensitive to outliers. In this case, it’s extremely sensitive. The standard deviation of all the raw data is $66,500. When I remove results outside of $15K-$165K, the standard deviation plummets to $32,000. This suggests that a lot of the variability in the data is being caused by 5% of the entries.
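Here is a toy illustration of why that happens. The numbers below are hypothetical, not the survey data: 70 salaries spread evenly between $60K and $120K, plus three “millionaire” entries. Because standard deviation squares each distance from the mean, those three entries (about 4% of the list) inflate it many times over.

```python
from statistics import pstdev

# Hypothetical salaries clustered between $60K and $120K (illustrative only)
typical = [60_000, 70_000, 80_000, 90_000, 100_000, 110_000, 120_000] * 10

# A handful of extreme entries, like the suspicious high-end responses
with_outliers = typical + [1_500_000, 2_000_000, 900_000]

print(f"SD without outliers: ${pstdev(typical):,.0f}")
print(f"SD with 3 outliers:  ${pstdev(with_outliers):,.0f}")
```

The first line prints $20,000; the second is well over ten times larger.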

So let’s talk about how to remove outliers.

Removing Outliers

Identifying the IQR

Remember how I got that $15,000-$165,000 range? That came from a tool called the interquartile range, or IQR.

It sounds fancy, but it is incredibly simple. Interquartile range is basically the distance between the middle of the bottom half and the middle of the top half.

So in our case, if we take the bottom half of the data, the median salary is $65,000. If we take the top half of the data, the median salary is $115,000. The IQR is the difference between the two numbers, which is $50,000.
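Here is a minimal Python sketch of that “median of each half” approach. It assumes one common textbook convention: for an odd number of entries, the overall median is left out of both halves. Other tools (for example Python’s `statistics.quantiles` or Excel’s quartile functions) interpolate differently, so their quartiles can differ slightly. The salary list is a hypothetical toy dataset chosen to produce round numbers.

```python
from statistics import median

def iqr_by_halves(values):
    """Return (Q1, median, Q3, IQR), taking Q1 and Q3 as the medians of the
    lower and upper halves. For an odd count, the middle value is excluded
    from both halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]  # skip the middle element when n is odd
    q1, q3 = median(lower), median(upper)
    return q1, median(data), q3, q3 - q1

# Hypothetical toy salaries (not survey data)
toy_salaries = [50_000, 60_000, 70_000, 80_000, 90_000,
                100_000, 110_000, 120_000, 130_000]
print(iqr_by_halves(toy_salaries))
```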

Using IQR to filter out outliers

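Okay, so we have a spread of $50,000 between the first and third quartiles. How do we use that information? Well, there is a common rule of thumb that anything more than 1.5 IQR below the first quartile or above the third quartile is an outlier. In fact, when you see dots beyond the whiskers of a boxplot, that is the rule that put them there.

So, $50,000 * 1.5 is $75,000. Here I took a slightly stricter shortcut: if we take the median ($90,000) and add/subtract $75,000, we get our earlier range of $15,000-$165,000. Measuring from the quartiles instead would give a looser range of -$10,000 to $190,000.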
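A minimal sketch comparing the two ways of turning the IQR into a cutoff, using the quartile and median figures quoted above. `tukey_fences` is the standard boxplot rule; `median_centered_range` is the narrower median +/- 1.5 IQR variant that produces the $15,000-$165,000 range. Both function names are just illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Standard boxplot rule: outliers lie below Q1 - k*IQR or above Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def median_centered_range(med, q1, q3, k=1.5):
    """Stricter variant: keep values within median +/- k*IQR."""
    iqr = q3 - q1
    return med - k * iqr, med + k * iqr

q1, med, q3 = 65_000, 90_000, 115_000  # quartiles and median quoted in the text
print(tukey_fences(q1, q3))                # (-10000.0, 190000.0)
print(median_centered_range(med, q1, q3))  # (15000.0, 165000.0)
```

The median-centered version trims more of the high end, which is worth keeping in mind when deciding how many entries to exclude.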

Standard Error Analysis

Okay, so why did we go through all that work to get the standard deviation to be a little more reasonable? Well, I want to do something called a Standard Error Analysis to answer the following question:

What if our sample is a poor sample?

What if our average female salary is lower because of a sampling error? Specifically, what are the odds that we sampled a lower average salary by pure chance? “Poppycock!”, you might say. Well, standard error gives us an idea of how unlikely that is.

Importance of sample size

Let’s say there are only 1,000 female DBAs in the whole world, and we select 10 of them. What are the chances that the average salary of those 10 is representative of the original 1,000? It’s not great. We could easily pick 10 individuals from the bottom quartile, for example.

What if we sampled 100 instead of 10? The chances get a lot better. We are far more likely to include individuals who are above average for the population as a whole. The larger the sample, the closer the sample mean tends to be to the mean of the original population.

The larger the sample size, the smaller the standard error.
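A quick simulation makes the effect of sample size concrete. This is a sketch under stated assumptions: the population of 1,000 salaries is randomly generated (normal, mean $90,000, standard deviation $25,000), not real DBA data, and the spread of the sample means is measured empirically rather than with a formula.

```python
import random
from statistics import fmean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 salaries drawn from a normal distribution
# (mean $90,000, SD $25,000). Not real DBA data.
population = [rng.gauss(90_000, 25_000) for _ in range(1_000)]

def spread_of_sample_means(sample_size, trials=2_000):
    """Draw `trials` samples without replacement and return the standard
    deviation of their means, i.e. an empirical standard error."""
    means = [fmean(rng.sample(population, sample_size)) for _ in range(trials)]
    return stdev(means)

for n in (10, 100):
    print(f"n = {n:>3}: sample means typically vary by about ${spread_of_sample_means(n):,.0f}")
```

The sample means for n = 100 cluster much more tightly around the population mean than those for n = 10.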

Importance of spread

Remember before we said that a reasonable standard deviation is important? Let’s talk about why. Let’s say there are 10 people in that bar, and that the spread of salaries is small. Everyone there makes roughly the same amount. As a result, the standard deviation, a measure of spread, is going to be quite small.

So, let’s say you take three people at random out of that bar, bribe them with a free drink, and take the average of their salaries. If you do that multiple times, in general that number is going to be close to the true average of the whole population (the bar).

Now Bill Gates walks in again, and we repeat the exercise three times. Because he is such an outlier, the standard deviation is much larger. This throws everything out of whack. We get three sample averages: $50,000, $60,000, and $30,000,000,000. Whoops.

The smaller the standard deviation of a population, the smaller the standard error.
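The bar example can be sketched the same way. Both salary lists are invented, and the Bill Gates figure is an assumed placeholder rather than his actual income. `standard_error` here is a hypothetical helper that divides the population standard deviation by the square root of the sample size and ignores the finite-population correction.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical salaries of the 10 people in the bar (invented for illustration).
bar = [48_000, 50_000, 52_000, 53_000, 55_000, 57_000, 58_000, 60_000, 62_000, 65_000]
# Placeholder standing in for Bill Gates walking in; an assumed figure, not his real income.
bar_with_gates = bar + [90_000_000_000]

def standard_error(population, sample_size):
    """Population SD / sqrt(sample size); ignores the finite-population correction."""
    return pstdev(population) / sqrt(sample_size)

print(f"Similar salaries: SE ~ ${standard_error(bar, 3):,.0f}")
print(f"With Bill Gates:  SE ~ ${standard_error(bar_with_gates, 3):,.0f}")
```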

Calculating Standard Error

Getting the prerequisites

To calculate the standard error, we need the mean, standard deviation, and sample size. Before we calculate those numbers, I want to narrow the focus a bit.

I’ve taken the source data and narrowed it down to US DBAs with salaries within $15,000-$165,000. This should give us more of an apples-to-apples comparison. So what do we get?

We’ve got a gap in average salary of about $4,500. That looks sizable, and we’ll soon see how significant it really is.

We’ve also got a standard deviation of around $24,500. If salaries followed a normal distribution, this would mean that 95% of DBA salaries in the US fall within about two standard deviations of the mean, roughly $52,500-$151,000. That sounds about right to me.
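As a sanity check on that range, the short Python sketch below works backward from the quoted numbers. It takes the midpoint of $52,500-$151,000 as the implied mean and asks how many standard deviations the range spans on each side, using the standard library’s `statistics.NormalDist`. Like the paragraph, it assumes salaries are normally distributed.

```python
from statistics import NormalDist

sd = 24_500                   # overall standard deviation quoted above
low, high = 52_500, 151_000   # 95% range quoted above

implied_mean = (low + high) / 2        # midpoint of the range
z = (high - implied_mean) / sd         # standard deviations on each side
coverage = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(f"Implied mean ~ ${implied_mean:,.0f}")
print(f"Range = mean +/- {z:.2f} SD, covering {coverage:.1%} of a normal distribution")
```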

Calculating individual standard error

So now we have everything we need to calculate standard error for the female and male samples individually. The formula for standard error is standard deviation divided by the square root of the count.

So for females, it’s 23,493 / sqrt(123) = 2,118. This means that if we were repeatedly sampling female DBAs, we would expect a sample’s average salary to land within +/- $2,118 of the true female average about two-thirds of the time.
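The same calculation in Python, using the female standard deviation and count quoted above (`standard_error` is just an illustrative helper name):

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a sample mean: standard deviation / sqrt(count)."""
    return sd / sqrt(n)

female_se = standard_error(23_493, 123)  # female SD and count quoted above
print(f"Female standard error: ${female_se:,.0f}")  # $2,118
```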

So, if we were to randomly sample female DBAs, then 95% of the time that sample’s average would be less than the male average from before, because the roughly $4,500 gap is more than two standard errors ($4,236).
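One way to check that claim numerically, assuming female sample means are normally distributed around the true female average with the $2,118 standard error, and treating the male average as fixed. The $4,500 gap is the approximate figure quoted above, so the results are approximate too.

```python
from statistics import NormalDist

female_se = 2_118   # standard error of the female mean, from above
gap = 4_500         # approximate gap in average salary, from above

# Half-width of the central 95% band for female sample means (~1.96 SEs).
half_width_95 = NormalDist().inv_cdf(0.975) * female_se
# Probability that a female sample mean stays below the male average.
p_below_male = NormalDist().cdf(gap / female_se)

print(f"95% of female sample means fall within +/- ${half_width_95:,.0f}")
print(f"The gap is {gap / female_se:.2f} standard errors")
print(f"P(female sample mean < male average) ~ {p_below_male:.1%}")
```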

That seems like a strong indicator that the lower average salary for females isn’t just chance. But we actually have a stronger way to do this comparison.

Standard error of sample means

Whenever you want to compare the means from two different samples, you use a slightly different formula that combines the variability of both samples.

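The formula is SQRT( Sa^2/na + Sb^2/nb ). Sa and Sb are the standard deviations of samples a and b (a standard deviation squared is also known as the variance), and na and nb are their counts. In other words, you square each sample’s individual standard error, add them, and take the square root.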
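A minimal Python sketch of the textbook (unpooled) standard error for the difference between two independent sample means. `se_difference` is a hypothetical helper, and the example inputs are arbitrary numbers chosen so the arithmetic is easy to follow, not survey figures.

```python
from math import sqrt

def se_difference(sd_a, n_a, sd_b, n_b):
    """Standard error of (mean_a - mean_b) for two independent samples:
    sqrt(sd_a^2 / n_a + sd_b^2 / n_b), i.e. the two individual standard
    errors combined in quadrature."""
    return sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

# Arbitrary illustrative inputs: each sample's own standard error is 10.
print(se_difference(30, 9, 40, 16))  # sqrt(200), about 14.14
```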

If we combine it all together we get this:

The standard error when we combine samples is $1,154. This indicates that if there were no difference between the distributions of female and male salaries, then about 68% of the time the two sample averages would be within $1,154 of each other.

Well, in this case, the difference in means is almost 4 times that. So if the means are 3.88 standard errors apart, how often would that happen by pure chance? We would see this level of separation about 0.01% of the time, or roughly 1 in 10,000.
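Converting that separation into a probability is a one-liner with the standard library’s `statistics.NormalDist`. This sketch assumes the difference in means is normally distributed and uses a two-sided test, taking the 3.88 figure from the text as given.

```python
from statistics import NormalDist

z = 3.88  # gap in means, measured in standard errors (figure from the text)

# Two-sided: chance of a gap at least this large in either direction
# if the two populations actually had the same mean.
p_two_sided = 2 * (1 - NormalDist().cdf(z))
print(f"p ~ {p_two_sided:.5f}, or about 1 in {1 / p_two_sided:,.0f}")
```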

Conclusion

I take this as strong evidence that there is a real wage gap between female and male DBAs in the USA.

What this does not tell us is why. People have speculated about a number of possible causes for this gap, many of them in Brent’s original blog post and the comments below it. I’ll leave the speculation about the cause to them.