Every time I see a stat like “25% of respondents are Black”, I see only one piece of a four-piece puzzle filled in. With only this piece, I don’t know how to use this information. I don’t know if it matches the category from my dataset, I don’t know if it reflects how I want to engage with the idea of “Black”, and most importantly I don’t know who is drawing these identity boxes and why.

When it comes to social identity data, these are the blanks that always need to be filled in:

Who defines the categories?
What categories are there?

When we want to address equity in data science, we often need to talk about power and we sometimes need to talk about money. It can be useful to think about an individual’s data like a raw resource; you could call it something cheesy like “Dataonium”. There’s a reason that our icon for Step #4: Data Collection & Sourcing in the Data Equity Framework has a pickaxe in it. It is no coincidence that data science uses words like “mining”, “raw”, “refining”, “cleaning”, “pipeline”, “ETL (extract, transform, load)”, and more.

Early in my career, I was working on a project using education data, and we were having a meeting with policymakers, school principals, and a team of researchers. When one of the principals asked what assumptions were used in crafting one of the models, I was perplexed when the lead researcher launched into a long speech chock full of vague, unusual, and complex jargon. Why would he do this? I knew it wasn’t because the researcher was unable to discuss these ideas in simple, clear language; we had had many easy, plain conversations together about…

I want to talk to you about why you should stop saying “not statistically significant” based on sample size alone.

Learning #1: Don’t Call it “Indigenous Quantitative Methods”

A few months ago I set out to create a research brief of Indigenous Quantitative Methodologies. I wanted to start a survey of just some of the existing ways that Indigenous cultures around the world create knowledge and solve problems with data.

I had a few eye-opening experiences working with Indigenous researchers and felt that many of the techniques, systems, and approaches being used would have revolutionary equity impacts on some of the “western” (for want of a much better term) or “what-you-get-taught-at-university” approaches that I am an expert in.

As soon as I started talking to…

Does Paying or Compensating Survey Respondents Negatively Affect Response Quality or Reliability?

At We All Count, we think a lot about how to increase the equity of the data gathering process. We make a living off of the data science ecosystem and so do many of our project members and the people who read these posts. We all know that data is valuable, bringing us to an interesting question: should we be paying for it when we collect it?

The opinions about this vary widely between sectors and industries that use data. We’ve worked with a social-sector program evaluation firm that would immediately discount any survey data where the respondents were paid…

Talking about data equity can be tricky. Maybe you’ve been to a conference or a workshop where you encountered an idea, a tool or a process that you’re super excited about. You want to bring it up with your team on Monday but by then you’re a little hazy on the details or you can’t concisely describe the ideas that you were so taken with. You also know that in order to actually implement anything you’re excited about, you’re going to have to convince the people in charge and that can be tricky.

Here are a few pointers and a…

Too often in data science, we use identity categories.

We were once hired by clients involved in a youth mental health situation where they needed to target scarce resources (why the resources were scarce is a conversation for a different time) at providing support to young people in our community who were at risk for mental health issues. The client organization had research showing that one of the primary drivers of mental health issues among the youth in the community was bullying. So they wanted to make resources available to those most likely to be experiencing bullying.

How were we going to know who was…

You’ve heard us say it before: Define your question first, then choose a methodology. Rather than letting your methodology limit your questions, let your research questions drive the project design. For an example of how to reframe research questions and adjust methodologies, check out this post.

The We All Count Methodology Matrix is an extremely simple resource that you can use to identify methodologies appropriate for the kind of questions you want to answer with data. If you want to play around with it now and don’t need any more convincing, check it out here.

Don’t Get Bullied

Methodology selection is one of…

“I’m wondering if you have strategies for embedding equity in the ‘data collection’ category for projects that get data second-hand. Over 90% of our projects use data that has already been collected (usually government or non-profit) and historically we have had no input on the construction of categories.”

Here’s a situation that happens all the time: A researcher is handed a pile of data that someone else collected and is asked to answer important questions about the people in that data. …

Heather Krause

Data scientist & statistician (one of only 150 accredited PStats worldwide). Providing data science services grounded in an equity lens. https://weallcount.com
