Thinking Through the Strava Data

Lots of people are talking about the announcement that the Oregon Department of Transportation (ODOT) is using a Strava dataset to conduct a research study (click here for the article). As with everything in this world, people have a range of views on this project.

I am a researcher. Spending an afternoon analyzing data sounds fun to me. I read about methodology for kicks. I love long conversations (okay…debates) about epistemology and philosophy.


My weekend plans? Oh, you know, just catching up on some feminist methodology before I dig into one of my Stephen King books.


The first question that should ever be asked about a research project is Why. Why do it, what’s the point? Unfortunately, too many researchers get confused by this simple question. I’ve had a lot of experiences that go something like this:


Usually, it’s because they’re doing what’s called “Basic Research” — studying something just to learn something. 

Personally, I am not even interested in doing a research project unless it will address social inequity.

With such limited resources to go around, it seems both wasteful and harmful for any social science project to be about anything besides addressing inequity. So, that’s an important question to ask about this study. 

The main justification for the study is this:

The problem for many transportation agencies today is that, while bicycling is on the rise (for both transportation and recreation), there remains a major lack of data. This gap in data makes it much harder to justify bicycle investments, plan for future bicycle traffic growth, illustrate the benefits of bike infrastructure investments, and so on. It also makes non-auto use of roads very easy for agencies to overlook. And while ODOT and many cities do bike counts already, they only measure one location for a short period of time. Most importantly, current bicycle count methods don’t provide any context about how people actually ride. It’s this element of “bicycle travel behavior” that ODOT is most excited about.  (emphasis added)

It sounds like the purpose of this study is to make data-informed decisions that will increase cycling and improve the infrastructure for transportation and recreational cycling.  Cool. 

Who is Represented? 

Most of the buzz is about how Strava users don’t represent non-Strava users well. This makes sense, because we all use the road in different ways and choose different routes for varying reasons. 


Not to mention that this can even change day-to-day for the same person.

Nonrepresentative samples are nothing new. While ideally research projects use a representative sample, you might be surprised to know that most research does not. But that doesn’t mean the study is necessarily worthless. It just means that it’s critically important to be transparent about this, justify decisions and choices about sampling, and use the results responsibly.

Unfortunately, it is too common for researchers to state the sample limitation and then move on with the data as though the limitation didn’t really exist. Worse, one of the main justifications is: the sample was convenient. The data were there or were easy to get. That’s what this Strava dataset is–it’s about convenience. Not good enough.

That’s like buying customer data from Whole Foods and using it to understand the grocery shopping behaviors of everyone in the city, all because the data were there. 

Researchers can use a nonrepresentative convenience sample and still get useful data that can address inequity. Sometimes, they do this by learning something about the people who aren’t represented in the study. For example, are the people being excluded more likely to have lower incomes, to be people of color, or to be women?

On the one hand, now you know and can understand the results within that limited, specific context. On the other hand…maybe that means you should question your methodology.

But I digress. 

So it looks like they kind of tried to do that by checking how well their sample represents commuters:

The Strava Workgroup has done some analysis of trips using Portland’s Hawthorne Bridge bike counter. When they compared those numbers with Strava data of the same day and time, they found 2.5% of the trips were made by Strava users. Given that the Hawthorne Bridge is primarily a route for bicycle commuters, Bradway feels it offers a conservative sample size. “In other areas, like Skyline or Rock Creek Road [both of which are popular training routes], it would be much higher.”

Strava only represented 2.5% of all the commuters!? That’s a glaring red flag that Strava data could have some major flaws when trying to apply it to people who commute by bike. I really hope that the investigators are doing additional work on this. 
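To see what a 2.5% share means in practice, here is a minimal sketch of the comparison as I understand it from the quote: divide the Strava trips by the bridge counter trips for the same time window. The counts below are hypothetical (the article doesn't give the raw numbers); I picked them only so the share comes out to the reported 2.5%. The function name is mine, too.

```python
def strava_share(counter_trips: int, strava_trips: int) -> float:
    """Fraction of counted trips that also show up in the Strava data."""
    if counter_trips <= 0:
        raise ValueError("counter_trips must be positive")
    return strava_trips / counter_trips


# Hypothetical counts, chosen only so the share matches the reported 2.5%.
counter_trips = 4000
strava_trips = 100

share = strava_share(counter_trips, strava_trips)
unobserved = 1 - share  # trips the Strava data says nothing about

print(f"Strava share of counted trips: {share:.1%}")
print(f"Trips with no Strava trace:    {unobserved:.1%}")
```

Put that way, on a commuter route the dataset is silent about roughly 39 out of every 40 trips.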

After all, the stated purpose of this study was to inform policy and infrastructure. So, no, the limitation is likely not that it’s a “small sample size.”  The limitation is that it’s probably an inappropriate sample to address the project goals. 


How Research is Used 

But even with all that said, it didn’t have to be so bad. Every project has to start somewhere, and pilots have many limitations. $20,000.00 is nothing for a sample size that large (calculate what it would cost to pay each participant $25.00 for their participation), and this could be a great pilot to test how to go about studying cyclists’ behavior using GPS, in terms of both its strengths and its limitations as an approach.
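The arithmetic behind that parenthetical is quick to sketch. I'm reading the $20,000 as the price of the dataset and the $25 as a typical per-person payment for a recruited study; the variable and function names are just mine for illustration.

```python
def participants_for_budget(budget: float, payment_each: float) -> int:
    """How many participants a budget covers at a flat payment per person."""
    if payment_each <= 0:
        raise ValueError("payment_each must be positive")
    return int(budget // payment_each)


dataset_cost = 20_000.00        # dollar figure from the post
payment_per_participant = 25.00  # per-person payment from the post

n = participants_for_budget(dataset_cost, payment_per_participant)
print(f"${dataset_cost:,.2f} pays for {n} participants at ${payment_per_participant:.2f} each")
```

So the same money would cover 800 recruited participants, before counting any of the other costs of running a study.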

But the problem is that this (most likely) systematically flawed sample is already being applied to important real-life changes:

The third, and most interesting task for ODOT’s Strava Workgroup is to explore pilot projects where the data can inform policy and project decisions. And Bradway says, that work has already begun.

So far, based on the Strava data they have changed where they do in-person bicycle counts and where to install rumble strips on the highway. 

This is a problem.

First, the counter locations have moved so that they can track more Strava cyclists. In this world, you only matter if you’re “counted” by the policy makers. If they’re now even more likely to count Strava users, and maybe not commuters, how will this improve infrastructure for everyday commuters?

Second, they are adding safety features to the roads where Strava users go. Are all or most of the infrastructure improvements going to privilege Strava users over everyday commuters? 

Inequity in, and as a result of, research 

Research and science have a horrible reputation for favoring the privileged, and this is still a major problem today. It’s usually the privileged groups that get to do the research, that get heard in the research, and that benefit from the research. So far, my discomfort with this project (as currently described) is that it doesn’t seem like it will do much to change this historical problem.

If the researchers aren’t careful and aren’t paying special attention to equity, then they might just end up using all that “big data” to make improvements for a small subgroup of cyclists, rather than the cyclist community as a whole.


And For Next Time?

This is why I am a fan of community-based participatory research. I think it’s better to work with communities and have them lead research projects from the start, rather than the other way around. Otherwise, people do projects that they find geeky-cool but that, at best, don’t address any real concerns that most people have or, at worst, just reinforce the inequity problems we already have without anyone even realizing it.

{{{ hopefully the researchers prove me wrong, eh? }}}


